* [PATCH 0/8] drm/i915: per context slice/subslice powergating
@ 2018-04-25 11:45 Lionel Landwerlin
  2018-04-25 11:45 ` [PATCH 1/8] drm/i915: expose helper mapping exec flag engine to intel_engine_cs Lionel Landwerlin
                   ` (11 more replies)
  0 siblings, 12 replies; 50+ messages in thread
From: Lionel Landwerlin @ 2018-04-25 11:45 UTC (permalink / raw)
  To: intel-gfx

Hi all,

This is an update of a series that was sent out a few months ago. The
end goal here is to optimize some media workloads.

Here is some information provided by Dmitry (cc) on why we want this:

Video decoding/encoding tends to work with macroblocks, dividing up a
frame into smaller elements. Dependencies exist between those
macroblocks, meaning that they cannot be processed in a random order,
and there is also a maximum number of macroblocks that can be processed
at a given time (called the wave front).

As a result, some workloads (below a certain resolution) will not make
use of all the GPU's execution units. On a SKLGT4 (3 slices), for a
transcoding workload at 720x480p, we measured a low number of active
EUs (~3%) with 3 slices enabled. As we reduce the number of slices used
to 1, the percentage of active EUs obviously increases (~9%). The
execution time of the workload also decreases as we decrease the number
of slices used (we measured up to a ~20% improvement with 1 slice).

It's not clear what speeds up the workload. We currently think that
the power budget is redistributed to other parts (including the CPU)
and that GPU thread scheduling is also sped up because it doesn't
involve as many slices. We haven't found a way to verify these
assumptions.

Changing the powergating configuration doesn't come for free though.
We have some numbers from an IGT benchmark on how much delay is added
each time we switch between 2 contexts with different powergating
configurations. Measurements are on the order of ~50us on SKLGT4 (3
slices) and ~40us on KBLGT3 (2 slices).
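
As a strawman for the interface (the full uapi is in patch 8/8), here
is a minimal userspace sketch restricting a context to a single slice
on the render engine. Field and ioctl names follow the proposed
drm_i915_gem_context_param_sseu; error handling is elided:

	struct drm_i915_gem_context_param arg = { };
	struct drm_i915_gem_context_param_sseu sseu = { .flags = I915_EXEC_RENDER };

	arg.ctx_id = ctx;
	arg.param = I915_CONTEXT_PARAM_SSEU;
	arg.value = (uintptr_t) &sseu;

	/* Read back the current (device default) configuration... */
	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
		/* ...then shrink the slice mask to a single slice. */
		sseu.packed.slice_mask = 1;
		drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
	}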

Cheers,

Chris Wilson (3):
  drm/i915: Program RPCS for Broadwell
  drm/i915: Record the sseu configuration per-context & engine
  drm/i915: Expose RPCS (SSEU) configuration to userspace

Lionel Landwerlin (5):
  drm/i915: expose helper mapping exec flag engine to intel_engine_cs
  drm/i915: don't specify pinned size for wa_bb pin/allocation
  drm/i915: extract per-ctx/indirect bb programming
  drm/i915: pass wa_ctx as argument
  drm/i915: reprogram NOA muxes on context switch when using perf

 drivers/gpu/drm/i915/i915_drv.h            |   5 +
 drivers/gpu/drm/i915/i915_gem_context.c    | 104 +++++++++-
 drivers/gpu/drm/i915/i915_gem_context.h    |  10 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  18 +-
 drivers/gpu/drm/i915/i915_perf.c           |  92 +++++++-
 drivers/gpu/drm/i915/intel_lrc.c           | 231 ++++++++++++++++-----
 drivers/gpu/drm/i915/intel_lrc.h           |   5 +
 include/uapi/drm/i915_drm.h                |  28 +++
 8 files changed, 419 insertions(+), 74 deletions(-)

--
2.17.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/8] drm/i915: expose helper mapping exec flag engine to intel_engine_cs
  2018-04-25 11:45 [PATCH 0/8] drm/i915: per context slice/subslice powergating Lionel Landwerlin
@ 2018-04-25 11:45 ` Lionel Landwerlin
  2018-04-25 11:50   ` Chris Wilson
  2018-04-25 11:45 ` [PATCH 2/8] drm/i915: Program RPCS for Broadwell Lionel Landwerlin
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 50+ messages in thread
From: Lionel Landwerlin @ 2018-04-25 11:45 UTC (permalink / raw)
  To: intel-gfx

This function will be used later by the per (context,engine) power
programming interface.
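
For reference, the context parameter interface of patch 8/8 is the
intended user; a hypothetical caller resolving execbuffer-style flags
to an engine would look like this (error handling elided):

	/* Map I915_EXEC_* ring selection flags to an engine. */
	struct intel_engine_cs *engine =
		i915_gem_engine_from_flags(dev_priv, file, I915_EXEC_RENDER);
	if (!engine)
		return -EINVAL;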

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h            |  3 +++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 18 +++++++++---------
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 2c72c596057f..fdf71164ee24 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2895,6 +2895,9 @@ void i915_gem_cleanup_early(struct drm_i915_private *dev_priv);
 void i915_gem_load_init_fences(struct drm_i915_private *dev_priv);
 int i915_gem_freeze(struct drm_i915_private *dev_priv);
 int i915_gem_freeze_late(struct drm_i915_private *dev_priv);
+struct intel_engine_cs *i915_gem_engine_from_flags(struct drm_i915_private *dev_priv,
+						   struct drm_file *file,
+						   u64 flags);
 
 void *i915_gem_object_alloc(struct drm_i915_private *dev_priv);
 void i915_gem_object_free(struct drm_i915_gem_object *obj);
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index c74f5df3fb5a..9c13ec39d87b 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -2032,12 +2032,12 @@ static const enum intel_engine_id user_ring_map[I915_USER_RINGS + 1] = {
 	[I915_EXEC_VEBOX]	= VECS
 };
 
-static struct intel_engine_cs *
-eb_select_engine(struct drm_i915_private *dev_priv,
-		 struct drm_file *file,
-		 struct drm_i915_gem_execbuffer2 *args)
+struct intel_engine_cs *
+i915_gem_engine_from_flags(struct drm_i915_private *dev_priv,
+			   struct drm_file *file,
+			   u64 flags)
 {
-	unsigned int user_ring_id = args->flags & I915_EXEC_RING_MASK;
+	unsigned int user_ring_id = flags & I915_EXEC_RING_MASK;
 	struct intel_engine_cs *engine;
 
 	if (user_ring_id > I915_USER_RINGS) {
@@ -2046,14 +2046,14 @@ eb_select_engine(struct drm_i915_private *dev_priv,
 	}
 
 	if ((user_ring_id != I915_EXEC_BSD) &&
-	    ((args->flags & I915_EXEC_BSD_MASK) != 0)) {
+	    ((flags & I915_EXEC_BSD_MASK) != 0)) {
 		DRM_DEBUG("execbuf with non bsd ring but with invalid "
-			  "bsd dispatch flags: %d\n", (int)(args->flags));
+			  "bsd dispatch flags: %d\n", (int)(flags));
 		return NULL;
 	}
 
 	if (user_ring_id == I915_EXEC_BSD && HAS_BSD2(dev_priv)) {
-		unsigned int bsd_idx = args->flags & I915_EXEC_BSD_MASK;
+		unsigned int bsd_idx = flags & I915_EXEC_BSD_MASK;
 
 		if (bsd_idx == I915_EXEC_BSD_DEFAULT) {
 			bsd_idx = gen8_dispatch_bsd_engine(dev_priv, file);
@@ -2256,7 +2256,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	if (args->flags & I915_EXEC_IS_PINNED)
 		eb.batch_flags |= I915_DISPATCH_PINNED;
 
-	eb.engine = eb_select_engine(eb.i915, file, args);
+	eb.engine = i915_gem_engine_from_flags(eb.i915, file, args->flags);
 	if (!eb.engine)
 		return -EINVAL;
 
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 2/8] drm/i915: Program RPCS for Broadwell
  2018-04-25 11:45 [PATCH 0/8] drm/i915: per context slice/subslice powergating Lionel Landwerlin
  2018-04-25 11:45 ` [PATCH 1/8] drm/i915: expose helper mapping exec flag engine to intel_engine_cs Lionel Landwerlin
@ 2018-04-25 11:45 ` Lionel Landwerlin
  2018-04-25 11:45 ` [PATCH 3/8] drm/i915: don't specify pinned size for wa_bb pin/allocation Lionel Landwerlin
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 50+ messages in thread
From: Lionel Landwerlin @ 2018-04-25 11:45 UTC (permalink / raw)
  To: intel-gfx

From: Chris Wilson <chris@chris-wilson.co.uk>

Currently we only configure the power gating for Skylake and above, but
the configuration should equally apply to Broadwell and Braswell. Even
though there is not as much variation as on later generations, we want
to expose control over the configuration to userspace and may want to
opt out of the "always-enabled" setting.
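
For reference, a sketch of the slice-count programming that make_rpcs()
performs (names as in the current intel_lrc.c; illustrative only, the
subslice and EU fields follow the same pattern):

	u32 rpcs = 0;

	if (INTEL_INFO(dev_priv)->sseu.has_slice_pg) {
		/* Request explicit enablement of all fused-in slices. */
		rpcs |= GEN8_RPCS_S_CNT_ENABLE;
		rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask) <<
			GEN8_RPCS_S_CNT_SHIFT;
		rpcs |= GEN8_RPCS_ENABLE;
	}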

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 029901a8fa38..407a44a341b9 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2332,13 +2332,6 @@ make_rpcs(struct drm_i915_private *dev_priv)
 {
 	u32 rpcs = 0;
 
-	/*
-	 * No explicit RPCS request is needed to ensure full
-	 * slice/subslice/EU enablement prior to Gen9.
-	*/
-	if (INTEL_GEN(dev_priv) < 9)
-		return 0;
-
 	/*
 	 * Starting in Gen9, render power gating can leave
 	 * slice/subslice/EU in a partially enabled state. We
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 3/8] drm/i915: don't specify pinned size for wa_bb pin/allocation
  2018-04-25 11:45 [PATCH 0/8] drm/i915: per context slice/subslice powergating Lionel Landwerlin
  2018-04-25 11:45 ` [PATCH 1/8] drm/i915: expose helper mapping exec flag engine to intel_engine_cs Lionel Landwerlin
  2018-04-25 11:45 ` [PATCH 2/8] drm/i915: Program RPCS for Broadwell Lionel Landwerlin
@ 2018-04-25 11:45 ` Lionel Landwerlin
  2018-04-25 11:52   ` Chris Wilson
  2018-04-25 11:45 ` [PATCH 4/8] drm/i915: extract per-ctx/indirect bb programming Lionel Landwerlin
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 50+ messages in thread
From: Lionel Landwerlin @ 2018-04-25 11:45 UTC (permalink / raw)
  To: intel-gfx

We can rely on i915_vma_pin() to use vma->size instead.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 407a44a341b9..3ca5a1d33fe9 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1602,7 +1602,7 @@ static int lrc_setup_wa_ctx(struct intel_engine_cs *engine)
 		goto err;
 	}
 
-	err = i915_vma_pin(vma, 0, PAGE_SIZE, PIN_GLOBAL | PIN_HIGH);
+	err = i915_vma_pin(vma, 0, 0, PIN_GLOBAL | PIN_HIGH);
 	if (err)
 		goto err;
 
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 4/8] drm/i915: extract per-ctx/indirect bb programming
  2018-04-25 11:45 [PATCH 0/8] drm/i915: per context slice/subslice powergating Lionel Landwerlin
                   ` (2 preceding siblings ...)
  2018-04-25 11:45 ` [PATCH 3/8] drm/i915: don't specify pinned size for wa_bb pin/allocation Lionel Landwerlin
@ 2018-04-25 11:45 ` Lionel Landwerlin
  2018-04-25 11:45 ` [PATCH 5/8] drm/i915: pass wa_ctx as argument Lionel Landwerlin
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 50+ messages in thread
From: Lionel Landwerlin @ 2018-04-25 11:45 UTC (permalink / raw)
  To: intel-gfx

Let's put this in its own function to reuse it later.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 59 ++++++++++++++++++--------------
 1 file changed, 34 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3ca5a1d33fe9..56515091beb4 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2392,6 +2392,38 @@ static u32 intel_lr_indirect_ctx_offset(struct intel_engine_cs *engine)
 	return indirect_ctx_offset;
 }
 
+static void execlists_init_reg_state_wa_bb(u32 *regs,
+					   struct intel_engine_cs *engine)
+{
+	struct i915_ctx_workarounds *wa_ctx = &engine->wa_ctx;
+	u32 base = engine->mmio_base;
+	u32 ggtt_offset;
+
+	if (!wa_ctx->vma)
+		return;
+
+	ggtt_offset = i915_ggtt_offset(wa_ctx->vma);
+
+	CTX_REG(regs, CTX_RCS_INDIRECT_CTX, RING_INDIRECT_CTX(base), 0);
+	CTX_REG(regs, CTX_RCS_INDIRECT_CTX_OFFSET,
+		RING_INDIRECT_CTX_OFFSET(base), 0);
+	if (wa_ctx->indirect_ctx.size) {
+		regs[CTX_RCS_INDIRECT_CTX + 1] =
+			(ggtt_offset + wa_ctx->indirect_ctx.offset) |
+			(wa_ctx->indirect_ctx.size / CACHELINE_BYTES);
+
+		regs[CTX_RCS_INDIRECT_CTX_OFFSET + 1] =
+			intel_lr_indirect_ctx_offset(engine) << 6;
+	}
+
+	CTX_REG(regs, CTX_BB_PER_CTX_PTR, RING_BB_PER_CTX_PTR(base), 0);
+	if (wa_ctx->per_ctx.size) {
+
+		regs[CTX_BB_PER_CTX_PTR + 1] =
+			(ggtt_offset + wa_ctx->per_ctx.offset) | 0x01;
+	}
+}
+
 static void execlists_init_reg_state(u32 *regs,
 				     struct i915_gem_context *ctx,
 				     struct intel_engine_cs *engine,
@@ -2429,31 +2461,8 @@ static void execlists_init_reg_state(u32 *regs,
 	CTX_REG(regs, CTX_SECOND_BB_HEAD_U, RING_SBBADDR_UDW(base), 0);
 	CTX_REG(regs, CTX_SECOND_BB_HEAD_L, RING_SBBADDR(base), 0);
 	CTX_REG(regs, CTX_SECOND_BB_STATE, RING_SBBSTATE(base), 0);
-	if (rcs) {
-		struct i915_ctx_workarounds *wa_ctx = &engine->wa_ctx;
-
-		CTX_REG(regs, CTX_RCS_INDIRECT_CTX, RING_INDIRECT_CTX(base), 0);
-		CTX_REG(regs, CTX_RCS_INDIRECT_CTX_OFFSET,
-			RING_INDIRECT_CTX_OFFSET(base), 0);
-		if (wa_ctx->indirect_ctx.size) {
-			u32 ggtt_offset = i915_ggtt_offset(wa_ctx->vma);
-
-			regs[CTX_RCS_INDIRECT_CTX + 1] =
-				(ggtt_offset + wa_ctx->indirect_ctx.offset) |
-				(wa_ctx->indirect_ctx.size / CACHELINE_BYTES);
-
-			regs[CTX_RCS_INDIRECT_CTX_OFFSET + 1] =
-				intel_lr_indirect_ctx_offset(engine) << 6;
-		}
-
-		CTX_REG(regs, CTX_BB_PER_CTX_PTR, RING_BB_PER_CTX_PTR(base), 0);
-		if (wa_ctx->per_ctx.size) {
-			u32 ggtt_offset = i915_ggtt_offset(wa_ctx->vma);
-
-			regs[CTX_BB_PER_CTX_PTR + 1] =
-				(ggtt_offset + wa_ctx->per_ctx.offset) | 0x01;
-		}
-	}
+	if (rcs)
+		execlists_init_reg_state_wa_bb(regs, engine);
 
 	regs[CTX_LRI_HEADER_1] = MI_LOAD_REGISTER_IMM(9) | MI_LRI_FORCE_POSTED;
 
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 5/8] drm/i915: pass wa_ctx as argument
  2018-04-25 11:45 [PATCH 0/8] drm/i915: per context slice/subslice powergating Lionel Landwerlin
                   ` (3 preceding siblings ...)
  2018-04-25 11:45 ` [PATCH 4/8] drm/i915: extract per-ctx/indirect bb programming Lionel Landwerlin
@ 2018-04-25 11:45 ` Lionel Landwerlin
  2018-04-25 11:45 ` [PATCH 6/8] drm/i915: reprogram NOA muxes on context switch when using perf Lionel Landwerlin
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 50+ messages in thread
From: Lionel Landwerlin @ 2018-04-25 11:45 UTC (permalink / raw)
  To: intel-gfx

Pass wa_ctx as an argument rather than accessing it from the engine
structure. This will be used for reprogramming later.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 56515091beb4..7a5efab3e4fb 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1586,7 +1586,8 @@ gen10_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch)
 
 #define CTX_WA_BB_OBJ_SIZE (PAGE_SIZE)
 
-static int lrc_setup_wa_ctx(struct intel_engine_cs *engine)
+static int lrc_setup_wa_ctx(struct intel_engine_cs *engine,
+			    struct i915_ctx_workarounds *wa_ctx)
 {
 	struct drm_i915_gem_object *obj;
 	struct i915_vma *vma;
@@ -1606,7 +1607,7 @@ static int lrc_setup_wa_ctx(struct intel_engine_cs *engine)
 	if (err)
 		goto err;
 
-	engine->wa_ctx.vma = vma;
+	wa_ctx->vma = vma;
 	return 0;
 
 err:
@@ -1621,9 +1622,9 @@ static void lrc_destroy_wa_ctx(struct intel_engine_cs *engine)
 
 typedef u32 *(*wa_bb_func_t)(struct intel_engine_cs *engine, u32 *batch);
 
-static int intel_init_workaround_bb(struct intel_engine_cs *engine)
+static int intel_init_workaround_bb(struct intel_engine_cs *engine,
+				    struct i915_ctx_workarounds *wa_ctx)
 {
-	struct i915_ctx_workarounds *wa_ctx = &engine->wa_ctx;
 	struct i915_wa_ctx_bb *wa_bb[2] = { &wa_ctx->indirect_ctx,
 					    &wa_ctx->per_ctx };
 	wa_bb_func_t wa_bb_fn[2];
@@ -1653,7 +1654,7 @@ static int intel_init_workaround_bb(struct intel_engine_cs *engine)
 		return 0;
 	}
 
-	ret = lrc_setup_wa_ctx(engine);
+	ret = lrc_setup_wa_ctx(engine, wa_ctx);
 	if (ret) {
 		DRM_DEBUG_DRIVER("Failed to setup context WA page: %d\n", ret);
 		return ret;
@@ -2306,7 +2307,7 @@ int logical_render_ring_init(struct intel_engine_cs *engine)
 	if (ret)
 		return ret;
 
-	ret = intel_init_workaround_bb(engine);
+	ret = intel_init_workaround_bb(engine, &engine->wa_ctx);
 	if (ret) {
 		/*
 		 * We continue even if we fail to initialize WA batch
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 6/8] drm/i915: reprogram NOA muxes on context switch when using perf
  2018-04-25 11:45 [PATCH 0/8] drm/i915: per context slice/subslice powergating Lionel Landwerlin
                   ` (4 preceding siblings ...)
  2018-04-25 11:45 ` [PATCH 5/8] drm/i915: pass wa_ctx as argument Lionel Landwerlin
@ 2018-04-25 11:45 ` Lionel Landwerlin
  2018-04-25 11:57   ` Chris Wilson
  2018-04-25 11:45 ` [PATCH 7/8] drm/i915: Record the sseu configuration per-context & engine Lionel Landwerlin
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 50+ messages in thread
From: Lionel Landwerlin @ 2018-04-25 11:45 UTC (permalink / raw)
  To: intel-gfx

If some of the contexts submitting workloads to the GPU have been
configured to shut down slices/subslices, we might lose the NOA
configurations written into the NOA muxes. We need to reprogram them at
context switch.
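
For illustration, the per-context workaround batch emitted by this
change has roughly the following layout (at most MAX_LRI_SIZE
register/value pairs per MI_LOAD_REGISTER_IMM, as implemented below):

	MI_LOAD_REGISTER_IMM(n)      /* n <= 125 mux reg/value pairs */
	   <mux reg offset>, <value>
	   ...
	MI_LOAD_REGISTER_IMM(n)      /* repeated until all muxes written */
	   ...
	PIPE_CONTROL                 /* MMIO write flush */
	MI_BATCH_BUFFER_END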

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h  |  2 +
 drivers/gpu/drm/i915/i915_perf.c | 92 +++++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/intel_lrc.c | 71 +++++++++++++++++++++---
 drivers/gpu/drm/i915/intel_lrc.h |  1 +
 4 files changed, 153 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index fdf71164ee24..b15f1fa4453a 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3263,6 +3263,8 @@ int i915_perf_remove_config_ioctl(struct drm_device *dev, void *data,
 void i915_oa_init_reg_state(struct intel_engine_cs *engine,
 			    struct i915_gem_context *ctx,
 			    uint32_t *reg_state);
+u32 i915_oa_get_perctx_bb_size(struct intel_engine_cs *engine);
+u32 *i915_oa_emit_perctx_bb(struct intel_engine_cs *engine, u32 *batch);
 
 /* i915_gem_evict.c */
 int __must_check i915_gem_evict_something(struct i915_address_space *vm,
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 2fc9c85a0d99..d1598f54e63b 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1752,6 +1752,71 @@ static int gen8_emit_oa_config(struct i915_request *rq,
 	return 0;
 }
 
+#define MAX_LRI_SIZE (125U)
+
+u32 i915_oa_get_perctx_bb_size(struct intel_engine_cs *engine)
+{
+	struct drm_i915_private *dev_priv = engine->i915;
+	struct i915_perf_stream *stream = dev_priv->perf.oa.exclusive_stream;
+	struct i915_oa_config *oa_config;
+	u32 n_lri;
+
+	/* We only care about RCS. */
+	if (engine->id != RCS)
+		return 0;
+
+	/* Perf not supported. */
+	if (!dev_priv->perf.initialized)
+		return 0;
+
+	/* OA not currently configured. */
+	if (!stream)
+		return 0;
+
+	oa_config = stream->oa_config;
+
+	/* Very unlikely but possible that we have no muxes to configure. */
+	if (!oa_config->mux_regs_len)
+		return 0;
+
+	n_lri = (oa_config->mux_regs_len / MAX_LRI_SIZE) +
+		((oa_config->mux_regs_len % MAX_LRI_SIZE) != 0);
+
+	/* Return the size of the MI_LOAD_REGISTER_IMMs + PIPE_CONTROL. */
+	return n_lri * 4 + oa_config->mux_regs_len * 8 + 24;
+}
+
+u32 *i915_oa_emit_perctx_bb(struct intel_engine_cs *engine, u32 *batch)
+{
+	struct drm_i915_private *dev_priv = engine->i915;
+	struct i915_oa_config *oa_config;
+	u32 i, n_loaded_regs;
+
+	if (i915_oa_get_perctx_bb_size(engine) == 0)
+		return batch;
+
+	oa_config = dev_priv->perf.oa.exclusive_stream->oa_config;
+
+	n_loaded_regs = 0;
+	for (i = 0; i < oa_config->mux_regs_len; i++) {
+		if ((n_loaded_regs % MAX_LRI_SIZE) == 0) {
+			u32 n_lri = min(oa_config->mux_regs_len - n_loaded_regs,
+					MAX_LRI_SIZE);
+			*batch++ = MI_LOAD_REGISTER_IMM(n_lri);
+		}
+
+		*batch++ = i915_mmio_reg_offset(oa_config->mux_regs[i].addr);
+		*batch++ = oa_config->mux_regs[i].value;
+		n_loaded_regs++;
+	}
+
+	batch = gen8_emit_pipe_control(batch,
+				       PIPE_CONTROL_MMIO_WRITE,
+				       0);
+
+	return batch;
+}
+
 static int gen8_switch_to_updated_kernel_context(struct drm_i915_private *dev_priv,
 						 const struct i915_oa_config *oa_config)
 {
@@ -1829,7 +1894,7 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 	/* Switch away from any user context. */
 	ret = gen8_switch_to_updated_kernel_context(dev_priv, oa_config);
 	if (ret)
-		goto out;
+		return ret;
 
 	/*
 	 * The OA register config is setup through the context image. This image
@@ -1846,7 +1911,16 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 	 */
 	ret = i915_gem_wait_for_idle(dev_priv, wait_flags);
 	if (ret)
-		goto out;
+		return ret;
+
+	/*
+	 * Reload the workaround batchbuffer to include NOA muxes
+	 * reprogramming on context-switch, so we don't lose configurations
+	 * after switch-from a context with disabled slices/subslices.
+	 */
+	ret = logical_render_ring_reload_wa_bb(dev_priv->engine[RCS]);
+	if (ret)
+		return ret;
 
 	/* Update all contexts now that we've stalled the submission. */
 	list_for_each_entry(ctx, &dev_priv->contexts.list, link) {
@@ -1858,10 +1932,8 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 			continue;
 
 		regs = i915_gem_object_pin_map(ce->state->obj, I915_MAP_WB);
-		if (IS_ERR(regs)) {
-			ret = PTR_ERR(regs);
-			goto out;
-		}
+		if (IS_ERR(regs))
+			return PTR_ERR(regs);
 
 		ce->state->obj->mm.dirty = true;
 		regs += LRC_STATE_PN * PAGE_SIZE / sizeof(*regs);
@@ -1871,7 +1943,6 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 		i915_gem_object_unpin_map(ce->state->obj);
 	}
 
- out:
 	return ret;
 }
 
@@ -2213,6 +2284,13 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 
 	dev_priv->perf.oa.exclusive_stream = stream;
 
+	ret = dev_priv->perf.oa.ops.enable_metric_set(dev_priv,
+						      stream->oa_config);
+	if (ret)
+		goto err_enable;
+
+	stream->ops = &i915_oa_stream_ops;
+
 	mutex_unlock(&dev_priv->drm.struct_mutex);
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 7a5efab3e4fb..a0f72fcda0d9 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -169,6 +169,8 @@ static void execlists_init_reg_state(u32 *reg_state,
 				     struct i915_gem_context *ctx,
 				     struct intel_engine_cs *engine,
 				     struct intel_ring *ring);
+static void execlists_init_reg_state_wa_bb(u32 *reg_state,
+					   struct intel_engine_cs *engine);
 
 static inline struct i915_priolist *to_priolist(struct rb_node *rb)
 {
@@ -1584,16 +1586,28 @@ gen10_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch)
 	return batch;
 }
 
-#define CTX_WA_BB_OBJ_SIZE (PAGE_SIZE)
+static u32 *gen_init_perctx_bb(struct intel_engine_cs *engine, u32 *batch)
+{
+	batch = i915_oa_emit_perctx_bb(engine, batch);
+	*batch++ = MI_BATCH_BUFFER_END;
+
+	return batch;
+}
+
+/* Reserve a minimum of 200 dwords for indirect bb */
+#define CTX_WA_BB_MIN_DWORDS (200)
 
 static int lrc_setup_wa_ctx(struct intel_engine_cs *engine,
 			    struct i915_ctx_workarounds *wa_ctx)
 {
 	struct drm_i915_gem_object *obj;
 	struct i915_vma *vma;
+	u32 n_pages = DIV_ROUND_UP(i915_oa_get_perctx_bb_size(engine) +
+				   4 * CTX_WA_BB_MIN_DWORDS,
+				   PAGE_SIZE);
 	int err;
 
-	obj = i915_gem_object_create(engine->i915, CTX_WA_BB_OBJ_SIZE);
+	obj = i915_gem_object_create(engine->i915, n_pages * PAGE_SIZE);
 	if (IS_ERR(obj))
 		return PTR_ERR(obj);
 
@@ -1639,15 +1653,15 @@ static int intel_init_workaround_bb(struct intel_engine_cs *engine,
 	switch (INTEL_GEN(engine->i915)) {
 	case 10:
 		wa_bb_fn[0] = gen10_init_indirectctx_bb;
-		wa_bb_fn[1] = NULL;
+		wa_bb_fn[1] = gen_init_perctx_bb;
 		break;
 	case 9:
 		wa_bb_fn[0] = gen9_init_indirectctx_bb;
-		wa_bb_fn[1] = NULL;
+		wa_bb_fn[1] = gen_init_perctx_bb;
 		break;
 	case 8:
 		wa_bb_fn[0] = gen8_init_indirectctx_bb;
-		wa_bb_fn[1] = NULL;
+		wa_bb_fn[1] = gen_init_perctx_bb;
 		break;
 	default:
 		MISSING_CASE(INTEL_GEN(engine->i915));
@@ -1680,7 +1694,7 @@ static int intel_init_workaround_bb(struct intel_engine_cs *engine,
 		wa_bb[i]->size = batch_ptr - (batch + wa_bb[i]->offset);
 	}
 
-	BUG_ON(batch_ptr - batch > CTX_WA_BB_OBJ_SIZE);
+	BUG_ON(batch_ptr - batch > wa_ctx->vma->obj->base.size);
 
 	kunmap_atomic(batch);
 	if (ret)
@@ -2321,6 +2335,51 @@ int logical_render_ring_init(struct intel_engine_cs *engine)
 	return logical_ring_init(engine);
 }
 
+int logical_render_ring_reload_wa_bb(struct intel_engine_cs *engine)
+{
+	struct drm_i915_private *dev_priv = engine->i915;
+	struct i915_ctx_workarounds new_wa_ctx;
+	struct i915_gem_context *ctx;
+	int ret;
+
+	if (WARN_ON(engine->id != RCS))
+		return -EINVAL;
+
+	memset(&new_wa_ctx, 0, sizeof(new_wa_ctx));
+	ret = intel_init_workaround_bb(engine, &new_wa_ctx);
+	if (ret)
+		return ret;
+
+	if (engine->wa_ctx.vma)
+		lrc_destroy_wa_ctx(engine);
+
+	memcpy(&engine->wa_ctx, &new_wa_ctx, sizeof(engine->wa_ctx));
+
+	list_for_each_entry(ctx, &dev_priv->contexts.list, link) {
+		struct intel_context *ce = &ctx->engine[RCS];
+		u32 *regs;
+
+		/* Settings will be set upon first use. */
+		if (!ce->state)
+			continue;
+
+		regs = i915_gem_object_pin_map(ce->state->obj, I915_MAP_WB);
+		if (IS_ERR(regs)) {
+			ret = PTR_ERR(regs);
+			break;
+		}
+
+		ce->state->obj->mm.dirty = true;
+		regs += LRC_STATE_PN * PAGE_SIZE / sizeof(*regs);
+
+		execlists_init_reg_state_wa_bb(regs, engine);
+
+		i915_gem_object_unpin_map(ce->state->obj);
+	}
+
+	return ret;
+}
+
 int logical_xcs_ring_init(struct intel_engine_cs *engine)
 {
 	logical_ring_setup(engine);
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 59d7b86012e9..d91d69a17206 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -71,6 +71,7 @@ enum {
 /* Logical Rings */
 void intel_logical_ring_cleanup(struct intel_engine_cs *engine);
 int logical_render_ring_init(struct intel_engine_cs *engine);
+int logical_render_ring_reload_wa_bb(struct intel_engine_cs *engine);
 int logical_xcs_ring_init(struct intel_engine_cs *engine);
 
 /* Logical Ring Contexts */
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 7/8] drm/i915: Record the sseu configuration per-context & engine
  2018-04-25 11:45 [PATCH 0/8] drm/i915: per context slice/subslice powergating Lionel Landwerlin
                   ` (5 preceding siblings ...)
  2018-04-25 11:45 ` [PATCH 6/8] drm/i915: reprogram NOA muxes on context switch when using perf Lionel Landwerlin
@ 2018-04-25 11:45 ` Lionel Landwerlin
  2018-04-25 11:45 ` [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace Lionel Landwerlin
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 50+ messages in thread
From: Lionel Landwerlin @ 2018-04-25 11:45 UTC (permalink / raw)
  To: intel-gfx

From: Chris Wilson <chris@chris-wilson.co.uk>

We want to expose the ability to reconfigure the slices, subslice and
eu per context and per engine. To facilitate that, store the current
configuration on the context for each engine, which is initially set
to the device default upon creation.
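
For example (illustrative values, assuming a 3-slice Gen9 part such as
the SKLGT4 mentioned in the cover letter, with 8 EUs per subslice), the
default recorded per engine at creation would be:

	struct i915_gem_context_sseu sseu = {
		.slice_mask = 0x7,		/* all 3 slices */
		.subslice_mask = 0x7,		/* all subslices of slice 0 */
		.min_eus_per_subslice = 8,	/* device max: full enablement */
		.max_eus_per_subslice = 8,
	};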

v2: record sseu configuration per context & engine (Chris)

v3: introduce i915_gem_context_sseu to store the powergating
    programming, as sseu_dev_info has grown quite a bit (Lionel)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 22 ++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_context.h | 10 ++++++++++
 drivers/gpu/drm/i915/intel_lrc.c        | 24 ++++++++++++------------
 3 files changed, 44 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 74435affe23f..bdf050beeb94 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -261,11 +261,26 @@ static u32 default_desc_template(const struct drm_i915_private *i915,
 	return desc;
 }
 
+static struct i915_gem_context_sseu
+i915_gem_context_sseu_from_device_sseu(const struct sseu_dev_info *sseu)
+{
+	struct i915_gem_context_sseu value = {
+		.slice_mask = sseu->slice_mask,
+		.subslice_mask = sseu->subslice_mask[0],
+		.min_eus_per_subslice = sseu->max_eus_per_subslice,
+		.max_eus_per_subslice = sseu->max_eus_per_subslice,
+	};
+
+	return value;
+}
+
 static struct i915_gem_context *
 __create_hw_context(struct drm_i915_private *dev_priv,
 		    struct drm_i915_file_private *file_priv)
 {
 	struct i915_gem_context *ctx;
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
 	int ret;
 
 	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
@@ -314,6 +329,13 @@ __create_hw_context(struct drm_i915_private *dev_priv,
 	 * is no remap info, it will be a NOP. */
 	ctx->remap_slice = ALL_L3_SLICES(dev_priv);
 
+	/* On all engines, use the whole device by default */
+	for_each_engine(engine, dev_priv, id) {
+		ctx->engine[id].sseu =
+			i915_gem_context_sseu_from_device_sseu(
+				&INTEL_INFO(dev_priv)->sseu);
+	}
+
 	i915_gem_context_set_bannable(ctx);
 	ctx->ring_size = 4 * PAGE_SIZE;
 	ctx->desc_template =
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index b12a8a8c5af9..2294be644574 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -30,6 +30,7 @@
 #include <linux/radix-tree.h>
 
 #include "i915_gem.h"
+#include "intel_device_info.h"
 
 struct pid;
 
@@ -45,6 +46,13 @@ struct intel_ring;
 
 #define DEFAULT_CONTEXT_HANDLE 0
 
+struct i915_gem_context_sseu {
+	u8 slice_mask;
+	u8 subslice_mask;
+	u8 min_eus_per_subslice;
+	u8 max_eus_per_subslice;
+};
+
 /**
  * struct i915_gem_context - client state
  *
@@ -149,6 +157,8 @@ struct i915_gem_context {
 		u32 *lrc_reg_state;
 		u64 lrc_desc;
 		int pin_count;
+		/** sseu: Control eu/slice partitioning */
+		struct i915_gem_context_sseu sseu;
 	} engine[I915_NUM_ENGINES];
 
 	/** ring_size: size for allocating the per-engine ring buffer */
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index a0f72fcda0d9..dca17ef24de5 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2387,8 +2387,8 @@ int logical_xcs_ring_init(struct intel_engine_cs *engine)
 	return logical_ring_init(engine);
 }
 
-static u32
-make_rpcs(struct drm_i915_private *dev_priv)
+static u32 make_rpcs(const struct sseu_dev_info *sseu,
+		     const struct i915_gem_context_sseu *ctx_sseu)
 {
 	u32 rpcs = 0;
 
@@ -2398,24 +2398,23 @@ make_rpcs(struct drm_i915_private *dev_priv)
 	 * must make an explicit request through RPCS for full
 	 * enablement.
 	*/
-	if (INTEL_INFO(dev_priv)->sseu.has_slice_pg) {
+	if (sseu->has_slice_pg) {
 		rpcs |= GEN8_RPCS_S_CNT_ENABLE;
-		rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask) <<
-			GEN8_RPCS_S_CNT_SHIFT;
+		rpcs |= hweight8(ctx_sseu->slice_mask) << GEN8_RPCS_S_CNT_SHIFT;
 		rpcs |= GEN8_RPCS_ENABLE;
 	}
 
-	if (INTEL_INFO(dev_priv)->sseu.has_subslice_pg) {
+	if (sseu->has_subslice_pg) {
 		rpcs |= GEN8_RPCS_SS_CNT_ENABLE;
-		rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.subslice_mask[0]) <<
-			GEN8_RPCS_SS_CNT_SHIFT;
+		rpcs |= hweight8(ctx_sseu->subslice_mask) <<
+		        GEN8_RPCS_SS_CNT_SHIFT;
 		rpcs |= GEN8_RPCS_ENABLE;
 	}
 
-	if (INTEL_INFO(dev_priv)->sseu.has_eu_pg) {
-		rpcs |= INTEL_INFO(dev_priv)->sseu.eu_per_subslice <<
+	if (sseu->has_eu_pg) {
+		rpcs |= ctx_sseu->min_eus_per_subslice <<
 			GEN8_RPCS_EU_MIN_SHIFT;
-		rpcs |= INTEL_INFO(dev_priv)->sseu.eu_per_subslice <<
+		rpcs |= ctx_sseu->max_eus_per_subslice <<
 			GEN8_RPCS_EU_MAX_SHIFT;
 		rpcs |= GEN8_RPCS_ENABLE;
 	}
@@ -2548,7 +2547,8 @@ static void execlists_init_reg_state(u32 *regs,
 	if (rcs) {
 		regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
 		CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
-			make_rpcs(dev_priv));
+			make_rpcs(&INTEL_INFO(dev_priv)->sseu,
+				  &ctx->engine[engine->id].sseu));
 
 		i915_oa_init_reg_state(engine, ctx, regs);
 	}
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-04-25 11:45 [PATCH 0/8] drm/i915: per context slice/subslice powergating Lionel Landwerlin
                   ` (6 preceding siblings ...)
  2018-04-25 11:45 ` [PATCH 7/8] drm/i915: Record the sseu configuration per-context & engine Lionel Landwerlin
@ 2018-04-25 11:45 ` Lionel Landwerlin
  2018-04-26 10:00   ` Joonas Lahtinen
  2018-05-03 17:18   ` Tvrtko Ursulin
  2018-04-25 12:34 ` ✗ Fi.CI.CHECKPATCH: warning for drm/i915: per context slice/subslice powergating Patchwork
                   ` (3 subsequent siblings)
  11 siblings, 2 replies; 50+ messages in thread
From: Lionel Landwerlin @ 2018-04-25 11:45 UTC (permalink / raw)
  To: intel-gfx

From: Chris Wilson <chris@chris-wilson.co.uk>

We want to allow userspace to reconfigure the subslice configuration
for its own use case. To do so, we expose a context parameter to allow
adjustment of the RPCS register stored within the context image (and
currently not accessible via LRI). If the context is adjusted before
first use, the adjustment is for "free"; otherwise if the context is
active we flush the context off the GPU (stalling all users), forcing
the GPU to save the context to memory where we can modify it, and so
ensure that the register is reloaded on next execution.

The overhead of managing additional EU subslices can be significant,
especially in multi-context workloads. Non-GPGPU contexts should
preferably disable the subslices they are not using, and others should
fine-tune the number to match their workload.

We expose complete control over the RPCS register, allowing
configuration of slice/subslice, via masks packed into a u64 for
simplicity. For example,

	struct drm_i915_gem_context_param arg;
	struct drm_i915_gem_context_param_sseu sseu = { .flags = I915_EXEC_RENDER };

	memset(&arg, 0, sizeof(arg));
	arg.ctx_id = ctx;
	arg.param = I915_CONTEXT_PARAM_SSEU;
	arg.value = (uintptr_t) &sseu;
	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
		sseu.packed.subslice_mask = 0;

		drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
	}

could be used to reset the subslice mask to the device default (per the
uapi below, a mask of 0 selects the default masks).

v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)

v3: Add ability to program this per engine (Chris)

v4: Move most get_sseu() into i915_gem_context.c (Lionel)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
CC: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
CC: Zhipeng Gong <zhipeng.gong@intel.com>
CC: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 82 ++++++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_lrc.c        | 55 +++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.h        |  4 ++
 include/uapi/drm/i915_drm.h             | 28 +++++++++
 4 files changed, 168 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index bdf050beeb94..b97ddcf47514 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -740,6 +740,26 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
 	return 0;
 }
 
+static struct i915_gem_context_sseu
+i915_gem_context_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
+				     const struct drm_i915_gem_context_param_sseu *user_sseu)
+{
+	struct i915_gem_context_sseu value = {
+		.slice_mask = user_sseu->packed.slice_mask == 0 ?
+			      sseu->slice_mask :
+			      (user_sseu->packed.slice_mask & sseu->slice_mask),
+		.subslice_mask = user_sseu->packed.subslice_mask == 0 ?
+				 sseu->subslice_mask[0] :
+				 (user_sseu->packed.subslice_mask & sseu->subslice_mask[0]),
+		.min_eus_per_subslice = min(user_sseu->packed.min_eus_per_subslice,
+					    sseu->max_eus_per_subslice),
+		.max_eus_per_subslice = min(user_sseu->packed.max_eus_per_subslice,
+					    sseu->max_eus_per_subslice),
+	};
+
+	return value;
+}
+
 int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 				    struct drm_file *file)
 {
@@ -777,6 +797,37 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_CONTEXT_PARAM_PRIORITY:
 		args->value = ctx->sched.priority;
 		break;
+	case I915_CONTEXT_PARAM_SSEU: {
+		struct drm_i915_gem_context_param_sseu param_sseu;
+		struct intel_engine_cs *engine;
+
+		if (copy_from_user(&param_sseu, u64_to_user_ptr(args->value),
+				   sizeof(param_sseu))) {
+			ret = -EFAULT;
+			break;
+		}
+
+		engine = i915_gem_engine_from_flags(to_i915(dev), file,
+						    param_sseu.flags);
+		if (!engine) {
+			ret = -EINVAL;
+			break;
+		}
+
+		param_sseu.packed.slice_mask =
+			ctx->engine[engine->id].sseu.slice_mask;
+		param_sseu.packed.subslice_mask =
+			ctx->engine[engine->id].sseu.subslice_mask;
+		param_sseu.packed.min_eus_per_subslice =
+			ctx->engine[engine->id].sseu.min_eus_per_subslice;
+		param_sseu.packed.max_eus_per_subslice =
+			ctx->engine[engine->id].sseu.max_eus_per_subslice;
+
+		if (copy_to_user(u64_to_user_ptr(args->value), &param_sseu,
+				 sizeof(param_sseu)))
+			ret = -EFAULT;
+		break;
+	}
 	default:
 		ret = -EINVAL;
 		break;
@@ -832,7 +883,6 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 		else
 			i915_gem_context_clear_bannable(ctx);
 		break;
-
 	case I915_CONTEXT_PARAM_PRIORITY:
 		{
 			s64 priority = args->value;
@@ -851,7 +901,37 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 				ctx->sched.priority = priority;
 		}
 		break;
+	case I915_CONTEXT_PARAM_SSEU:
+		if (args->size)
+			ret = -EINVAL;
+		else if (!HAS_EXECLISTS(ctx->i915))
+			ret = -ENODEV;
+		else {
+			struct drm_i915_private *dev_priv = to_i915(dev);
+			struct drm_i915_gem_context_param_sseu user_sseu;
+			struct i915_gem_context_sseu ctx_sseu;
+			struct intel_engine_cs *engine;
+
+			if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
+					   sizeof(user_sseu))) {
+				ret = -EFAULT;
+				break;
+			}
+
+			engine = i915_gem_engine_from_flags(dev_priv, file,
+							    user_sseu.flags);
+			if (!engine) {
+				ret = -EINVAL;
+				break;
+			}
+
+			ctx_sseu =
+				i915_gem_context_sseu_from_user_sseu(&INTEL_INFO(dev_priv)->sseu,
+								     &user_sseu);
 
+			ret = intel_lr_context_set_sseu(ctx, engine, &ctx_sseu);
+		}
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index dca17ef24de5..9caddcb1f234 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2711,6 +2711,61 @@ void intel_lr_context_resume(struct drm_i915_private *dev_priv)
 	}
 }
 
+int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
+			      struct intel_engine_cs *engine,
+			      struct i915_gem_context_sseu *sseu)
+{
+	struct drm_i915_private *dev_priv = ctx->i915;
+	struct intel_context *ce;
+	enum intel_engine_id id;
+	int ret;
+
+	lockdep_assert_held(&dev_priv->drm.struct_mutex);
+
+	if (memcmp(sseu, &ctx->engine[engine->id].sseu, sizeof(*sseu)) == 0)
+		return 0;
+
+	/*
+	 * We can only program this on the render ring.
+	 */
+	ce = &ctx->engine[RCS];
+
+	if (ce->pin_count) { /* Assume that the context is active! */
+		ret = i915_gem_switch_to_kernel_context(dev_priv);
+		if (ret)
+			return ret;
+
+		ret = i915_gem_wait_for_idle(dev_priv,
+					     I915_WAIT_INTERRUPTIBLE |
+					     I915_WAIT_LOCKED);
+		if (ret)
+			return ret;
+	}
+
+	if (ce->state) {
+		void *vaddr = i915_gem_object_pin_map(ce->state->obj, I915_MAP_WB);
+		u32 *regs;
+
+		if (IS_ERR(vaddr))
+			return PTR_ERR(vaddr);
+
+		regs = vaddr + LRC_STATE_PN * PAGE_SIZE;
+
+		regs[CTX_R_PWR_CLK_STATE + 1] =
+			make_rpcs(&INTEL_INFO(dev_priv)->sseu, sseu);
+		i915_gem_object_unpin_map(ce->state->obj);
+	}
+
+	/*
+	 * Apply the configuration to all engines. Our hardware doesn't
+	 * currently support different configurations for each engine.
+	 */
+	for_each_engine(engine, dev_priv, id)
+		ctx->engine[id].sseu = *sseu;
+
+	return 0;
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftests/intel_lrc.c"
 #endif
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index d91d69a17206..4c84266814fa 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -112,4 +112,8 @@ intel_lr_context_descriptor(struct i915_gem_context *ctx,
 	return ctx->engine[engine->id].lrc_desc;
 }
 
+int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
+			      struct intel_engine_cs *engine,
+			      struct i915_gem_context_sseu *sseu);
+
 #endif /* _INTEL_LRC_H_ */
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 96eda8176030..75749ce11c03 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1456,9 +1456,37 @@ struct drm_i915_gem_context_param {
 #define   I915_CONTEXT_MAX_USER_PRIORITY	1023 /* inclusive */
 #define   I915_CONTEXT_DEFAULT_PRIORITY		0
 #define   I915_CONTEXT_MIN_USER_PRIORITY	-1023 /* inclusive */
+	/*
+	 * When using the following param, value should be a pointer to
+	 * drm_i915_gem_context_param_sseu.
+	 */
+#define I915_CONTEXT_PARAM_SSEU		0x7
 	__u64 value;
 };
 
+struct drm_i915_gem_context_param_sseu {
+	/*
+	 * Engine to be configured or queried. Same value you would use with
+	 * drm_i915_gem_execbuffer2.
+	 */
+	__u64 flags;
+
+	/*
+	 * Setting slice_mask or subslice_mask to 0 will make the context use
+	 * masks reported respectively by I915_PARAM_SLICE_MASK or
+	 * I915_PARAM_SUBSLICE_MASK.
+	 */
+	union {
+		struct {
+			__u8 slice_mask;
+			__u8 subslice_mask;
+			__u8 min_eus_per_subslice;
+			__u8 max_eus_per_subslice;
+		} packed;
+		__u64 value;
+	};
+};
+
 enum drm_i915_oa_format {
 	I915_OA_FORMAT_A13 = 1,	    /* HSW only */
 	I915_OA_FORMAT_A29,	    /* HSW only */
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/8] drm/i915: expose helper mapping exec flag engine to intel_engine_cs
  2018-04-25 11:45 ` [PATCH 1/8] drm/i915: expose helper mapping exec flag engine to intel_engine_cs Lionel Landwerlin
@ 2018-04-25 11:50   ` Chris Wilson
  2018-04-30 14:37     ` Lionel Landwerlin
  0 siblings, 1 reply; 50+ messages in thread
From: Chris Wilson @ 2018-04-25 11:50 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx

Quoting Lionel Landwerlin (2018-04-25 12:45:14)
> This function will be used later by the per (context,engine) power
> programming interface.

No. This is not the appropriate uABI, please see
intel_engine_lookup_user().
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/8] drm/i915: don't specify pinned size for wa_bb pin/allocation
  2018-04-25 11:45 ` [PATCH 3/8] drm/i915: don't specify pinned size for wa_bb pin/allocation Lionel Landwerlin
@ 2018-04-25 11:52   ` Chris Wilson
  0 siblings, 0 replies; 50+ messages in thread
From: Chris Wilson @ 2018-04-25 11:52 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx

Quoting Lionel Landwerlin (2018-04-25 12:45:16)
> We can rely on the i915_vma_pin() to use vma->size instead.

But that's the alignment parameter. I didn't change it as I have no idea
why it was set to PAGE_SIZE. i915_vma_pin() treats it as 0 anyway.
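
For reference, a sketch of the signature under discussion (as it stands
in this tree, from memory):

	int i915_vma_pin(struct i915_vma *vma,
			 u64 size,	/* 0: use vma->size */
			 u64 alignment,	/* 0: no extra alignment */
			 u64 flags);	/* e.g. PIN_GLOBAL | PIN_HIGH */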
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 6/8] drm/i915: reprogram NOA muxes on context switch when using perf
  2018-04-25 11:45 ` [PATCH 6/8] drm/i915: reprogram NOA muxes on context switch when using perf Lionel Landwerlin
@ 2018-04-25 11:57   ` Chris Wilson
  2018-04-25 13:23     ` Chris Wilson
  2018-04-25 14:35     ` Lionel Landwerlin
  0 siblings, 2 replies; 50+ messages in thread
From: Chris Wilson @ 2018-04-25 11:57 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx

Quoting Lionel Landwerlin (2018-04-25 12:45:19)
> If some of the contexts submitting workloads to the GPU have been
> configured to shut down slices/subslices, we might lose the NOA
> configurations written into the NOA muxes. We need to reprogram them at
> context switch.

On every single batchbuffer, no way. That's around a 33% latency hit,
ignoring the cost of what's inside the wa_bb.

If they are not context saved, userspace isn't going to be allowed to
modify them. So why do we need this? If they are not context saved, then
the kernel needs to control them and doesn't need to reset around every
batch, just the ones that want non-default (fancier if we handle
preemption)?
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 50+ messages in thread

* ✗ Fi.CI.CHECKPATCH: warning for drm/i915: per context slice/subslice powergating
  2018-04-25 11:45 [PATCH 0/8] drm/i915: per context slice/subslice powergating Lionel Landwerlin
                   ` (7 preceding siblings ...)
  2018-04-25 11:45 ` [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace Lionel Landwerlin
@ 2018-04-25 12:34 ` Patchwork
  2018-04-25 12:36 ` ✗ Fi.CI.SPARSE: " Patchwork
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 50+ messages in thread
From: Patchwork @ 2018-04-25 12:34 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: per context slice/subslice powergating
URL   : https://patchwork.freedesktop.org/series/42285/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
aad432b4239c drm/i915: expose helper mapping exec flag engine to intel_engine_cs
f01ac5929e4d drm/i915: Program RPCS for Broadwell
1b1c9aba0ce7 drm/i915: don't specify pinned size for wa_bb pin/allocation
418f679dd16e drm/i915: extract per-ctx/indirect bb programming
-:44: CHECK:BRACES: Blank lines aren't necessary after an open brace '{'
#44: FILE: drivers/gpu/drm/i915/intel_lrc.c:2421:
+	if (wa_ctx->per_ctx.size) {
+

total: 0 errors, 0 warnings, 1 checks, 71 lines checked
46f6a90557d5 drm/i915: pass wa_ctx as argument
02aac22145e0 drm/i915: reprogram NOA muxes on context switch when using perf
-:233: WARNING:AVOID_BUG: Avoid crashing the kernel - try using WARN_ON & recovery code rather than BUG() or BUG_ON()
#233: FILE: drivers/gpu/drm/i915/intel_lrc.c:1697:
+	BUG_ON(batch_ptr - batch > wa_ctx->vma->obj->base.size);

total: 0 errors, 1 warnings, 0 checks, 258 lines checked
c67130ef4bd9 drm/i915: Record the sseu configuration per-context & engine
-:57: CHECK:OPEN_ENDED_LINE: Lines should not end with a '('
#57: FILE: drivers/gpu/drm/i915/i915_gem_context.c:335:
+			i915_gem_context_sseu_from_device_sseu(

-:133: ERROR:CODE_INDENT: code indent should use tabs where possible
#133: FILE: drivers/gpu/drm/i915/intel_lrc.c:2410:
+^I^I        GEN8_RPCS_SS_CNT_SHIFT;$

total: 1 errors, 0 warnings, 1 checks, 118 lines checked
59ec862256e6 drm/i915: Expose RPCS (SSEU) configuration to userspace
-:25: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#25: 
	struct drm_i915_gem_context_param_sseu sseu = { .flags = I915_EXEC_RENDER };

-:135: CHECK:BRACES: braces {} should be used on all arms of this statement
#135: FILE: drivers/gpu/drm/i915/i915_gem_context.c:905:
+		if (args->size)
[...]
+		else if (!HAS_EXECLISTS(ctx->i915))
[...]
+		else {
[...]

-:139: CHECK:BRACES: Unbalanced braces around else statement
#139: FILE: drivers/gpu/drm/i915/i915_gem_context.c:909:
+		else {

total: 0 errors, 1 warnings, 2 checks, 213 lines checked

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 50+ messages in thread

* ✗ Fi.CI.SPARSE: warning for drm/i915: per context slice/subslice powergating
  2018-04-25 11:45 [PATCH 0/8] drm/i915: per context slice/subslice powergating Lionel Landwerlin
                   ` (8 preceding siblings ...)
  2018-04-25 12:34 ` ✗ Fi.CI.CHECKPATCH: warning for drm/i915: per context slice/subslice powergating Patchwork
@ 2018-04-25 12:36 ` Patchwork
  2018-04-25 12:49 ` ✓ Fi.CI.BAT: success " Patchwork
  2018-04-25 15:39 ` ✗ Fi.CI.IGT: failure " Patchwork
  11 siblings, 0 replies; 50+ messages in thread
From: Patchwork @ 2018-04-25 12:36 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: per context slice/subslice powergating
URL   : https://patchwork.freedesktop.org/series/42285/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Commit: drm/i915: expose helper mapping exec flag engine to intel_engine_cs
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3659:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3662:16: warning: expression using sizeof(void)

Commit: drm/i915: Program RPCS for Broadwell
Okay!

Commit: drm/i915: don't specify pinned size for wa_bb pin/allocation
Okay!

Commit: drm/i915: extract per-ctx/indirect bb programming
Okay!

Commit: drm/i915: pass wa_ctx as argument
Okay!

Commit: drm/i915: reprogram NOA muxes on context switch when using perf
+drivers/gpu/drm/i915/i915_perf.c:1742:37: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3662:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3664:16: warning: expression using sizeof(void)

Commit: drm/i915: Record the sseu configuration per-context & engine
Okay!

Commit: drm/i915: Expose RPCS (SSEU) configuration to userspace
+drivers/gpu/drm/i915/i915_gem_context.c:754:41: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_gem_context.c:754:41: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_gem_context.c:756:41: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_gem_context.c:756:41: warning: expression using sizeof(void)

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 50+ messages in thread

* ✓ Fi.CI.BAT: success for drm/i915: per context slice/subslice powergating
  2018-04-25 11:45 [PATCH 0/8] drm/i915: per context slice/subslice powergating Lionel Landwerlin
                   ` (9 preceding siblings ...)
  2018-04-25 12:36 ` ✗ Fi.CI.SPARSE: " Patchwork
@ 2018-04-25 12:49 ` Patchwork
  2018-04-25 15:39 ` ✗ Fi.CI.IGT: failure " Patchwork
  11 siblings, 0 replies; 50+ messages in thread
From: Patchwork @ 2018-04-25 12:49 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: per context slice/subslice powergating
URL   : https://patchwork.freedesktop.org/series/42285/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4093 -> Patchwork_8797 =

== Summary - SUCCESS ==

  No regressions found.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/42285/revisions/1/mbox/

== Known issues ==

  Here are the changes found in Patchwork_8797 that come from known issues:

  === IGT changes ===

    ==== Possible fixes ====

    igt@kms_chamelium@dp-edid-read:
      fi-kbl-7500u:       FAIL (fdo#103841) -> PASS

    igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b:
      fi-bxt-dsi:         INCOMPLETE (fdo#103927) -> PASS

    
  fdo#103841 https://bugs.freedesktop.org/show_bug.cgi?id=103841
  fdo#103927 https://bugs.freedesktop.org/show_bug.cgi?id=103927


== Participating hosts (38 -> 33) ==

  Missing    (5): fi-ctg-p8600 fi-ilk-m540 fi-glk-j4005 fi-skl-6700hq fi-hsw-4200u 


== Build changes ==

    * Linux: CI_DRM_4093 -> Patchwork_8797

  CI_DRM_4093: a47694a9464b96e17b12f0bdb1085d10a61c57e9 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4446: e5e8dafc991ee922ec159491c680caff0cfe9235 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_8797: 59ec862256e6d9a083390dd214a93d119d07e6f0 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4446: a2f486679f467cd6e82578384f56d4aabaa8cf2e @ git://anongit.freedesktop.org/piglit


== Linux commits ==

59ec862256e6 drm/i915: Expose RPCS (SSEU) configuration to userspace
c67130ef4bd9 drm/i915: Record the sseu configuration per-context & engine
02aac22145e0 drm/i915: reprogram NOA muxes on context switch when using perf
46f6a90557d5 drm/i915: pass wa_ctx as argument
418f679dd16e drm/i915: extract per-ctx/indirect bb programming
1b1c9aba0ce7 drm/i915: don't specify pinned size for wa_bb pin/allocation
f01ac5929e4d drm/i915: Program RPCS for Broadwell
aad432b4239c drm/i915: expose helper mapping exec flag engine to intel_engine_cs

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_8797/issues.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 6/8] drm/i915: reprogram NOA muxes on context switch when using perf
  2018-04-25 11:57   ` Chris Wilson
@ 2018-04-25 13:23     ` Chris Wilson
  2018-04-25 14:35     ` Lionel Landwerlin
  1 sibling, 0 replies; 50+ messages in thread
From: Chris Wilson @ 2018-04-25 13:23 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx

Quoting Chris Wilson (2018-04-25 12:57:24)
> If they are not context saved, userspace isn't going to be allowed to
> modify them. So why do we need this? If they are not context saved, then
> the kernel needs to control them and doesn't need to reset around every
> batch, just the ones that want non-default (fancier if we handle
> preemption)?

Preemption is the killer here. Jump from one preempted batch back to
inside another and you skip over all ring preambles. We could use the
preemption ring itself to set state though.

Still I would rather see an alternative to wa_bb than incur the cost
where it is never needed :|
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 6/8] drm/i915: reprogram NOA muxes on context switch when using perf
  2018-04-25 11:57   ` Chris Wilson
  2018-04-25 13:23     ` Chris Wilson
@ 2018-04-25 14:35     ` Lionel Landwerlin
  1 sibling, 0 replies; 50+ messages in thread
From: Lionel Landwerlin @ 2018-04-25 14:35 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 25/04/18 12:57, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2018-04-25 12:45:19)
>> If some of the contexts submitting workloads to the GPU have been
>> configured to shut down slices/subslices, we might lose the NOA
>> configurations written in the NOA muxes. We need to reprogram them at
>> context switch.
> On every single batchbuffer, no way. That's around a 33% latency hit,
> ignoring the cost of what's inside the wa_bb.
>
> If they are not context saved, userspace isn't going to be allowed to
> modify them. So why do we need this? If they are not context saved, then the
> kernel needs to control them and doesn't need to reset around every
> batch, just the ones that want non-default (fancier if we handle
> preemption)?
> -Chris
>
These are neither context saved, nor power saved :(

I can't tell whether the GuC might schedule things behind our back.
But eventually that will happen anyway. So that's why I put the
reprogramming in the per-ctx wa.
Agreed, it sucks.

I'll look into the fancier things.

Thanks,

-
Lionel

* ✗ Fi.CI.IGT: failure for drm/i915: per context slice/subslice powergating
  2018-04-25 11:45 [PATCH 0/8] drm/i915: per context slice/subslice powergating Lionel Landwerlin
                   ` (10 preceding siblings ...)
  2018-04-25 12:49 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2018-04-25 15:39 ` Patchwork
  11 siblings, 0 replies; 50+ messages in thread
From: Patchwork @ 2018-04-25 15:39 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: per context slice/subslice powergating
URL   : https://patchwork.freedesktop.org/series/42285/
State : failure

== Summary ==

= CI Bug Log - changes from CI_DRM_4093_full -> Patchwork_8797_full =

== Summary - FAILURE ==

  Serious unknown changes coming with Patchwork_8797_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_8797_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/42285/revisions/1/mbox/

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_8797_full:

  === IGT changes ===

    ==== Possible regressions ====

    igt@gem_ctx_param@invalid-param-get:
      shard-apl:          PASS -> FAIL
      shard-glk:          PASS -> FAIL
      shard-kbl:          PASS -> FAIL

    igt@gem_ctx_param@invalid-param-set:
      shard-kbl:          PASS -> DMESG-FAIL
      shard-hsw:          PASS -> FAIL +1
      shard-snb:          PASS -> FAIL +1
      shard-glk:          PASS -> DMESG-FAIL
      shard-apl:          PASS -> DMESG-FAIL

    
    ==== Warnings ====

    igt@gem_mocs_settings@mocs-rc6-render:
      shard-kbl:          PASS -> SKIP +1

    
== Known issues ==

  Here are the changes found in Patchwork_8797_full that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@kms_flip@2x-flip-vs-expired-vblank:
      shard-hsw:          PASS -> FAIL (fdo#102887)

    igt@kms_flip@2x-plain-flip-ts-check-interruptible:
      shard-hsw:          PASS -> FAIL (fdo#100368)

    igt@kms_flip@absolute-wf_vblank-interruptible:
      shard-apl:          PASS -> FAIL (fdo#106087)

    igt@kms_flip@flip-vs-expired-vblank-interruptible:
      shard-glk:          PASS -> FAIL (fdo#102887)

    igt@kms_frontbuffer_tracking@fbc-rgb101010-draw-mmap-gtt:
      shard-apl:          PASS -> FAIL (fdo#103167)

    
    ==== Possible fixes ====

    igt@kms_cursor_legacy@2x-long-flip-vs-cursor-atomic:
      shard-hsw:          FAIL (fdo#104873) -> PASS

    igt@kms_flip@flip-vs-modeset-vs-hang-interruptible:
      shard-kbl:          DMESG-WARN (fdo#103558, fdo#105602) -> PASS +1
      shard-snb:          DMESG-WARN (fdo#103821) -> PASS

    igt@kms_flip@plain-flip-ts-check-interruptible:
      shard-glk:          FAIL (fdo#100368) -> PASS
      shard-hsw:          FAIL (fdo#100368) -> PASS

    igt@kms_frontbuffer_tracking@fbc-suspend:
      shard-kbl:          DMESG-WARN (fdo#103558, fdo#105602, fdo#103841) -> PASS

    
  fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368
  fdo#102887 https://bugs.freedesktop.org/show_bug.cgi?id=102887
  fdo#103167 https://bugs.freedesktop.org/show_bug.cgi?id=103167
  fdo#103558 https://bugs.freedesktop.org/show_bug.cgi?id=103558
  fdo#103821 https://bugs.freedesktop.org/show_bug.cgi?id=103821
  fdo#103841 https://bugs.freedesktop.org/show_bug.cgi?id=103841
  fdo#104873 https://bugs.freedesktop.org/show_bug.cgi?id=104873
  fdo#105602 https://bugs.freedesktop.org/show_bug.cgi?id=105602
  fdo#106087 https://bugs.freedesktop.org/show_bug.cgi?id=106087


== Participating hosts (6 -> 5) ==

  Missing    (1): shard-glkb 


== Build changes ==

    * Linux: CI_DRM_4093 -> Patchwork_8797

  CI_DRM_4093: a47694a9464b96e17b12f0bdb1085d10a61c57e9 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4446: e5e8dafc991ee922ec159491c680caff0cfe9235 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_8797: 59ec862256e6d9a083390dd214a93d119d07e6f0 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4446: a2f486679f467cd6e82578384f56d4aabaa8cf2e @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_8797/shards.html

* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-04-25 11:45 ` [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace Lionel Landwerlin
@ 2018-04-26 10:00   ` Joonas Lahtinen
  2018-04-26 10:22     ` Lionel Landwerlin
  2018-05-03 17:18   ` Tvrtko Ursulin
  1 sibling, 1 reply; 50+ messages in thread
From: Joonas Lahtinen @ 2018-04-26 10:00 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx

Quoting Lionel Landwerlin (2018-04-25 14:45:21)
> From: Chris Wilson <chris@chris-wilson.co.uk>
> 
> We want to allow userspace to reconfigure the subslice configuration for
> its own use case. To do so, we expose a context parameter to allow
> adjustment of the RPCS register stored within the context image (and
> currently not accessible via LRI). If the context is adjusted before
> first use, the adjustment is for "free"; otherwise if the context is
> active we flush the context off the GPU (stalling all users) and force
> the GPU to save the context to memory where we can modify it and so
> ensure that the register is reloaded on next execution.
> 
> The overhead of managing additional EU subslices can be significant,
> especially in multi-context workloads. Non-GPGPU contexts should
> preferably disable the subslices they are not using, and others should
> fine-tune the number to match their workload.

This hit a dead end last time due to the system wide policy needed to
avoid two parties fighting over the slice count (and going back and
forth between two slice counts would counter the benefits received from
this).

Do we now have a solution for the contention? I don't see code to
negotiate a global value, just a raw setter.

Regards, Joonas


* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-04-26 10:00   ` Joonas Lahtinen
@ 2018-04-26 10:22     ` Lionel Landwerlin
  2018-05-03 16:04       ` Joonas Lahtinen
  0 siblings, 1 reply; 50+ messages in thread
From: Lionel Landwerlin @ 2018-04-26 10:22 UTC (permalink / raw)
  To: Joonas Lahtinen, intel-gfx

On 26/04/18 11:00, Joonas Lahtinen wrote:
> Quoting Lionel Landwerlin (2018-04-25 14:45:21)
>> From: Chris Wilson <chris@chris-wilson.co.uk>
>>
>> We want to allow userspace to reconfigure the subslice configuration for
>> its own use case. To do so, we expose a context parameter to allow
>> adjustment of the RPCS register stored within the context image (and
>> currently not accessible via LRI). If the context is adjusted before
>> first use, the adjustment is for "free"; otherwise if the context is
>> active we flush the context off the GPU (stalling all users) and force
>> the GPU to save the context to memory where we can modify it and so
>> ensure that the register is reloaded on next execution.
>>
>> The overhead of managing additional EU subslices can be significant,
>> especially in multi-context workloads. Non-GPGPU contexts should
>> preferably disable the subslices they are not using, and others should
>> fine-tune the number to match their workload.
> This hit a dead end last time due to the system wide policy needed to
> avoid two parties fighting over the slice count (and going back and
> forth between two slice counts would counter the benefits received from
> this).
>
> Do we now have a solution for the contention? I don't see code to
> negotiate a global value, just a raw setter.
>
> Regards, Joonas

I've tried to come up with some numbers about the cost of the back & 
forth (see igt series).

Other than that, I don't think we can expect the kernel to work around
the inefficient use of the hardware by userspace.


* Re: [PATCH 1/8] drm/i915: expose helper mapping exec flag engine to intel_engine_cs
  2018-04-25 11:50   ` Chris Wilson
@ 2018-04-30 14:37     ` Lionel Landwerlin
  2018-05-01 11:13       ` Chris Wilson
  2018-05-03 17:12       ` Tvrtko Ursulin
  0 siblings, 2 replies; 50+ messages in thread
From: Lionel Landwerlin @ 2018-04-30 14:37 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 25/04/18 12:50, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2018-04-25 12:45:14)
>> This function will be used later by the per (context,engine) power
>> programming interface.
> No. This is not the appropriate uABI, please see
> intel_engine_lookup_user().
> -Chris
>
uAPI-wise, does this sound okay?:

#define I915_CONTEXT_PARAM_SSEU         0x7
        __u64 value;

struct drm_i915_gem_context_param_sseu {
        /*
         * Engine to be configured or queried.
         */
        __u32 class;
        __u32 instance;

        /*
         * Setting slice_mask or subslice_mask to 0 will make the context
         * use the masks reported respectively by I915_PARAM_SLICE_MASK or
         * I915_PARAM_SUBSLICE_MASK.
         */
        union {
                struct {
                        __u8 slice_mask;
                        __u8 subslice_mask;
                        __u8 min_eus_per_subslice;
                        __u8 max_eus_per_subslice;
                } packed;
                __u64 value;
        };
};
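
For illustration, a query with this layout would look much like the
example in the commit message, with the engine now selected by
class/instance instead of an execbuffer flag (a sketch only; the zero
class/instance values are assumed here to mean the render engine, this
is not settled uAPI):

        struct drm_i915_gem_context_param arg;
        struct drm_i915_gem_context_param_sseu sseu = {
                .class = 0,     /* assumption: render class */
                .instance = 0,  /* assumption: first render engine */
        };

        memset(&arg, 0, sizeof(arg));
        arg.ctx_id = ctx;
        arg.param = I915_CONTEXT_PARAM_SSEU;
        arg.value = (uintptr_t) &sseu;
        if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
                /* sseu.packed now holds the context's current masks */
        }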


* Re: [PATCH 1/8] drm/i915: expose helper mapping exec flag engine to intel_engine_cs
  2018-04-30 14:37     ` Lionel Landwerlin
@ 2018-05-01 11:13       ` Chris Wilson
  2018-05-03 17:12       ` Tvrtko Ursulin
  1 sibling, 0 replies; 50+ messages in thread
From: Chris Wilson @ 2018-05-01 11:13 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx

Quoting Lionel Landwerlin (2018-04-30 15:37:42)
> On 25/04/18 12:50, Chris Wilson wrote:
> > Quoting Lionel Landwerlin (2018-04-25 12:45:14)
> >> This function will be used later by the per (context,engine) power
> >> programming interface.
> > No. This is not the appropriate uABI, please see
> > intel_engine_lookup_user().
> > -Chris
> >
> uAPI-wise, does this sound okay?:
> 
> #define I915_CONTEXT_PARAM_SSEU         0x7
>         __u64 value;
> 
> struct drm_i915_gem_context_param_sseu {
>         /*
>          * Engine to be configured or queried.
>          */
>         __u32 class;
>         __u32 instance;

Tvrtko hasn't yet complained about me limiting class/instance to
U32_MAX, so I think this limit will do for the foreseeable future ;)

>         /*
>          * Setting slice_mask or subslice_mask to 0 will make the context
>          * use the masks reported respectively by I915_PARAM_SLICE_MASK or
>          * I915_PARAM_SUBSLICE_MASK.
>          */
>         union {
>                 struct {
>                         __u8 slice_mask;
>                         __u8 subslice_mask;
>                         __u8 min_eus_per_subslice;
>                         __u8 max_eus_per_subslice;
>                 } packed;
>                 __u64 value;
>         };

The union isn't required any more as we aren't packing into the u64
value.

I would perhaps include an rsvd field? And then make the interface accept
an array of these.
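
As a sketch with those suggestions applied (no union, an rsvd field for
future extension; illustrative only, not a settled layout):

        struct drm_i915_gem_context_param_sseu {
                /* engine class/instance to be configured or queried */
                __u32 class;
                __u32 instance;

                __u8 slice_mask;
                __u8 subslice_mask;
                __u8 min_eus_per_subslice;
                __u8 max_eus_per_subslice;

                __u32 rsvd;     /* MBZ, reserved for future use */
        };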
-Chris

* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-04-26 10:22     ` Lionel Landwerlin
@ 2018-05-03 16:04       ` Joonas Lahtinen
  2018-05-03 16:14         ` Chris Wilson
                           ` (2 more replies)
  0 siblings, 3 replies; 50+ messages in thread
From: Joonas Lahtinen @ 2018-05-03 16:04 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx

Quoting Lionel Landwerlin (2018-04-26 13:22:30)
> On 26/04/18 11:00, Joonas Lahtinen wrote:
> > Quoting Lionel Landwerlin (2018-04-25 14:45:21)
> >> From: Chris Wilson <chris@chris-wilson.co.uk>
> >>
> >> We want to allow userspace to reconfigure the subslice configuration for
> >> its own use case. To do so, we expose a context parameter to allow
> >> adjustment of the RPCS register stored within the context image (and
> >> currently not accessible via LRI). If the context is adjusted before
> >> first use, the adjustment is for "free"; otherwise if the context is
> >> active we flush the context off the GPU (stalling all users) and force
> >> the GPU to save the context to memory where we can modify it and so
> >> ensure that the register is reloaded on next execution.
> >>
> >> The overhead of managing additional EU subslices can be significant,
> >> especially in multi-context workloads. Non-GPGPU contexts should
> >> preferably disable the subslices they are not using, and others should
> >> fine-tune the number to match their workload.
> > This hit a dead end last time due to the system wide policy needed to
> > avoid two parties fighting over the slice count (and going back and
> > forth between two slice counts would counter the benefits received from
> > this).
> >
> > Do we now have a solution for the contention? I don't see code to
> > negotiate a global value, just a raw setter.
> >
> > Regards, Joonas
> 
> I've tried to come up with some numbers about the cost of the back & 
> forth (see igt series).
> 
> Other than that, I don't think we can expect the kernel to work around
> the inefficient use of the hardware by userspace.

Well, I'm pretty sure we should not try to make the situation too
miserable for the basic use cases.

If we allow two contexts to fight over the slice count, countering any
perceived benefit, I don't think that'll be a good default.

My recollection of the last round of discussion was that the reasonable
thing to do would be to only disable slices that everyone is willing to
let go of. Then it would become a system maintainer level decision to
only run on two slices when they see fit (configuring the userspace).
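
As a sketch of that policy (purely illustrative, no such helper exists
in i915): a slice stays powered as long as any active context still
asks for it, so only slices everyone has released get gated:

        /* hypothetical negotiation helper, kernel-style C */
        static u8 effective_slice_mask(const u8 *requested, int nctx,
                                       u8 hw_slice_mask)
        {
                u8 mask = 0;
                int i;

                /* keep a slice on while anyone still wants it */
                for (i = 0; i < nctx; i++)
                        mask |= requested[i];

                return mask & hw_slice_mask;
        }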

More advanced tactics would include scheduling work so that we try to
avoid the slice count changes and deduct the switching time from the
execution budget of the app requesting fewer slices (if we had fair
time slicing).

Regards, Joonas


* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-05-03 16:04       ` Joonas Lahtinen
@ 2018-05-03 16:14         ` Chris Wilson
  2018-05-03 16:25         ` Lionel Landwerlin
  2018-05-03 16:30         ` Tvrtko Ursulin
  2 siblings, 0 replies; 50+ messages in thread
From: Chris Wilson @ 2018-05-03 16:14 UTC (permalink / raw)
  To: Joonas Lahtinen, Lionel Landwerlin, intel-gfx

Quoting Joonas Lahtinen (2018-05-03 17:04:31)
> More advanced tactics would include scheduling work so that we try to
> avoid the slice count changes and deduct the switching time from the
> execution budget of the app requesting fewer slices (if we had fair
> time slicing).

Have you been peeking at my reading material? ;)
-Chris

* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-05-03 16:04       ` Joonas Lahtinen
  2018-05-03 16:14         ` Chris Wilson
@ 2018-05-03 16:25         ` Lionel Landwerlin
  2018-05-03 16:30         ` Tvrtko Ursulin
  2 siblings, 0 replies; 50+ messages in thread
From: Lionel Landwerlin @ 2018-05-03 16:25 UTC (permalink / raw)
  To: Joonas Lahtinen, intel-gfx

On 03/05/18 17:04, Joonas Lahtinen wrote:
> Quoting Lionel Landwerlin (2018-04-26 13:22:30)
>> On 26/04/18 11:00, Joonas Lahtinen wrote:
>>> Quoting Lionel Landwerlin (2018-04-25 14:45:21)
>>>> From: Chris Wilson <chris@chris-wilson.co.uk>
>>>>
>>>> We want to allow userspace to reconfigure the subslice configuration for
>>>> its own use case. To do so, we expose a context parameter to allow
>>>> adjustment of the RPCS register stored within the context image (and
>>>> currently not accessible via LRI). If the context is adjusted before
>>>> first use, the adjustment is for "free"; otherwise if the context is
>>>> active we flush the context off the GPU (stalling all users) and force
>>>> the GPU to save the context to memory where we can modify it and so
>>>> ensure that the register is reloaded on next execution.
>>>>
>>>> The overhead of managing additional EU subslices can be significant,
>>>> especially in multi-context workloads. Non-GPGPU contexts should
>>>> preferably disable the subslices they are not using, and others should
>>>> fine-tune the number to match their workload.
>>> This hit a dead end last time due to the system wide policy needed to
>>> avoid two parties fighting over the slice count (and going back and
>>> forth between two slice counts would counter the benefits received from
>>> this).
>>>
>>> Do we now have a solution for the contention? I don't see code to
>>> negotiate a global value, just a raw setter.
>>>
>>> Regards, Joonas
>> I've tried to come up with some numbers about the cost of the back &
>> forth (see igt series).
>>
>> Other than that, I don't think we can expect the kernel to work around
>> the inefficient use of the hardware by userspace.
> Well, I'm pretty sure we should not try to make the situation too
> miserable for the basic use cases.
>
> If we allow two contexts to fight over the slice count, countering any
> perceived benefit, I don't think that'll be a good default.
>
> My recollection of the last round of discussion was that the reasonable
> thing to do would be to only disable slices that everyone is willing to
> let go of. Then it would become a system maintainer level decision to
> only run on two slices when they see fit (configuring the userspace).

How would you detect that everybody is willing to let go?
We don't appear to have a mechanism to detect that.
Wouldn't that require teaching all userspace drivers?

>
> More advanced tactics would include scheduling work so that we try to
> avoid the slice count changes and deduct the switching time from the
> execution budget of the app requesting fewer slices (if we had fair
> time slicing).

That sounds more workable, although fairly complicated.
Maybe we could tweak the priority based on the slice count and say:
     for the next (1 second / number of slice configs), bump the priority
     of contexts using the configuration with x slices,
     and rotate every (1 second / number of slice configs).

Would that resolve the back & forth issue completely though?
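
Roughly, that rotation could be modelled like this (a sketch under the
assumptions above, nothing like this exists in the driver):

        /* hypothetical: rotate which sseu configuration gets a priority
         * bump, so contexts sharing a configuration run back-to-back
         * and the reprogramming cost is amortized */
        struct sseu_bucket {
                u8 slice_mask;  /* configuration this bucket represents */
                int boost;      /* extra priority while favoured */
        };

        static void rotate_favoured(struct sseu_bucket *b, int n,
                                    int *favoured, int boost)
        {
                b[*favoured].boost = 0;             /* drop old favourite */
                *favoured = (*favoured + 1) % n;    /* next configuration */
                b[*favoured].boost = boost;         /* bump new favourite */
        }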

Moving a particular context back & forth between different 
configurations is costly too.
You need to drain the context, then pin it before you can edit
and resubmit.

Thanks for your feedback,

-
Lionel

>
> Regards, Joonas
>


* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-05-03 16:04       ` Joonas Lahtinen
  2018-05-03 16:14         ` Chris Wilson
  2018-05-03 16:25         ` Lionel Landwerlin
@ 2018-05-03 16:30         ` Tvrtko Ursulin
  2 siblings, 0 replies; 50+ messages in thread
From: Tvrtko Ursulin @ 2018-05-03 16:30 UTC (permalink / raw)
  To: Joonas Lahtinen, Lionel Landwerlin, intel-gfx


On 03/05/2018 17:04, Joonas Lahtinen wrote:
> Quoting Lionel Landwerlin (2018-04-26 13:22:30)
>> On 26/04/18 11:00, Joonas Lahtinen wrote:
>>> Quoting Lionel Landwerlin (2018-04-25 14:45:21)
>>>> From: Chris Wilson <chris@chris-wilson.co.uk>
>>>>
>>>> We want to allow userspace to reconfigure the subslice configuration for
>>>> its own use case. To do so, we expose a context parameter to allow
>>>> adjustment of the RPCS register stored within the context image (and
>>>> currently not accessible via LRI). If the context is adjusted before
>>>> first use, the adjustment is for "free"; otherwise if the context is
>>>> active we flush the context off the GPU (stalling all users) and force
>>>> the GPU to save the context to memory where we can modify it and so
>>>> ensure that the register is reloaded on next execution.
>>>>
>>>> The overhead of managing additional EU subslices can be significant,
>>>> especially in multi-context workloads. Non-GPGPU contexts should
>>>> preferably disable the subslices they are not using, and others should
>>>> fine-tune the number to match their workload.
>>> This hit a dead end last time due to the system-wide policy needed to
>>> avoid two parties fighting over the slice count (and going back and
>>> forth between two slice counts would counter the benefits received from
>>> this).
>>>
>>> Do we now have a solution for the contention? I don't see code to
>>> negotiate a global value, just raw setter.
>>>
>>> Regards, Joonas
>>
>> I've tried to come up with some numbers about the cost of the back &
>> forth (see igt series).
>>
>> Other than that, I don't think we can expect the kernel to work around
>> the inefficient use of the hardware by userspace.
> 
> Well, I'm pretty sure we should not try to make the situation too
> miserable for the basic use cases.
> 
> If we allow two contexts to fight over the slice count, countering any
> perceived benefit, I don't think that'll be a good default.
> 
> My recollection of the last round of discussion was that the reasonable
> thing to do would be to only disable slices when everyone is willing to
> let go of them. Then it would become a system-maintainer-level decision
> to only run on two slices when they see fit (by configuring the userspace).

Yes, there was definitely talk about co-ordination between the OpenCL
and media stacks.

Btw we mentioned on IRC a few days back that a sysfs knob like 
allow_media_priority, defaulting to 0, could be a good start.

That way userspace could use the uAPI, but unless the sysadmin allows it,
there will be no fighting over slice counts.

Then at some future time, as long as the uAPI is sensible, more advanced
policies could be explored, regardless of whether that happens in
userspace or in i915.
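
Something along these lines is the gating I have in mind (a sketch;
allow_media_priority is invented and doesn't exist anywhere yet):

/* Hypothetical: refuse SSEU changes unless the sysadmin opted in
 * via a sysfs knob defaulting to 0. */
static int i915_sseu_check_allowed(struct drm_i915_private *i915)
{
        if (!READ_ONCE(i915->allow_media_priority))
                return -EPERM;

        return 0;
}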

Regards,

Tvrtko

> More advanced tactics would include scheduling work so that we try to
> avoid the slice count changes and deduct the switching time from the
> execution budget of the app requesting fewer slices (if we had fair
> time slicing).
> 
> Regards, Joonas
> 
>>
>>>
>>>> We expose complete control over the RPCS register, allowing
>>>> configuration of slice/subslice, via masks packed into a u64 for
>>>> simplicity. For example,
>>>>
>>>>           struct drm_i915_gem_context_param arg;
>>>>           struct drm_i915_gem_context_param_sseu sseu = { .flags = I915_EXEC_RENDER };
>>>>
>>>>           memset(&arg, 0, sizeof(arg));
>>>>           arg.ctx_id = ctx;
>>>>           arg.param = I915_CONTEXT_PARAM_SSEU;
>>>>           arg.value = (uintptr_t) &sseu;
>>>>           if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
>>>>                   sseu.packed.subslice_mask = 0;
>>>>
>>>>                   drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
>>>>           }
>>>>
>>>> could be used to disable all subslices where supported.
>>>>
>>>> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)
>>>>
>>>> v3: Add ability to program this per engine (Chris)
>>>>
>>>> v4: Move most get_sseu() into i915_gem_context.c (Lionel)
>>>>
>>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>>>> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
>>>> CC: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>> CC: Zhipeng Gong <zhipeng.gong@intel.com>
>>>> CC: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>>>> ---
>>>>    drivers/gpu/drm/i915/i915_gem_context.c | 82 ++++++++++++++++++++++++-
>>>>    drivers/gpu/drm/i915/intel_lrc.c        | 55 +++++++++++++++++
>>>>    drivers/gpu/drm/i915/intel_lrc.h        |  4 ++
>>>>    include/uapi/drm/i915_drm.h             | 28 +++++++++
>>>>    4 files changed, 168 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
>>>> index bdf050beeb94..b97ddcf47514 100644
>>>> --- a/drivers/gpu/drm/i915/i915_gem_context.c
>>>> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
>>>> @@ -740,6 +740,26 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
>>>>           return 0;
>>>>    }
>>>>    
>>>> +static struct i915_gem_context_sseu
>>>> +i915_gem_context_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
>>>> +                                    const struct drm_i915_gem_context_param_sseu *user_sseu)
>>>> +{
>>>> +       struct i915_gem_context_sseu value = {
>>>> +               .slice_mask = user_sseu->packed.slice_mask == 0 ?
>>>> +                             sseu->slice_mask :
>>>> +                             (user_sseu->packed.slice_mask & sseu->slice_mask),
>>>> +               .subslice_mask = user_sseu->packed.subslice_mask == 0 ?
>>>> +                                sseu->subslice_mask[0] :
>>>> +                                (user_sseu->packed.subslice_mask & sseu->subslice_mask[0]),
>>>> +               .min_eus_per_subslice = min(user_sseu->packed.min_eus_per_subslice,
>>>> +                                           sseu->max_eus_per_subslice),
>>>> +               .max_eus_per_subslice = min(user_sseu->packed.max_eus_per_subslice,
>>>> +                                           sseu->max_eus_per_subslice),
>>>> +       };
>>>> +
>>>> +       return value;
>>>> +}
>>>> +
>>>>    int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>>>>                                       struct drm_file *file)
>>>>    {
>>>> @@ -777,6 +797,37 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>>>>           case I915_CONTEXT_PARAM_PRIORITY:
>>>>                   args->value = ctx->sched.priority;
>>>>                   break;
>>>> +       case I915_CONTEXT_PARAM_SSEU: {
>>>> +               struct drm_i915_gem_context_param_sseu param_sseu;
>>>> +               struct intel_engine_cs *engine;
>>>> +
>>>> +               if (copy_from_user(&param_sseu, u64_to_user_ptr(args->value),
>>>> +                                  sizeof(param_sseu))) {
>>>> +                       ret = -EFAULT;
>>>> +                       break;
>>>> +               }
>>>> +
>>>> +               engine = i915_gem_engine_from_flags(to_i915(dev), file,
>>>> +                                                   param_sseu.flags);
>>>> +               if (!engine) {
>>>> +                       ret = -EINVAL;
>>>> +                       break;
>>>> +               }
>>>> +
>>>> +               param_sseu.packed.slice_mask =
>>>> +                       ctx->engine[engine->id].sseu.slice_mask;
>>>> +               param_sseu.packed.subslice_mask =
>>>> +                       ctx->engine[engine->id].sseu.subslice_mask;
>>>> +               param_sseu.packed.min_eus_per_subslice =
>>>> +                       ctx->engine[engine->id].sseu.min_eus_per_subslice;
>>>> +               param_sseu.packed.max_eus_per_subslice =
>>>> +                       ctx->engine[engine->id].sseu.max_eus_per_subslice;
>>>> +
>>>> +               if (copy_to_user(u64_to_user_ptr(args->value), &param_sseu,
>>>> +                                sizeof(param_sseu)))
>>>> +                       ret = -EFAULT;
>>>> +               break;
>>>> +       }
>>>>           default:
>>>>                   ret = -EINVAL;
>>>>                   break;
>>>> @@ -832,7 +883,6 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>>>>                   else
>>>>                           i915_gem_context_clear_bannable(ctx);
>>>>                   break;
>>>> -
>>>>           case I915_CONTEXT_PARAM_PRIORITY:
>>>>                   {
>>>>                           s64 priority = args->value;
>>>> @@ -851,7 +901,37 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>>>>                                   ctx->sched.priority = priority;
>>>>                   }
>>>>                   break;
>>>> +       case I915_CONTEXT_PARAM_SSEU:
>>>> +               if (args->size)
>>>> +                       ret = -EINVAL;
>>>> +               else if (!HAS_EXECLISTS(ctx->i915))
>>>> +                       ret = -ENODEV;
>>>> +               else {
>>>> +                       struct drm_i915_private *dev_priv = to_i915(dev);
>>>> +                       struct drm_i915_gem_context_param_sseu user_sseu;
>>>> +                       struct i915_gem_context_sseu ctx_sseu;
>>>> +                       struct intel_engine_cs *engine;
>>>> +
>>>> +                       if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
>>>> +                                          sizeof(user_sseu))) {
>>>> +                               ret = -EFAULT;
>>>> +                               break;
>>>> +                       }
>>>> +
>>>> +                       engine = i915_gem_engine_from_flags(dev_priv, file,
>>>> +                                                           user_sseu.flags);
>>>> +                       if (!engine) {
>>>> +                               ret = -EINVAL;
>>>> +                               break;
>>>> +                       }
>>>> +
>>>> +                       ctx_sseu =
>>>> +                               i915_gem_context_sseu_from_user_sseu(&INTEL_INFO(dev_priv)->sseu,
>>>> +                                                                    &user_sseu);
>>>>    
>>>> +                       ret = intel_lr_context_set_sseu(ctx, engine, &ctx_sseu);
>>>> +               }
>>>> +               break;
>>>>           default:
>>>>                   ret = -EINVAL;
>>>>                   break;
>>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>>>> index dca17ef24de5..9caddcb1f234 100644
>>>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>>>> @@ -2711,6 +2711,61 @@ void intel_lr_context_resume(struct drm_i915_private *dev_priv)
>>>>           }
>>>>    }
>>>>    
>>>> +int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
>>>> +                             struct intel_engine_cs *engine,
>>>> +                             struct i915_gem_context_sseu *sseu)
>>>> +{
>>>> +       struct drm_i915_private *dev_priv = ctx->i915;
>>>> +       struct intel_context *ce;
>>>> +       enum intel_engine_id id;
>>>> +       int ret;
>>>> +
>>>> +       lockdep_assert_held(&dev_priv->drm.struct_mutex);
>>>> +
>>>> +       if (memcmp(sseu, &ctx->engine[engine->id].sseu, sizeof(*sseu)) == 0)
>>>> +               return 0;
>>>> +
>>>> +       /*
>>>> +        * We can only program this on render ring.
>>>> +        */
>>>> +       ce = &ctx->engine[RCS];
>>>> +
>>>> +       if (ce->pin_count) { /* Assume that the context is active! */
>>>> +               ret = i915_gem_switch_to_kernel_context(dev_priv);
>>>> +               if (ret)
>>>> +                       return ret;
>>>> +
>>>> +               ret = i915_gem_wait_for_idle(dev_priv,
>>>> +                                            I915_WAIT_INTERRUPTIBLE |
>>>> +                                            I915_WAIT_LOCKED);
>>>> +               if (ret)
>>>> +                       return ret;
>>>> +       }
>>>> +
>>>> +       if (ce->state) {
>>>> +               void *vaddr = i915_gem_object_pin_map(ce->state->obj, I915_MAP_WB);
>>>> +               u32 *regs;
>>>> +
>>>> +               if (IS_ERR(vaddr))
>>>> +                       return PTR_ERR(vaddr);
>>>> +
>>>> +               regs = vaddr + LRC_STATE_PN * PAGE_SIZE;
>>>> +
>>>> +               regs[CTX_R_PWR_CLK_STATE + 1] =
>>>> +                       make_rpcs(&INTEL_INFO(dev_priv)->sseu, sseu);
>>>> +               i915_gem_object_unpin_map(ce->state->obj);
>>>> +       }
>>>> +
>>>> +       /*
>>>> +        * Apply the configuration to all engines. Our hardware doesn't
>>>> +        * currently support different configurations for each engine.
>>>> +        */
>>>> +       for_each_engine(engine, dev_priv, id)
>>>> +               ctx->engine[id].sseu = *sseu;
>>>> +
>>>> +       return 0;
>>>> +}
>>>> +
>>>>    #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>>>>    #include "selftests/intel_lrc.c"
>>>>    #endif
>>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
>>>> index d91d69a17206..4c84266814fa 100644
>>>> --- a/drivers/gpu/drm/i915/intel_lrc.h
>>>> +++ b/drivers/gpu/drm/i915/intel_lrc.h
>>>> @@ -112,4 +112,8 @@ intel_lr_context_descriptor(struct i915_gem_context *ctx,
>>>>           return ctx->engine[engine->id].lrc_desc;
>>>>    }
>>>>    
>>>> +int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
>>>> +                             struct intel_engine_cs *engine,
>>>> +                             struct i915_gem_context_sseu *sseu);
>>>> +
>>>>    #endif /* _INTEL_LRC_H_ */
>>>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>>>> index 96eda8176030..75749ce11c03 100644
>>>> --- a/include/uapi/drm/i915_drm.h
>>>> +++ b/include/uapi/drm/i915_drm.h
>>>> @@ -1456,9 +1456,37 @@ struct drm_i915_gem_context_param {
>>>>    #define   I915_CONTEXT_MAX_USER_PRIORITY       1023 /* inclusive */
>>>>    #define   I915_CONTEXT_DEFAULT_PRIORITY                0
>>>>    #define   I915_CONTEXT_MIN_USER_PRIORITY       -1023 /* inclusive */
>>>> +       /*
>>>> +        * When using the following param, value should be a pointer to
>>>> +        * drm_i915_gem_context_param_sseu.
>>>> +        */
>>>> +#define I915_CONTEXT_PARAM_SSEU                0x7
>>>>           __u64 value;
>>>>    };
>>>>    
>>>> +struct drm_i915_gem_context_param_sseu {
>>>> +       /*
>>>> +        * Engine to be configured or queried. Same value you would use with
>>>> +        * drm_i915_gem_execbuffer2.
>>>> +        */
>>>> +       __u64 flags;
>>>> +
>>>> +       /*
>>>> +        * Setting slice_mask or subslice_mask to 0 will make the context use
>>>> +        * masks reported respectively by I915_PARAM_SLICE_MASK or
>>>> +        * I915_PARAM_SUBSLICE_MASK.
>>>> +        */
>>>> +       union {
>>>> +               struct {
>>>> +                       __u8 slice_mask;
>>>> +                       __u8 subslice_mask;
>>>> +                       __u8 min_eus_per_subslice;
>>>> +                       __u8 max_eus_per_subslice;
>>>> +               } packed;
>>>> +               __u64 value;
>>>> +       };
>>>> +};
>>>> +
>>>>    enum drm_i915_oa_format {
>>>>           I915_OA_FORMAT_A13 = 1,     /* HSW only */
>>>>           I915_OA_FORMAT_A29,         /* HSW only */
>>>> -- 
>>>> 2.17.0
>>>>
>>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/8] drm/i915: expose helper mapping exec flag engine to intel_engine_cs
  2018-04-30 14:37     ` Lionel Landwerlin
  2018-05-01 11:13       ` Chris Wilson
@ 2018-05-03 17:12       ` Tvrtko Ursulin
  2018-05-03 17:31         ` Lionel Landwerlin
  1 sibling, 1 reply; 50+ messages in thread
From: Tvrtko Ursulin @ 2018-05-03 17:12 UTC (permalink / raw)
  To: Lionel Landwerlin, Chris Wilson, intel-gfx


On 30/04/2018 15:37, Lionel Landwerlin wrote:
> On 25/04/18 12:50, Chris Wilson wrote:
>> Quoting Lionel Landwerlin (2018-04-25 12:45:14)
>>> This function will be used later by the per (context,engine) power
>>> programming interface.
>> No. This is not the appropriate uABI, please see
>> intel_engine_lookup_user().
>> -Chris
>>
> uAPI-wise, does this sound okay?
> 
> #define I915_CONTEXT_PARAM_SSEU         0x7
>          __u64 value;
> 
> struct drm_i915_gem_context_param_sseu {
>          /*
>           * Engine to be configured or queried.
>           */
>          __u32 class;
>          __u32 instance;
> 
>          /*
>           * Setting slice_mask or subslice_mask to 0 will make the
>           * context use masks reported respectively by
>           * I915_PARAM_SLICE_MASK or I915_PARAM_SUBSLICE_MASK.
>           */
>          union {
>                  struct {
>                          __u8 slice_mask;
>                          __u8 subslice_mask;
>                          __u8 min_eus_per_subslice;
>                          __u8 max_eus_per_subslice;
>                  } packed;
>                  __u64 value;
>          };
> };

Is it possible (now or in the future) to have different configs per 
slice or subslice? And if so, would this uAPI handle that by passing 
this struct multiple times, each time with a different slice and 
subslice mask?

For instance:

class=0, instance=0, slice_mask=0x1, subslice=0x10
class=0, instance=0, slice_mask=0x10, subslice=0x1

Is this something we should support?

Regards,

Tvrtko


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-04-25 11:45 ` [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace Lionel Landwerlin
  2018-04-26 10:00   ` Joonas Lahtinen
@ 2018-05-03 17:18   ` Tvrtko Ursulin
  2018-05-04 16:25     ` Lionel Landwerlin
  2018-05-08 20:56     ` Chris Wilson
  1 sibling, 2 replies; 50+ messages in thread
From: Tvrtko Ursulin @ 2018-05-03 17:18 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx


On 25/04/2018 12:45, Lionel Landwerlin wrote:
> From: Chris Wilson <chris@chris-wilson.co.uk>
> 
> We want to allow userspace to reconfigure the subslice configuration for
> its own use case. To do so, we expose a context parameter to allow
> adjustment of the RPCS register stored within the context image (and
> currently not accessible via LRI). If the context is adjusted before
> first use, the adjustment is for "free"; otherwise if the context is
> active we flush the context off the GPU (stalling all users) and force
> the GPU to save the context to memory where we can modify it and so
> ensure that the register is reloaded on next execution.
> 
> The overhead of managing additional EU subslices can be significant,
> especially in multi-context workloads. Non-GPGPU contexts should
> preferably disable the subslices they are not using, and others should
> fine-tune the number to match their workload.
> 
> We expose complete control over the RPCS register, allowing
> configuration of slice/subslice, via masks packed into a u64 for
> simplicity. For example,
> 
> 	struct drm_i915_gem_context_param arg;
> 	struct drm_i915_gem_context_param_sseu sseu = { .flags = I915_EXEC_RENDER };
> 
> 	memset(&arg, 0, sizeof(arg));
> 	arg.ctx_id = ctx;
> 	arg.param = I915_CONTEXT_PARAM_SSEU;
> 	arg.value = (uintptr_t) &sseu;
> 	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
> 		sseu.packed.subslice_mask = 0;
> 
> 		drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
> 	}
> 
> could be used to disable all subslices where supported.
> 
> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)
> 
> v3: Add ability to program this per engine (Chris)
> 
> v4: Move most get_sseu() into i915_gem_context.c (Lionel)
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
> CC: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> CC: Zhipeng Gong <zhipeng.gong@intel.com>
> CC: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_context.c | 82 ++++++++++++++++++++++++-
>   drivers/gpu/drm/i915/intel_lrc.c        | 55 +++++++++++++++++
>   drivers/gpu/drm/i915/intel_lrc.h        |  4 ++
>   include/uapi/drm/i915_drm.h             | 28 +++++++++
>   4 files changed, 168 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index bdf050beeb94..b97ddcf47514 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -740,6 +740,26 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
>   	return 0;
>   }
>   
> +static struct i915_gem_context_sseu
> +i915_gem_context_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
> +				     const struct drm_i915_gem_context_param_sseu *user_sseu)
> +{
> +	struct i915_gem_context_sseu value = {
> +		.slice_mask = user_sseu->packed.slice_mask == 0 ?
> +			      sseu->slice_mask :
> +			      (user_sseu->packed.slice_mask & sseu->slice_mask),
> +		.subslice_mask = user_sseu->packed.subslice_mask == 0 ?
> +				 sseu->subslice_mask[0] :
> +				 (user_sseu->packed.subslice_mask & sseu->subslice_mask[0]),
> +		.min_eus_per_subslice = min(user_sseu->packed.min_eus_per_subslice,
> +					    sseu->max_eus_per_subslice),
> +		.max_eus_per_subslice = min(user_sseu->packed.max_eus_per_subslice,
> +					    sseu->max_eus_per_subslice),
> +	};
> +
> +	return value;
> +}
> +
>   int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>   				    struct drm_file *file)
>   {
> @@ -777,6 +797,37 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>   	case I915_CONTEXT_PARAM_PRIORITY:
>   		args->value = ctx->sched.priority;
>   		break;
> +	case I915_CONTEXT_PARAM_SSEU: {
> +		struct drm_i915_gem_context_param_sseu param_sseu;
> +		struct intel_engine_cs *engine;
> +
> +		if (copy_from_user(&param_sseu, u64_to_user_ptr(args->value),
> +				   sizeof(param_sseu))) {
> +			ret = -EFAULT;
> +			break;
> +		}
> +
> +		engine = i915_gem_engine_from_flags(to_i915(dev), file,
> +						    param_sseu.flags);
> +		if (!engine) {
> +			ret = -EINVAL;
> +			break;
> +		}
> +
> +		param_sseu.packed.slice_mask =
> +			ctx->engine[engine->id].sseu.slice_mask;
> +		param_sseu.packed.subslice_mask =
> +			ctx->engine[engine->id].sseu.subslice_mask;
> +		param_sseu.packed.min_eus_per_subslice =
> +			ctx->engine[engine->id].sseu.min_eus_per_subslice;
> +		param_sseu.packed.max_eus_per_subslice =
> +			ctx->engine[engine->id].sseu.max_eus_per_subslice;
> +
> +		if (copy_to_user(u64_to_user_ptr(args->value), &param_sseu,
> +				 sizeof(param_sseu)))
> +			ret = -EFAULT;
> +		break;
> +	}
>   	default:
>   		ret = -EINVAL;
>   		break;
> @@ -832,7 +883,6 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>   		else
>   			i915_gem_context_clear_bannable(ctx);
>   		break;
> -
>   	case I915_CONTEXT_PARAM_PRIORITY:
>   		{
>   			s64 priority = args->value;
> @@ -851,7 +901,37 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>   				ctx->sched.priority = priority;
>   		}
>   		break;
> +	case I915_CONTEXT_PARAM_SSEU:
> +		if (args->size)
> +			ret = -EINVAL;
> +		else if (!HAS_EXECLISTS(ctx->i915))
> +			ret = -ENODEV;
> +		else {
> +			struct drm_i915_private *dev_priv = to_i915(dev);
> +			struct drm_i915_gem_context_param_sseu user_sseu;
> +			struct i915_gem_context_sseu ctx_sseu;
> +			struct intel_engine_cs *engine;
> +
> +			if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
> +					   sizeof(user_sseu))) {
> +				ret = -EFAULT;
> +				break;
> +			}
> +
> +			engine = i915_gem_engine_from_flags(dev_priv, file,
> +							    user_sseu.flags);
> +			if (!engine) {
> +				ret = -EINVAL;
> +				break;
> +			}
> +
> +			ctx_sseu =
> +				i915_gem_context_sseu_from_user_sseu(&INTEL_INFO(dev_priv)->sseu,
> +								     &user_sseu);
>   
> +			ret = intel_lr_context_set_sseu(ctx, engine, &ctx_sseu);
> +		}
> +		break;
>   	default:
>   		ret = -EINVAL;
>   		break;
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index dca17ef24de5..9caddcb1f234 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -2711,6 +2711,61 @@ void intel_lr_context_resume(struct drm_i915_private *dev_priv)
>   	}
>   }
>   
> +int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
> +			      struct intel_engine_cs *engine,
> +			      struct i915_gem_context_sseu *sseu)
> +{
> +	struct drm_i915_private *dev_priv = ctx->i915;
> +	struct intel_context *ce;
> +	enum intel_engine_id id;
> +	int ret;
> +
> +	lockdep_assert_held(&dev_priv->drm.struct_mutex);
> +
> +	if (memcmp(sseu, &ctx->engine[engine->id].sseu, sizeof(*sseu)) == 0)
> +		return 0;
> +
> +	/*
> +	 * We can only program this on render ring.
> +	 */
> +	ce = &ctx->engine[RCS];
> +
> +	if (ce->pin_count) { /* Assume that the context is active! */
> +		ret = i915_gem_switch_to_kernel_context(dev_priv);
> +		if (ret)
> +			return ret;
> +
> +		ret = i915_gem_wait_for_idle(dev_priv,
> +					     I915_WAIT_INTERRUPTIBLE |
> +					     I915_WAIT_LOCKED);

Could we consider the alternative of only allowing this to be configured 
on context create? That way we would not need to idle the GPU every time 
userspace decides to fiddle with it. It is unprivileged, so quite an easy 
way for a random app to ruin GPU performance for everyone.
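
Something like this would be enough for the create-only variant (a
sketch; pin_count is only a rough proxy for "never used", a real
version would need to track first use properly):

static int check_sseu_allowed(struct i915_gem_context *ctx)
{
        /* Refuse the change once the context has been used,
         * instead of idling the GPU. */
        if (ctx->engine[RCS].pin_count)
                return -EBUSY;

        return 0;
}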

Regards,

Tvrtko

> +		if (ret)
> +			return ret;
> +	}
> +
> +	if (ce->state) {
> +		void *vaddr = i915_gem_object_pin_map(ce->state->obj, I915_MAP_WB);
> +		u32 *regs;
> +
> +		if (IS_ERR(vaddr))
> +			return PTR_ERR(vaddr);
> +
> +		regs = vaddr + LRC_STATE_PN * PAGE_SIZE;
> +
> +		regs[CTX_R_PWR_CLK_STATE + 1] =
> +			make_rpcs(&INTEL_INFO(dev_priv)->sseu, sseu);
> +		i915_gem_object_unpin_map(ce->state->obj);
> +	}
> +
> +	/*
> +	 * Apply the configuration to all engines. Our hardware doesn't
> +	 * currently support different configurations for each engine.
> +	 */
> +	for_each_engine(engine, dev_priv, id)
> +		ctx->engine[id].sseu = *sseu;
> +
> +	return 0;
> +}
> +
>   #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>   #include "selftests/intel_lrc.c"
>   #endif
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index d91d69a17206..4c84266814fa 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -112,4 +112,8 @@ intel_lr_context_descriptor(struct i915_gem_context *ctx,
>   	return ctx->engine[engine->id].lrc_desc;
>   }
>   
> +int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
> +			      struct intel_engine_cs *engine,
> +			      struct i915_gem_context_sseu *sseu);
> +
>   #endif /* _INTEL_LRC_H_ */
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 96eda8176030..75749ce11c03 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1456,9 +1456,37 @@ struct drm_i915_gem_context_param {
>   #define   I915_CONTEXT_MAX_USER_PRIORITY	1023 /* inclusive */
>   #define   I915_CONTEXT_DEFAULT_PRIORITY		0
>   #define   I915_CONTEXT_MIN_USER_PRIORITY	-1023 /* inclusive */
> +	/*
> +	 * When using the following param, value should be a pointer to
> +	 * drm_i915_gem_context_param_sseu.
> +	 */
> +#define I915_CONTEXT_PARAM_SSEU		0x7
>   	__u64 value;
>   };
>   
> +struct drm_i915_gem_context_param_sseu {
> +	/*
> +	 * Engine to be configured or queried. Same value you would use with
> +	 * drm_i915_gem_execbuffer2.
> +	 */
> +	__u64 flags;
> +
> +	/*
> +	 * Setting slice_mask or subslice_mask to 0 will make the context use
> +	 * masks reported respectively by I915_PARAM_SLICE_MASK or
> +	 * I915_PARAM_SUBSLICE_MASK.
> +	 */
> +	union {
> +		struct {
> +			__u8 slice_mask;
> +			__u8 subslice_mask;
> +			__u8 min_eus_per_subslice;
> +			__u8 max_eus_per_subslice;
> +		} packed;
> +		__u64 value;
> +	};
> +};
> +
>   enum drm_i915_oa_format {
>   	I915_OA_FORMAT_A13 = 1,	    /* HSW only */
>   	I915_OA_FORMAT_A29,	    /* HSW only */
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/8] drm/i915: expose helper mapping exec flag engine to intel_engine_cs
  2018-05-03 17:12       ` Tvrtko Ursulin
@ 2018-05-03 17:31         ` Lionel Landwerlin
  2018-05-03 18:00           ` Tvrtko Ursulin
  0 siblings, 1 reply; 50+ messages in thread
From: Lionel Landwerlin @ 2018-05-03 17:31 UTC (permalink / raw)
  To: Tvrtko Ursulin, Chris Wilson, intel-gfx

On 03/05/18 18:12, Tvrtko Ursulin wrote:
>
> On 30/04/2018 15:37, Lionel Landwerlin wrote:
>> On 25/04/18 12:50, Chris Wilson wrote:
>>> Quoting Lionel Landwerlin (2018-04-25 12:45:14)
>>>> This function will be used later by the per (context,engine) power
>>>> programming interface.
>>> No. This is not the appropriate uABI, please see
>>> intel_engine_lookup_user().
>>> -Chris
>>>
>> uAPI-wise, does this sound okay?
>>
>> #define I915_CONTEXT_PARAM_SSEU         0x7
>>          __u64 value;
>>
>> struct drm_i915_gem_context_param_sseu {
>>          /*
>>           * Engine to be configured or queried.
>>           */
>>          __u32 class;
>>          __u32 instance;
>>
>>          /*
>>           * Setting slice_mask or subslice_mask to 0 will make the
>>           * context use masks reported respectively by
>>           * I915_PARAM_SLICE_MASK or I915_PARAM_SUBSLICE_MASK.
>>           */
>>          union {
>>                  struct {
>>                          __u8 slice_mask;
>>                          __u8 subslice_mask;
>>                          __u8 min_eus_per_subslice;
>>                          __u8 max_eus_per_subslice;
>>                  } packed;
>>                  __u64 value;
>>          };
>> };
>
> Is it possible (now or in the future) to have different configs per 
> slice or subslice? And if so, would this uAPI handle that by passing 
> this struct multiple times, each time with a different slice and 
> subslice mask?
>
> For instance:
>
> class=0, instance=0, slice_mask=0x1, subslice=0x10
> class=0, instance=0, slice_mask=0x10, subslice=0x1
>
> Is this something we should support?

It's not supported in any configuration I'm aware of. It's always 
uniform subslice programming across slices.
The register interface actually takes a number of slices/subslices to 
enable, not a mask (i.e. we can't choose which ones get used), whereas 
this uAPI exposes a mask.
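
In other words, the kernel would effectively collapse the mask to a
count before writing the register. A sketch (the helper is invented):

static u32 powered_slice_count(const struct drm_i915_gem_context_param_sseu *user)
{
        /* RPCS takes a count; the hardware decides which slices
         * actually power up. */
        return hweight8(user->packed.slice_mask);
}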

It's a fair comment though. We didn't plan for asymmetric slice fusing 
initially and it just happened.

One other thing I've been wondering is whether we should strictly 
validate the input from userspace.
Right now we min/max the numbers and never raise any error.

Thanks,

-
Lionel

>
> Regards,
>
> Tvrtko
>
>
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/8] drm/i915: expose helper mapping exec flag engine to intel_engine_cs
  2018-05-03 17:31         ` Lionel Landwerlin
@ 2018-05-03 18:00           ` Tvrtko Ursulin
  2018-05-03 20:09             ` Lionel Landwerlin
  0 siblings, 1 reply; 50+ messages in thread
From: Tvrtko Ursulin @ 2018-05-03 18:00 UTC (permalink / raw)
  To: Lionel Landwerlin, Chris Wilson, intel-gfx


On 03/05/2018 18:31, Lionel Landwerlin wrote:
> On 03/05/18 18:12, Tvrtko Ursulin wrote:
>>
>> On 30/04/2018 15:37, Lionel Landwerlin wrote:
>>> On 25/04/18 12:50, Chris Wilson wrote:
>>>> Quoting Lionel Landwerlin (2018-04-25 12:45:14)
>>>>> This function will be used later by the per (context,engine) power
>>>>> programming interface.
>>>> No. This is not the appropriate uABI, please see
>>>> intel_engine_lookup_user().
>>>> -Chris
>>>>
>>> uAPI-wise, does this sound okay?
>>>
>>> #define I915_CONTEXT_PARAM_SSEU         0x7
>>>          __u64 value;
>>>
>>> struct drm_i915_gem_context_param_sseu {
>>>          /*
>>>           * Engine to be configured or queried.
>>>           */
>>>          __u32 class;
>>>          __u32 instance;
>>>
>>>          /*
>>>           * Setting slice_mask or subslice_mask to 0 will make the
>>>           * context use masks reported respectively by
>>>           * I915_PARAM_SLICE_MASK or I915_PARAM_SUBSLICE_MASK.
>>>           */
>>>          union {
>>>                  struct {
>>>                          __u8 slice_mask;
>>>                          __u8 subslice_mask;
>>>                          __u8 min_eus_per_subslice;
>>>                          __u8 max_eus_per_subslice;
>>>                  } packed;
>>>                  __u64 value;
>>>          };
>>> };
>>
>> Is it possible (now or in the future) to have different configs per 
>> slice or subslice? And if so, would this uAPI handle that by passing 
>> this struct multiple times, each time with a different slice and 
>> subslice mask?
>>
>> For instance:
>>
>> class=0, instance=0, slice_mask=0x1, subslice=0x10
>> class=0, instance=0, slice_mask=0x10, subslice=0x1
>>
>> Is this something we should support?
> 
> It's not supported in any configuration I'm aware of. It's always 
> uniform subslice programming across slices.
> The register interface actually takes a number of slices/subslices to 
> enable, not a mask (i.e. we can't choose which ones get used), whereas 
> this uAPI exposes a mask.

I thought for Gen11 it cannot be just a number of slices but must be a 
mask, since media will want to make sure some slices are not enabled, 
and some are, for its workloads.

But in general I guess I don't understand the semantics of slice and 
subslice masks (and eu counts). Say if you pass in slice_mask=0x1 
subslice_mask=0x1 - this means you are only enabling subslice 0x1 on 
slice 0x1 - but what happens on other slices? They are all disabled?

> It's a fair comment though. We didn't plan for asymmetric slice fusing 
> initially and it just happened.

Okay, so you think the uAPI should support it to be more future-proof?

> One other thing I've been wondering is whether we should strictly 
> validate the input from userspace.
> Right now we min/max the numbers and never raise any error.

It does feel low-level enough that we should reject userspace trying to 
feed in nonsense, for its own good.
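
Something along these lines would do (a sketch; the helper is invented,
it's not in the patch as posted):

static int check_user_sseu(const struct sseu_dev_info *device,
                           const struct drm_i915_gem_context_param_sseu *user)
{
        /* Reject bits for slices/subslices the device doesn't have. */
        if (user->packed.slice_mask & ~device->slice_mask)
                return -EINVAL;

        if (user->packed.subslice_mask & ~device->subslice_mask[0])
                return -EINVAL;

        /* EU bounds must be sane and within the device maximum. */
        if (user->packed.max_eus_per_subslice > device->max_eus_per_subslice ||
            user->packed.min_eus_per_subslice > user->packed.max_eus_per_subslice)
                return -EINVAL;

        return 0;
}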

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/8] drm/i915: expose helper mapping exec flag engine to intel_engine_cs
  2018-05-03 18:00           ` Tvrtko Ursulin
@ 2018-05-03 20:09             ` Lionel Landwerlin
  2018-05-03 20:15               ` Chris Wilson
  0 siblings, 1 reply; 50+ messages in thread
From: Lionel Landwerlin @ 2018-05-03 20:09 UTC (permalink / raw)
  To: Tvrtko Ursulin, Chris Wilson, intel-gfx

On 03/05/18 19:00, Tvrtko Ursulin wrote:
>
> On 03/05/2018 18:31, Lionel Landwerlin wrote:
>> On 03/05/18 18:12, Tvrtko Ursulin wrote:
>>>
>>> On 30/04/2018 15:37, Lionel Landwerlin wrote:
>>>> On 25/04/18 12:50, Chris Wilson wrote:
>>>>> Quoting Lionel Landwerlin (2018-04-25 12:45:14)
>>>>>> This function will be used later by the per (context,engine) power
>>>>>> programming interface.
>>>>> No. This is not the appropriate uABI, please see
>>>>> intel_engine_lookup_user().
>>>>> -Chris
>>>>>
>>>> uAPI-wise, does this sound okay?
>>>>
>>>> #define I915_CONTEXT_PARAM_SSEU         0x7
>>>>          __u64 value;
>>>>
>>>> struct drm_i915_gem_context_param_sseu {
>>>>          /*
>>>>           * Engine to be configured or queried.
>>>>           */
>>>>          __u32 class;
>>>>          __u32 instance;
>>>>
>>>>          /*
>>>>           * Setting slice_mask or subslice_mask to 0 will make the
>>>>           * context use masks reported respectively by
>>>>           * I915_PARAM_SLICE_MASK or I915_PARAM_SUBSLICE_MASK.
>>>>           */
>>>>          union {
>>>>                  struct {
>>>>                          __u8 slice_mask;
>>>>                          __u8 subslice_mask;
>>>>                          __u8 min_eus_per_subslice;
>>>>                          __u8 max_eus_per_subslice;
>>>>                  } packed;
>>>>                  __u64 value;
>>>>          };
>>>> };
>>>
>>> Is it possible (now or in the future) to have different configs per 
>>> slice or subslice? And if so, would this uAPI handle that by passing 
>>> this struct multiple times, each time with a different slice and 
>>> subslice mask?
>>>
>>> For instance:
>>>
>>> class=0, instance=0, slice_mask=0x1, subslice=0x10
>>> class=0, instance=0, slice_mask=0x10, subslice=0x1
>>>
>>> Is this something we should support?
>>
>> It's not supported in any configuration I'm aware of. It's always 
>> uniform subslice programming across slices.
>> The register interface actually takes a number of slices/subslices to 
>> enable, not a mask (i.e. we can't choose which ones get used), whereas 
>> this uAPI exposes a mask.
>
> I thought for Gen11 it cannot be just a number of slices but must be a 
> mask, since media will want to make sure some slices are not enabled, 
> and some are, for its workloads.

Interesting question. I'll try to get an understanding of that.
I'm guessing that you program it to half the number of subslices and it 
just works.

>
> But in general I guess I don't understand the semantics of slice and 
> subslice masks (and eu counts). Say if you pass in slice_mask=0x1 
> subslice_mask=0x1 - this means you are only enabling subslice 0x1 on 
> slice 0x1 - but what happens on other slices? They are all disabled?

slice_mask=0x1 means you only enable the first slice
subslice_mask=0x1 means for each slice, you only enable the first subslice.

So essentially on a GT4 (3 slices), slices 1 & 2 would be disabled 
completely (all subslices included) and only subslice0 in slice0 would 
be powered.
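
(Put differently: with slice_mask=0x1 and subslice_mask=0x1 on a 3-slice
part with 3 subslices per slice, that's 1 powered subslice out of 9; the
subslice-per-slice count here is my assumption.)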

>
>> It's a fair comment though. We didn't plan for asymmetric slice 
>> fusing initially and it just happened.
>
> Okay so you think uAPI should support it to be more future proof?

I don't feel like HW engineers want to give us that amount of say in 
what can be powered.
So I wouldn't make it more flexible than it currently is.

We explicitly say that subslice_mask is uniform for all slices. And if 
hardware ever grows more capability we add a new enum.

>
>> One other thing I've been wondering is whether we should strictly 
>> validate the input from userspace.
>> Right now we min/max the numbers and never raise any error.
>
> It does feel low-level enough that we should reject userspace trying 
> to feed in nonsense, for its own good.

Okay, adding that.

Thanks!

>
> Regards,
>
> Tvrtko
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/8] drm/i915: expose helper mapping exec flag engine to intel_engine_cs
  2018-05-03 20:09             ` Lionel Landwerlin
@ 2018-05-03 20:15               ` Chris Wilson
  0 siblings, 0 replies; 50+ messages in thread
From: Chris Wilson @ 2018-05-03 20:15 UTC (permalink / raw)
  To: Lionel Landwerlin, Tvrtko Ursulin, intel-gfx

Quoting Lionel Landwerlin (2018-05-03 21:09:01)
> On 03/05/18 19:00, Tvrtko Ursulin wrote:
> >> One other thing I've been wondering is whether we should strictly 
> >> validate the input from userspace.
> >> Right now we min/max the numbers and never raise any error.
> >
> > It does feel low-level enough that we should reject userspace trying 
> > to feed in nonsense, for its own good.
> 
> Okay, adding that.

As a general rule, validate input to protect the kernel and not the
user. It's a fine line between strict validation and preventing the user
from doing something you didn't consider/appreciate but that is perfectly
valid (and turns out to be the preferred way down the line).

(Not enough validation hits the headlines; too much and we just curse
you as we have to carry two interfaces to do the job of one ;)
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-05-03 17:18   ` Tvrtko Ursulin
@ 2018-05-04 16:25     ` Lionel Landwerlin
  2018-05-08  4:04       ` Rogozhkin, Dmitry V
  2018-05-08 20:56     ` Chris Wilson
  1 sibling, 1 reply; 50+ messages in thread
From: Lionel Landwerlin @ 2018-05-04 16:25 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx, Rogozhkin, Dmitry V


[-- Attachment #1.1: Type: text/plain, Size: 1528 bytes --]

On 03/05/18 18:18, Tvrtko Ursulin wrote:
>>   +int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
>> +                  struct intel_engine_cs *engine,
>> +                  struct i915_gem_context_sseu *sseu)
>> +{
>> +    struct drm_i915_private *dev_priv = ctx->i915;
>> +    struct intel_context *ce;
>> +    enum intel_engine_id id;
>> +    int ret;
>> +
>> +    lockdep_assert_held(&dev_priv->drm.struct_mutex);
>> +
>> +    if (memcmp(sseu, &ctx->engine[engine->id].sseu, sizeof(*sseu)) == 0)
>> +        return 0;
>> +
>> +    /*
>> +     * We can only program this on render ring.
>> +     */
>> +    ce = &ctx->engine[RCS];
>> +
>> +    if (ce->pin_count) { /* Assume that the context is active! */
>> +        ret = i915_gem_switch_to_kernel_context(dev_priv);
>> +        if (ret)
>> +            return ret;
>> +
>> +        ret = i915_gem_wait_for_idle(dev_priv,
>> +                         I915_WAIT_INTERRUPTIBLE |
>> +                         I915_WAIT_LOCKED);
>
> Could we consider the alternative of only allowing this to be 
> configured on context create? That way we would not need to idle the 
> GPU every time userspace decides to fiddle with it. It is unprivileged, 
> so quite an easy way for a random app to ruin GPU performance for everyone.
>
> Regards,
>
> Tvrtko 


I'm pretty sure Dmitry wants dynamic configurations.

Dmitry?

[-- Attachment #1.2: Type: text/html, Size: 2577 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-05-04 16:25     ` Lionel Landwerlin
@ 2018-05-08  4:04       ` Rogozhkin, Dmitry V
  2018-05-08  8:24         ` Tvrtko Ursulin
  0 siblings, 1 reply; 50+ messages in thread
From: Rogozhkin, Dmitry V @ 2018-05-08  4:04 UTC (permalink / raw)
  To: Landwerlin, Lionel G, Tvrtko Ursulin, intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 1734 bytes --]

>> I'm pretty sure Dmitry wants dynamic configurations.

Yes, I'm afraid we really need dynamic slice configurations for media.

From: Landwerlin, Lionel G
Sent: Friday, May 4, 2018 9:25 AM
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>; intel-gfx@lists.freedesktop.org; Rogozhkin, Dmitry V <dmitry.v.rogozhkin@intel.com>
Subject: Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace

On 03/05/18 18:18, Tvrtko Ursulin wrote:
  +int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
+                  struct intel_engine_cs *engine,
+                  struct i915_gem_context_sseu *sseu)
+{
+    struct drm_i915_private *dev_priv = ctx->i915;
+    struct intel_context *ce;
+    enum intel_engine_id id;
+    int ret;
+
+    lockdep_assert_held(&dev_priv->drm.struct_mutex);
+
+    if (memcmp(sseu, &ctx->engine[engine->id].sseu, sizeof(*sseu)) == 0)
+        return 0;
+
+    /*
+     * We can only program this on render ring.
+     */
+    ce = &ctx->engine[RCS];
+
+    if (ce->pin_count) { /* Assume that the context is active! */
+        ret = i915_gem_switch_to_kernel_context(dev_priv);
+        if (ret)
+            return ret;
+
+        ret = i915_gem_wait_for_idle(dev_priv,
+                         I915_WAIT_INTERRUPTIBLE |
+                         I915_WAIT_LOCKED);

Could we consider the alternative of only allowing this to be configured on context create? That way we would not need to idle the GPU every time userspace decides to fiddle with it. It is unprivileged, so quite an easy way for a random app to ruin GPU performance for everyone.

Regards,

Tvrtko


I'm pretty sure Dmitry wants dynamic configurations.

Dmitry?

[-- Attachment #1.2: Type: text/html, Size: 5985 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-05-08  4:04       ` Rogozhkin, Dmitry V
@ 2018-05-08  8:24         ` Tvrtko Ursulin
  2018-05-08 16:00           ` Rogozhkin, Dmitry V
  0 siblings, 1 reply; 50+ messages in thread
From: Tvrtko Ursulin @ 2018-05-08  8:24 UTC (permalink / raw)
  To: Rogozhkin, Dmitry V, Landwerlin, Lionel G, intel-gfx


On 08/05/2018 05:04, Rogozhkin, Dmitry V wrote:
>>>  I'm pretty sure Dmitry wants dynamic configurations.
> 
> Yes, I afraid we really need dynamic slice configurations for media.

Context of the question was whether you need to change slice 
configuration for a single GEM context between submitting batch buffers?

Regards,

Tvrtko


> *From:*Landwerlin, Lionel G
> *Sent:* Friday, May 4, 2018 9:25 AM
> *To:* Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>; 
> intel-gfx@lists.freedesktop.org; Rogozhkin, Dmitry V 
> <dmitry.v.rogozhkin@intel.com>
> *Subject:* Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) 
> configuration to userspace
> 
> On 03/05/18 18:18, Tvrtko Ursulin wrote:
> 
>            +int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
>         +                  struct intel_engine_cs *engine,
>         +                  struct i915_gem_context_sseu *sseu)
>         +{
>         +    struct drm_i915_private *dev_priv = ctx->i915;
>         +    struct intel_context *ce;
>         +    enum intel_engine_id id;
>         +    int ret;
>         +
>         +    lockdep_assert_held(&dev_priv->drm.struct_mutex);
>         +
>         +    if (memcmp(sseu, &ctx->engine[engine->id].sseu,
>         sizeof(*sseu)) == 0)
>         +        return 0;
>         +
>         +    /*
>         +     * We can only program this on render ring.
>         +     */
>         +    ce = &ctx->engine[RCS];
>         +
>         +    if (ce->pin_count) { /* Assume that the context is active! */
>         +        ret = i915_gem_switch_to_kernel_context(dev_priv);
>         +        if (ret)
>         +            return ret;
>         +
>         +        ret = i915_gem_wait_for_idle(dev_priv,
>         +                         I915_WAIT_INTERRUPTIBLE |
>         +                         I915_WAIT_LOCKED);
> 
> 
>     Could we consider the alternative of only allowing this to be
>     configured on context create? That way we would not need to idle the
>     GPU every time userspace decides to fiddle with it. It is
>     unprivileged, so quite an easy way for a random app to ruin GPU
>     performance for everyone.
> 
>     Regards,
> 
>     Tvrtko
> 
> I'm pretty sure Dmitry wants dynamic configurations.
> 
> Dmitry?
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-05-08  8:24         ` Tvrtko Ursulin
@ 2018-05-08 16:00           ` Rogozhkin, Dmitry V
  0 siblings, 0 replies; 50+ messages in thread
From: Rogozhkin, Dmitry V @ 2018-05-08 16:00 UTC (permalink / raw)
  To: Tvrtko Ursulin, Landwerlin, Lionel G, intel-gfx

>> Context of the question was whether you need to change slice configuration for a single GEM context between submitting batch buffers?

When we create a context we know the optimal slice count for it. And this optimal point does not change for this context in unperturbed conditions, i.e. when the context runs alone. However, there are critical use cases when we never run a single context, but instead run a few contexts in parallel, each with its own optimal slice count operating point.

As a result, the major question is: what is the cost of a context switch if there is an associated slice configuration change, powering slice(s) on or off? In the ideal situation where the cost is 0, we wouldn't need any change of slice configuration for a single GEM context between submitting batch buffers. The problem is that the cost is far from 0, and it is non-tolerable in the worst case, when we switch at each next batch buffer.

As a result, we are forced to have some negotiation channel between the different contexts and make them agree on a single slice configuration which will exist for a reasonably long period of time, so that the associated cost is negligible in general. During this period we will submit a number of batch buffers before the next reconfiguration attempt. So that's the situation where we need to reconfigure the slice configuration for a single GEM context between submitting batches.
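
To put rough numbers on it (purely illustrative figures): if a slice reconfiguration costs ~50us and two contexts with different configurations alternate on every ~1ms batch, reconfiguration alone burns ~5% of the GPU; if a negotiated configuration is instead held for ~1s, the same ~50us amortizes to ~0.005%.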

Dmitry.

-----Original Message-----
From: Tvrtko Ursulin [mailto:tvrtko.ursulin@linux.intel.com] 
Sent: Tuesday, May 8, 2018 1:25 AM
To: Rogozhkin, Dmitry V <dmitry.v.rogozhkin@intel.com>; Landwerlin, Lionel G <lionel.g.landwerlin@intel.com>; intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace


On 08/05/2018 05:04, Rogozhkin, Dmitry V wrote:
>>>  I'm pretty sure Dmitry wants dynamic configurations.
> 
> Yes, I'm afraid we really need dynamic slice configurations for media.

Context of the question was whether you need to change slice configuration for a single GEM context between submitting batch buffers?

Regards,

Tvrtko


> *From:*Landwerlin, Lionel G
> *Sent:* Friday, May 4, 2018 9:25 AM
> *To:* Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>; 
> intel-gfx@lists.freedesktop.org; Rogozhkin, Dmitry V 
> <dmitry.v.rogozhkin@intel.com>
> *Subject:* Re: [Intel-gfx] [PATCH 8/8] drm/i915: Expose RPCS (SSEU) 
> configuration to userspace
> 
> On 03/05/18 18:18, Tvrtko Ursulin wrote:
> 
>            +int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
>         +                  struct intel_engine_cs *engine,
>         +                  struct i915_gem_context_sseu *sseu)
>         +{
>         +    struct drm_i915_private *dev_priv = ctx->i915;
>         +    struct intel_context *ce;
>         +    enum intel_engine_id id;
>         +    int ret;
>         +
>         +    lockdep_assert_held(&dev_priv->drm.struct_mutex);
>         +
>         +    if (memcmp(sseu, &ctx->engine[engine->id].sseu,
>         sizeof(*sseu)) == 0)
>         +        return 0;
>         +
>         +    /*
>         +     * We can only program this on render ring.
>         +     */
>         +    ce = &ctx->engine[RCS];
>         +
>         +    if (ce->pin_count) { /* Assume that the context is active! */
>         +        ret = i915_gem_switch_to_kernel_context(dev_priv);
>         +        if (ret)
>         +            return ret;
>         +
>         +        ret = i915_gem_wait_for_idle(dev_priv,
>         +                         I915_WAIT_INTERRUPTIBLE |
>         +                         I915_WAIT_LOCKED);
> 
> 
>     Could we consider the alternative of only allowing this to be
>     configured on context create? That way we would not need to idle the
>     GPU every time userspace decides to fiddle with it. It is
>     unprivileged, so quite an easy way for a random app to ruin GPU
>     performance for everyone.
> 
>     Regards,
> 
>     Tvrtko
> 
> I'm pretty sure Dmitry wants dynamic configurations.
> 
> Dmitry?
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-05-03 17:18   ` Tvrtko Ursulin
  2018-05-04 16:25     ` Lionel Landwerlin
@ 2018-05-08 20:56     ` Chris Wilson
  2018-05-09 15:35       ` Lionel Landwerlin
  1 sibling, 1 reply; 50+ messages in thread
From: Chris Wilson @ 2018-05-08 20:56 UTC (permalink / raw)
  To: Tvrtko Ursulin, Lionel Landwerlin, intel-gfx

Quoting Tvrtko Ursulin (2018-05-03 18:18:43)
> 
> On 25/04/2018 12:45, Lionel Landwerlin wrote:
> > From: Chris Wilson <chris@chris-wilson.co.uk>
> > 
> > We want to allow userspace to reconfigure the subslice configuration for
> > its own use case. To do so, we expose a context parameter to allow
> > adjustment of the RPCS register stored within the context image (and
> > currently not accessible via LRI). If the context is adjusted before
> > first use, the adjustment is for "free"; otherwise if the context is
> > active we flush the context off the GPU (stalling all users) and force
> > the GPU to save the context to memory where we can modify it and so
> > ensure that the register is reloaded on next execution.
> > 
> > The overhead of managing additional EU subslices can be significant,
> > especially in multi-context workloads. Non-GPGPU contexts should
> > preferably disable the subslices they are not using, and others should
> > fine-tune the number to match their workload.
> > 
> > We expose complete control over the RPCS register, allowing
> > configuration of slice/subslice, via masks packed into a u64 for
> > simplicity. For example,
> > 
> >       struct drm_i915_gem_context_param arg;
> >       struct drm_i915_gem_context_param_sseu sseu = { .flags = I915_EXEC_RENDER };
> > 
> >       memset(&arg, 0, sizeof(arg));
> >       arg.ctx_id = ctx;
> >       arg.param = I915_CONTEXT_PARAM_SSEU;
> >       arg.value = (uintptr_t) &sseu;
> >       if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
> >               sseu.packed.subslice_mask = 0;
> > 
> >               drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
> >       }
> > 
> > could be used to disable all subslices where supported.
> > 
> > v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)
> > 
> > v3: Add ability to program this per engine (Chris)
> > 
> > v4: Move most get_sseu() into i915_gem_context.c (Lionel)
> > 
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> > Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
> > CC: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > CC: Zhipeng Gong <zhipeng.gong@intel.com>
> > CC: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_gem_context.c | 82 ++++++++++++++++++++++++-
> >   drivers/gpu/drm/i915/intel_lrc.c        | 55 +++++++++++++++++
> >   drivers/gpu/drm/i915/intel_lrc.h        |  4 ++
> >   include/uapi/drm/i915_drm.h             | 28 +++++++++
> >   4 files changed, 168 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> > index bdf050beeb94..b97ddcf47514 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> > @@ -740,6 +740,26 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
> >       return 0;
> >   }
> >   
> > +static struct i915_gem_context_sseu
> > +i915_gem_context_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
> > +                                  const struct drm_i915_gem_context_param_sseu *user_sseu)
> > +{
> > +     struct i915_gem_context_sseu value = {
> > +             .slice_mask = user_sseu->packed.slice_mask == 0 ?
> > +                           sseu->slice_mask :
> > +                           (user_sseu->packed.slice_mask & sseu->slice_mask),
> > +             .subslice_mask = user_sseu->packed.subslice_mask == 0 ?
> > +                              sseu->subslice_mask[0] :
> > +                              (user_sseu->packed.subslice_mask & sseu->subslice_mask[0]),
> > +             .min_eus_per_subslice = min(user_sseu->packed.min_eus_per_subslice,
> > +                                         sseu->max_eus_per_subslice),
> > +             .max_eus_per_subslice = min(user_sseu->packed.max_eus_per_subslice,
> > +                                         sseu->max_eus_per_subslice),
> > +     };
> > +
> > +     return value;
> > +}
> > +
> >   int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
> >                                   struct drm_file *file)
> >   {
> > @@ -777,6 +797,37 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
> >       case I915_CONTEXT_PARAM_PRIORITY:
> >               args->value = ctx->sched.priority;
> >               break;
> > +     case I915_CONTEXT_PARAM_SSEU: {
> > +             struct drm_i915_gem_context_param_sseu param_sseu;
> > +             struct intel_engine_cs *engine;
> > +
> > +             if (copy_from_user(&param_sseu, u64_to_user_ptr(args->value),
> > +                                sizeof(param_sseu))) {
> > +                     ret = -EFAULT;
> > +                     break;
> > +             }
> > +
> > +             engine = i915_gem_engine_from_flags(to_i915(dev), file,
> > +                                                 param_sseu.flags);
> > +             if (!engine) {
> > +                     ret = -EINVAL;
> > +                     break;
> > +             }
> > +
> > +             param_sseu.packed.slice_mask =
> > +                     ctx->engine[engine->id].sseu.slice_mask;
> > +             param_sseu.packed.subslice_mask =
> > +                     ctx->engine[engine->id].sseu.subslice_mask;
> > +             param_sseu.packed.min_eus_per_subslice =
> > +                     ctx->engine[engine->id].sseu.min_eus_per_subslice;
> > +             param_sseu.packed.max_eus_per_subslice =
> > +                     ctx->engine[engine->id].sseu.max_eus_per_subslice;
> > +
> > +             if (copy_to_user(u64_to_user_ptr(args->value), &param_sseu,
> > +                              sizeof(param_sseu)))
> > +                     ret = -EFAULT;
> > +             break;
> > +     }
> >       default:
> >               ret = -EINVAL;
> >               break;
> > @@ -832,7 +883,6 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
> >               else
> >                       i915_gem_context_clear_bannable(ctx);
> >               break;
> > -
> >       case I915_CONTEXT_PARAM_PRIORITY:
> >               {
> >                       s64 priority = args->value;
> > @@ -851,7 +901,37 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
> >                               ctx->sched.priority = priority;
> >               }
> >               break;
> > +     case I915_CONTEXT_PARAM_SSEU:
> > +             if (args->size)
> > +                     ret = -EINVAL;
> > +             else if (!HAS_EXECLISTS(ctx->i915))
> > +                     ret = -ENODEV;
> > +             else {
> > +                     struct drm_i915_private *dev_priv = to_i915(dev);
> > +                     struct drm_i915_gem_context_param_sseu user_sseu;
> > +                     struct i915_gem_context_sseu ctx_sseu;
> > +                     struct intel_engine_cs *engine;
> > +
> > +                     if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
> > +                                        sizeof(user_sseu))) {
> > +                             ret = -EFAULT;
> > +                             break;
> > +                     }
> > +
> > +                     engine = i915_gem_engine_from_flags(dev_priv, file,
> > +                                                         user_sseu.flags);
> > +                     if (!engine) {
> > +                             ret = -EINVAL;
> > +                             break;
> > +                     }
> > +
> > +                     ctx_sseu =
> > +                             i915_gem_context_sseu_from_user_sseu(&INTEL_INFO(dev_priv)->sseu,
> > +                                                                  &user_sseu);
> >   
> > +                     ret = intel_lr_context_set_sseu(ctx, engine, &ctx_sseu);
> > +             }
> > +             break;
> >       default:
> >               ret = -EINVAL;
> >               break;
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index dca17ef24de5..9caddcb1f234 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -2711,6 +2711,61 @@ void intel_lr_context_resume(struct drm_i915_private *dev_priv)
> >       }
> >   }
> >   
> > +int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
> > +                           struct intel_engine_cs *engine,
> > +                           struct i915_gem_context_sseu *sseu)
> > +{
> > +     struct drm_i915_private *dev_priv = ctx->i915;
> > +     struct intel_context *ce;
> > +     enum intel_engine_id id;
> > +     int ret;
> > +
> > +     lockdep_assert_held(&dev_priv->drm.struct_mutex);
> > +
> > +     if (memcmp(sseu, &ctx->engine[engine->id].sseu, sizeof(*sseu)) == 0)
> > +             return 0;
> > +
> > +     /*
> > +      * We can only program this on render ring.
> > +      */
> > +     ce = &ctx->engine[RCS];
> > +
> > +     if (ce->pin_count) { /* Assume that the context is active! */
> > +             ret = i915_gem_switch_to_kernel_context(dev_priv);
> > +             if (ret)
> > +                     return ret;
> > +
> > +             ret = i915_gem_wait_for_idle(dev_priv,
> > +                                          I915_WAIT_INTERRUPTIBLE |
> > +                                          I915_WAIT_LOCKED);
> 
> Could we consider the alternative of only allowing this to be configured 
> on context create? That way we would not need to idle the GPU every time 
> userspace decides to fiddle with it. It is unprivileged so quite an easy 
> way for a random app to ruin GPU performance for everyone.

I think we can do dynamic reconfiguration of the context using LRI. And
if we can't do so from within the context, we can use SDM from another.
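
For the LRI route, something like this is what I have in mind (rough
sketch only, using the usual emission helpers; whether the HW honours
an RPCS write via LRI is exactly the open question):

        static int emit_rpcs_lri(struct i915_request *rq, u32 rpcs)
        {
                u32 *cs;

                cs = intel_ring_begin(rq, 4);
                if (IS_ERR(cs))
                        return PTR_ERR(cs);

                *cs++ = MI_LOAD_REGISTER_IMM(1);
                *cs++ = i915_mmio_reg_offset(GEN8_R_PWR_CLK_STATE);
                *cs++ = rpcs;
                *cs++ = MI_NOOP;

                intel_ring_advance(rq, cs);
                return 0;
        }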
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-05-08 20:56     ` Chris Wilson
@ 2018-05-09 15:35       ` Lionel Landwerlin
  0 siblings, 0 replies; 50+ messages in thread
From: Lionel Landwerlin @ 2018-05-09 15:35 UTC (permalink / raw)
  To: Chris Wilson, Tvrtko Ursulin, intel-gfx

On 08/05/18 21:56, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-05-03 18:18:43)
>> On 25/04/2018 12:45, Lionel Landwerlin wrote:
>>> From: Chris Wilson <chris@chris-wilson.co.uk>
>>>
>>> We want to allow userspace to reconfigure the subslice configuration for
>>> its own use case. To do so, we expose a context parameter to allow
>>> adjustment of the RPCS register stored within the context image (and
>>> currently not accessible via LRI). If the context is adjusted before
>>> first use, the adjustment is for "free"; otherwise if the context is
>>> active we flush the context off the GPU (stalling all users) and force
>>> the GPU to save the context to memory where we can modify it and so
>>> ensure that the register is reloaded on next execution.
>>>
>>> The overhead of managing additional EU subslices can be significant,
>>> especially in multi-context workloads. Non-GPGPU contexts should
>>> preferably disable the subslices they are not using, and others should
>>> fine-tune the number to match their workload.
>>>
>>> We expose complete control over the RPCS register, allowing
>>> configuration of slice/subslice, via masks packed into a u64 for
>>> simplicity. For example,
>>>
>>>        struct drm_i915_gem_context_param arg;
>>>        struct drm_i915_gem_context_param_sseu sseu = { .flags = I915_EXEC_RENDER };
>>>
>>>        memset(&arg, 0, sizeof(arg));
>>>        arg.ctx_id = ctx;
>>>        arg.param = I915_CONTEXT_PARAM_SSEU;
>>>        arg.value = (uintptr_t) &sseu;
>>>        if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
>>>                sseu.packed.subslice_mask = 0;
>>>
>>>                drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
>>>        }
>>>
>>> could be used to disable all subslices where supported.
>>>
>>> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)
>>>
>>> v3: Add ability to program this per engine (Chris)
>>>
>>> v4: Move most get_sseu() into i915_gem_context.c (Lionel)
>>>
>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>>> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
>>> CC: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> CC: Zhipeng Gong <zhipeng.gong@intel.com>
>>> CC: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/i915_gem_context.c | 82 ++++++++++++++++++++++++-
>>>    drivers/gpu/drm/i915/intel_lrc.c        | 55 +++++++++++++++++
>>>    drivers/gpu/drm/i915/intel_lrc.h        |  4 ++
>>>    include/uapi/drm/i915_drm.h             | 28 +++++++++
>>>    4 files changed, 168 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
>>> index bdf050beeb94..b97ddcf47514 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_context.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
>>> @@ -740,6 +740,26 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
>>>        return 0;
>>>    }
>>>    
>>> +static struct i915_gem_context_sseu
>>> +i915_gem_context_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
>>> +                                  const struct drm_i915_gem_context_param_sseu *user_sseu)
>>> +{
>>> +     struct i915_gem_context_sseu value = {
>>> +             .slice_mask = user_sseu->packed.slice_mask == 0 ?
>>> +                           sseu->slice_mask :
>>> +                           (user_sseu->packed.slice_mask & sseu->slice_mask),
>>> +             .subslice_mask = user_sseu->packed.subslice_mask == 0 ?
>>> +                              sseu->subslice_mask[0] :
>>> +                              (user_sseu->packed.subslice_mask & sseu->subslice_mask[0]),
>>> +             .min_eus_per_subslice = min(user_sseu->packed.min_eus_per_subslice,
>>> +                                         sseu->max_eus_per_subslice),
>>> +             .max_eus_per_subslice = min(user_sseu->packed.max_eus_per_subslice,
>>> +                                         sseu->max_eus_per_subslice),
>>> +     };
>>> +
>>> +     return value;
>>> +}
>>> +
>>>    int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>>>                                    struct drm_file *file)
>>>    {
>>> @@ -777,6 +797,37 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>>>        case I915_CONTEXT_PARAM_PRIORITY:
>>>                args->value = ctx->sched.priority;
>>>                break;
>>> +     case I915_CONTEXT_PARAM_SSEU: {
>>> +             struct drm_i915_gem_context_param_sseu param_sseu;
>>> +             struct intel_engine_cs *engine;
>>> +
>>> +             if (copy_from_user(&param_sseu, u64_to_user_ptr(args->value),
>>> +                                sizeof(param_sseu))) {
>>> +                     ret = -EFAULT;
>>> +                     break;
>>> +             }
>>> +
>>> +             engine = i915_gem_engine_from_flags(to_i915(dev), file,
>>> +                                                 param_sseu.flags);
>>> +             if (!engine) {
>>> +                     ret = -EINVAL;
>>> +                     break;
>>> +             }
>>> +
>>> +             param_sseu.packed.slice_mask =
>>> +                     ctx->engine[engine->id].sseu.slice_mask;
>>> +             param_sseu.packed.subslice_mask =
>>> +                     ctx->engine[engine->id].sseu.subslice_mask;
>>> +             param_sseu.packed.min_eus_per_subslice =
>>> +                     ctx->engine[engine->id].sseu.min_eus_per_subslice;
>>> +             param_sseu.packed.max_eus_per_subslice =
>>> +                     ctx->engine[engine->id].sseu.max_eus_per_subslice;
>>> +
>>> +             if (copy_to_user(u64_to_user_ptr(args->value), &param_sseu,
>>> +                              sizeof(param_sseu)))
>>> +                     ret = -EFAULT;
>>> +             break;
>>> +     }
>>>        default:
>>>                ret = -EINVAL;
>>>                break;
>>> @@ -832,7 +883,6 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>>>                else
>>>                        i915_gem_context_clear_bannable(ctx);
>>>                break;
>>> -
>>>        case I915_CONTEXT_PARAM_PRIORITY:
>>>                {
>>>                        s64 priority = args->value;
>>> @@ -851,7 +901,37 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>>>                                ctx->sched.priority = priority;
>>>                }
>>>                break;
>>> +     case I915_CONTEXT_PARAM_SSEU:
>>> +             if (args->size)
>>> +                     ret = -EINVAL;
>>> +             else if (!HAS_EXECLISTS(ctx->i915))
>>> +                     ret = -ENODEV;
>>> +             else {
>>> +                     struct drm_i915_private *dev_priv = to_i915(dev);
>>> +                     struct drm_i915_gem_context_param_sseu user_sseu;
>>> +                     struct i915_gem_context_sseu ctx_sseu;
>>> +                     struct intel_engine_cs *engine;
>>> +
>>> +                     if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
>>> +                                        sizeof(user_sseu))) {
>>> +                             ret = -EFAULT;
>>> +                             break;
>>> +                     }
>>> +
>>> +                     engine = i915_gem_engine_from_flags(dev_priv, file,
>>> +                                                         user_sseu.flags);
>>> +                     if (!engine) {
>>> +                             ret = -EINVAL;
>>> +                             break;
>>> +                     }
>>> +
>>> +                     ctx_sseu =
>>> +                             i915_gem_context_sseu_from_user_sseu(&INTEL_INFO(dev_priv)->sseu,
>>> +                                                                  &user_sseu);
>>>    
>>> +                     ret = intel_lr_context_set_sseu(ctx, engine, &ctx_sseu);
>>> +             }
>>> +             break;
>>>        default:
>>>                ret = -EINVAL;
>>>                break;
>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>>> index dca17ef24de5..9caddcb1f234 100644
>>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>>> @@ -2711,6 +2711,61 @@ void intel_lr_context_resume(struct drm_i915_private *dev_priv)
>>>        }
>>>    }
>>>    
>>> +int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
>>> +                           struct intel_engine_cs *engine,
>>> +                           struct i915_gem_context_sseu *sseu)
>>> +{
>>> +     struct drm_i915_private *dev_priv = ctx->i915;
>>> +     struct intel_context *ce;
>>> +     enum intel_engine_id id;
>>> +     int ret;
>>> +
>>> +     lockdep_assert_held(&dev_priv->drm.struct_mutex);
>>> +
>>> +     if (memcmp(sseu, &ctx->engine[engine->id].sseu, sizeof(*sseu)) == 0)
>>> +             return 0;
>>> +
>>> +     /*
>>> +      * We can only program this on render ring.
>>> +      */
>>> +     ce = &ctx->engine[RCS];
>>> +
>>> +     if (ce->pin_count) { /* Assume that the context is active! */
>>> +             ret = i915_gem_switch_to_kernel_context(dev_priv);
>>> +             if (ret)
>>> +                     return ret;
>>> +
>>> +             ret = i915_gem_wait_for_idle(dev_priv,
>>> +                                          I915_WAIT_INTERRUPTIBLE |
>>> +                                          I915_WAIT_LOCKED);
>> Could we consider the alternative of only allowing this to be configured
>> on context create? That way we would not need to idle the GPU every time
>> userspace decides to fiddle with it. It is unprivileged so quite an easy
>> way for a random app to ruin GPU performance for everyone.
> I think we can do dynamic reconfiguration of the context using LRI. And
> if we can't do so from within the context, we can use SDM from another.
> -Chris
>
Documentation says we can't LRI. So that leaves SDM from another context.
Should we use the kernel one?
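
Something along these lines from the kernel context, I suppose (rough
sketch, untested; it assumes the target image stays pinned in the GGTT
until the write lands, and that the RPCS value sits in the dword after
the register offset in the reg state):

        static int emit_rpcs_sdm(struct i915_request *rq,
                                 struct intel_context *ce, u32 rpcs)
        {
                u32 offset = i915_ggtt_offset(ce->state) +
                             LRC_STATE_PN * PAGE_SIZE +
                             (CTX_R_PWR_CLK_STATE + 1) * sizeof(u32);
                u32 *cs;

                cs = intel_ring_begin(rq, 4);
                if (IS_ERR(cs))
                        return PTR_ERR(cs);

                *cs++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT;
                *cs++ = offset;
                *cs++ = 0;
                *cs++ = rpcs;

                intel_ring_advance(rq, cs);
                return 0;
        }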

Thanks,

-
Lionel
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-08-15 11:51     ` Tvrtko Ursulin
@ 2018-08-15 11:56       ` Chris Wilson
  0 siblings, 0 replies; 50+ messages in thread
From: Chris Wilson @ 2018-08-15 11:56 UTC (permalink / raw)
  To: Intel-gfx, Tvrtko Ursulin, Tvrtko Ursulin

Quoting Tvrtko Ursulin (2018-08-15 12:51:28)
> 
> On 14/08/2018 16:22, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-08-14 15:40:58)
> > Looks like we should be able to hook this up to a selftest to confirm
> > the modification does land in the target context image, and a SRM to
> > confirm it loaded.
> 
> Lionel wrote an IGT which reads it back via SRM so that should be 
> covered. I will be posting the IGT counterpart at some point as well, 
> ideally when the agreement on the i915 side is there.

Wouldn't you rather have both? I know I would :-p

The emphasis is slightly different: IGT wants to cover the ABI
behaviour, while selftests look towards the edge cases that are hard to
reach otherwise. But even a trivial selftest would let us know the series
functions as intended.
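
The SRM half of such a selftest is only a few dwords, e.g. (sketch,
with "scratch" standing in for a GGTT-pinned bo we read back on the
CPU once the request completes; the request has to run on the target
context so the SRM samples that context's RPCS):

        cs = intel_ring_begin(rq, 4);
        if (IS_ERR(cs))
                return PTR_ERR(cs);

        /* Sample the live RPCS value into scratch for CPU readback */
        *cs++ = MI_STORE_REGISTER_MEM_GEN8 | MI_SRM_LRM_GLOBAL_GTT;
        *cs++ = i915_mmio_reg_offset(GEN8_R_PWR_CLK_STATE);
        *cs++ = i915_ggtt_offset(scratch);
        *cs++ = 0;

        intel_ring_advance(rq, cs);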
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-08-14 15:22   ` Chris Wilson
@ 2018-08-15 11:51     ` Tvrtko Ursulin
  2018-08-15 11:56       ` Chris Wilson
  0 siblings, 1 reply; 50+ messages in thread
From: Tvrtko Ursulin @ 2018-08-15 11:51 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx, Tvrtko Ursulin


On 14/08/2018 16:22, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-08-14 15:40:58)
>> +static int
>> +i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
>> +                                 struct intel_engine_cs *engine,
>> +                                 struct intel_sseu sseu)
>> +{
>> +       struct drm_i915_private *i915 = ctx->i915;
>> +       struct i915_request *rq;
>> +       struct intel_ring *ring;
>> +       int ret;
>> +
>> +       lockdep_assert_held(&i915->drm.struct_mutex);
>> +
>> +       /* Submitting requests etc needs the hw awake. */
>> +       intel_runtime_pm_get(i915);
>> +
>> +       i915_retire_requests(i915);
>> +
>> +       /* Now use the RCS to actually reconfigure. */
>> +       engine = i915->engine[RCS];
>> +
>> +       rq = i915_request_alloc(engine, i915->kernel_context);
>> +       if (IS_ERR(rq)) {
>> +               ret = PTR_ERR(rq);
>> +               goto out_put;
>> +       }
>> +
>> +       ret = engine->emit_rpcs_config(rq, ctx, sseu);
>> +       if (ret)
>> +               goto out_add;
>> +
>> +       /* Queue this switch after all other activity */
>> +       list_for_each_entry(ring, &i915->gt.active_rings, active_link) {
>> +               struct i915_request *prev;
>> +
>> +               prev = last_request_on_engine(ring->timeline, engine);
>> +               if (prev)
>> +                       i915_sw_fence_await_sw_fence_gfp(&rq->submit,
>> +                                                        &prev->submit,
>> +                                                        I915_FENCE_GFP);
>> +       }
>> +
>> +       i915_gem_set_global_barrier(i915, rq);
>> +
>> +out_add:
>> +       i915_request_add(rq);
>> +out_put:
>> +       intel_runtime_pm_put(i915);
>> +
>> +       return ret;
> 
> Looks like we should be able to hook this up to a selftest to confirm
> the modification does land in the target context image, and a SRM to
> confirm it loaded.

Lionel wrote an IGT which reads it back via SRM so that should be 
covered. I will be posting the IGT counterpart at some point as well, 
ideally when the agreement on the i915 side is there.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-08-14 18:53       ` Chris Wilson
@ 2018-08-15  9:12         ` Tvrtko Ursulin
  0 siblings, 0 replies; 50+ messages in thread
From: Tvrtko Ursulin @ 2018-08-15  9:12 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx, Tvrtko Ursulin


On 14/08/2018 19:53, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-08-14 19:44:09)
>>
>> On 14/08/2018 15:59, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2018-08-14 15:40:58)
>>>> From: Chris Wilson <chris@chris-wilson.co.uk>
>>>>
>>>> We want to allow userspace to reconfigure the subslice configuration for
>>>> its own use case. To do so, we expose a context parameter to allow
>>>> adjustment of the RPCS register stored within the context image (and
>>>> currently not accessible via LRI). If the context is adjusted before
>>>> first use, the adjustment is for "free"; otherwise if the context is
>>>> active we flush the context off the GPU (stalling all users) and force
>>>> the GPU to save the context to memory where we can modify it and so
>>>> ensure that the register is reloaded on next execution.
>>>>
>>>> The overhead of managing additional EU subslices can be significant,
>>>> especially in multi-context workloads. Non-GPGPU contexts should
>>>> preferably disable the subslices they are not using, and others should
>>>> fine-tune the number to match their workload.
>>>>
>>>> We expose complete control over the RPCS register, allowing
>>>> configuration of slice/subslice, via masks packed into a u64 for
>>>> simplicity. For example,
>>>>
>>>>           struct drm_i915_gem_context_param arg;
>>>>           struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
>>>>                                                           .instance = 0, };
>>>>
>>>>           memset(&arg, 0, sizeof(arg));
>>>>           arg.ctx_id = ctx;
>>>>           arg.param = I915_CONTEXT_PARAM_SSEU;
>>>>           arg.value = (uintptr_t) &sseu;
>>>>           if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
>>>>                   sseu.packed.subslice_mask = 0;
>>>>
>>>>                   drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
>>>>           }
>>>>
>>>> could be used to disable all subslices where supported.
>>>>
>>>> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)
>>>>
>>>> v3: Add ability to program this per engine (Chris)
>>>>
>>>> v4: Move most get_sseu() into i915_gem_context.c (Lionel)
>>>>
>>>> v5: Validate sseu configuration against the device's capabilities (Lionel)
>>>>
>>>> v6: Change context powergating settings through MI_SDM on kernel context (Chris)
>>>>
>>>> v7: Synchronize the requests following a powergating setting change using a global
>>>>       dependency (Chris)
>>>>       Iterate timelines through dev_priv.gt.active_rings (Tvrtko)
>>>>       Disable RPCS configuration setting for non capable users (Lionel/Tvrtko)
>>>>
>>>> v8: s/union intel_sseu/struct intel_sseu/ (Lionel)
>>>>       s/dev_priv/i915/ (Tvrtko)
>>>>       Change uapi class/instance fields to u16 (Tvrtko)
>>>>       Bump mask fields to 64bits (Lionel)
>>>>       Don't return EPERM when dynamic sseu is disabled (Tvrtko)
>>>>
>>>> v9: Import context image into kernel context's ppgtt only when
>>>>       reconfiguring powergated slice/subslices (Chris)
>>>>       Use aliasing ppgtt when needed (Michel)
>>>>
>>>> Tvrtko Ursulin:
>>>>
>>>> v10:
>>>>    * Update for upstream changes.
>>>>    * Request submit needs a RPM reference.
>>>>    * Reject on !FULL_PPGTT for simplicity.
>>>>    * Pull out get/set param to helpers for readability and less indent.
>>>>    * Use i915_request_await_dma_fence in add_global_barrier to skip waits
>>>>      on the same timeline and avoid GEM_BUG_ON.
>>>>    * No need to explicitly assign a NULL pointer to engine in legacy mode.
>>>>    * No need to move gen8_make_rpcs up.
>>>>    * Factored out global barrier as prep patch.
>>>>    * Allow to only CAP_SYS_ADMIN if !Gen11.
>>>>
>>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
>>>> Issue: https://github.com/intel/media-driver/issues/267
>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>>>> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>> Cc: Zhipeng Gong <zhipeng.gong@intel.com>
>>>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>> ---
>>>>    drivers/gpu/drm/i915/i915_gem_context.c | 187 +++++++++++++++++++++++-
>>>>    drivers/gpu/drm/i915/intel_lrc.c        |  55 +++++++
>>>>    drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
>>>>    include/uapi/drm/i915_drm.h             |  43 ++++++
>>>>    4 files changed, 288 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
>>>> index 8a12984e7495..6d6220634e9e 100644
>>>> --- a/drivers/gpu/drm/i915/i915_gem_context.c
>>>> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
>>>> @@ -773,6 +773,91 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
>>>>           return 0;
>>>>    }
>>>>    
>>>> +static int
>>>> +i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
>>>> +                                 struct intel_engine_cs *engine,
>>>> +                                 struct intel_sseu sseu)
>>>> +{
>>>> +       struct drm_i915_private *i915 = ctx->i915;
>>>> +       struct i915_request *rq;
>>>> +       struct intel_ring *ring;
>>>> +       int ret;
>>>> +
>>>> +       lockdep_assert_held(&i915->drm.struct_mutex);
>>>> +
>>>> +       /* Submitting requests etc needs the hw awake. */
>>>> +       intel_runtime_pm_get(i915);
>>>> +
>>>> +       i915_retire_requests(i915);
>>>
>>> ?
>>
>> I wondered myself but did not make myself dig through all the history of
>> the series. Cannot think that it does anything useful in the current design.
>>
>>>> +
>>>> +       /* Now use the RCS to actually reconfigure. */
>>>> +       engine = i915->engine[RCS];
>>>
>>> ? Modifying registers stored in another engine's context image.
>>
>> Well, I was hoping design was kind of agreed by now.
> 
> :-p
> 
> This wasn't about using requests per se, but raising the question of why
> use rcs to adjust the vcs context image. If we used vcs, the
> question of serialisation is then only on that engine.

It only ever modifies the RCS context image, since AFAIU that is the 
only engine where the feature is supported. And it does so via the RCS, 
which again AFAIU ensures we have not only received the user interrupt 
for the last previous rq on the timeline, but that the context has been 
fully saved as well.

In light of this, perhaps the uAPI allowing set param on any engine is 
incorrect for now, and it should instead return -ENODEV for !RCS. It is 
misleading that SSEU config on other engines will succeed, when in fact 
the RCS context image is modified and the SSEU configuration only 
applies if/when the RCS context runs.

So now I am thinking the uAPI is fine to support other engines for now, 
to be future proof, but unless I am missing something it should really 
return -ENODEV for !RCS.
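
In the setparam path that would be no more than (sketch):

        engine = intel_engine_lookup_user(i915, user_sseu.class,
                                          user_sseu.instance);
        if (!engine)
                return -EINVAL;

        /* The RPCS is only present in the render context image */
        if (engine->id != RCS)
                return -ENODEV;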

>>>> +       rq = i915_request_alloc(engine, i915->kernel_context);
>>>> +       if (IS_ERR(rq)) {
>>>> +               ret = PTR_ERR(rq);
>>>> +               goto out_put;
>>>> +       }
>>>> +
>>>> +       ret = engine->emit_rpcs_config(rq, ctx, sseu);
>>>
>>> It's just an LRI, I'd rather we do it directly unless there's evidence
>>> that there will be an explicit rpcs config instruction in future. It
>>> just doesn't seem general enough.
>>
>> No complaints, can do.
>>
>>>> +       if (ret)
>>>> +               goto out_add;
>>>> +
>>>> +       /* Queue this switch after all other activity */
>>>
>>> Only needs to be after the target ctx.
>>
>> True, so find just the timeline belonging to the target context. Some 
>> backpointers would be needed to find it, or a walk comparing against 
>> the target ce->ring->timeline->fence_context. Anything more than the 
>> latter doesn't sound justified for this use case.
> 
> Right, we should be able to use ce->ring->timeline.
> 
>>>> +       list_for_each_entry(ring, &i915->gt.active_rings, active_link) {
>>>> +               struct i915_request *prev;
>>>> +
>>>> +               prev = last_request_on_engine(ring->timeline, engine);
>>>
>>> As constructed above you need target-engine + RCS.
>>
>> Target engine is always RCS. Looks like the engine param to this
>> function is pointless.
> 
> Always always? At the beginning the speculation was that the other
> engines were going to get their own subslice parameters. I guess that
> was on the same design sheet as the VCS commands...

See above.

> 
>>>> +               if (prev)
>>>> +                       i915_sw_fence_await_sw_fence_gfp(&rq->submit,
>>>> +                                                        &prev->submit,
>>>> +                                                        I915_FENCE_GFP);
>>>> +       }
>>>> +
>>>> +       i915_gem_set_global_barrier(i915, rq);
>>>
>>> This is just for a link from ctx-engine to this rq. Overkill much?
>>> Presumably this stems from using the wrong engine.
>>
>> AFAIU this is to ensure the target context cannot get ahead of this 
>> request. Without it, it could overtake and then there is no guarantee 
>> that execbufs following set param will run with the new SSEU configuration.
> 
> Right, but we only need a fence on the target context to this request.
> The existing way would be to queue an empty request on the target with
> the await on this rq, but if you feel that is one request too many we
> can use a timeline barrier. Or if that is too narrow, an engine barrier.
> Over time, I can see that we may want all 3 barrier levels. The cost
> isn't too onerous.

Not sure exactly what you are thinking of here, but I think we do need a 
global barrier, or at least a barrier local to the RCS engine.

Since the context update is done from the kernel context, we have to 
ensure new requests against the target context cannot overtake the 
context update request.

Perhaps it would be sufficient to set the highest, non-user-accessible 
priority on the update request? A barrier sounds more future proof and 
correct from a design point of view though.
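
For reference, the "empty request on the target" variant would be 
roughly (sketch, untested):

        struct i915_request *barrier;

        barrier = i915_request_alloc(engine, ctx);
        if (IS_ERR(barrier))
                return PTR_ERR(barrier);

        /* Emits nothing; only orders ctx behind the update request */
        ret = i915_request_await_dma_fence(barrier, &rq->fence);
        i915_request_add(barrier);
        if (ret)
                return ret;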

> 
>>>> +
>>>> +out_add:
>>>> +       i915_request_add(rq);
>>>
>>> And I'd still recommend not using indirect access if we can apply the
>>> changes immediately.
>>
>> Direct (CPU) access means blocking in set param until the context is
>> idle.
> 
> No, it doesn't. It is just a choice of whether you use a request to a
> pinned context, or direct access if idle (literally ce->pin_count).
> Direct access is going to be near zero overhead.

You cut all the rest of my comments here so it is unclear what you meant.

It was mostly about avoiding multiple paths to do the same thing. As I 
see it, we cannot have only the CPU update path, since it does imply 
idling the GPU. If you disagree with that, please explain.

And if we do need the SDM path anyway, the CPU update path could be 
added later. I agree it would bring a small advantage of not having to 
add a global barrier *if* the target context is idle *and* RCS is busy 
with other contexts.
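
For completeness, the direct path for an unpinned context would look 
something like this (sketch; ce being the target context's RCS state, 
gen8_make_rpcs computing the register value as elsewhere in the series):

        if (!ce->pin_count) {
                u32 *regs;

                regs = i915_gem_object_pin_map(ce->state->obj, I915_MAP_WB);
                if (IS_ERR(regs))
                        return PTR_ERR(regs);

                /* Poke the new value straight into the saved reg state */
                regs += LRC_STATE_PN * PAGE_SIZE / sizeof(*regs);
                regs[CTX_R_PWR_CLK_STATE + 1] = gen8_make_rpcs(i915, &sseu);

                i915_gem_object_unpin_map(ce->state->obj);
                return 0;
        }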

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-08-14 18:44     ` Tvrtko Ursulin
@ 2018-08-14 18:53       ` Chris Wilson
  2018-08-15  9:12         ` Tvrtko Ursulin
  0 siblings, 1 reply; 50+ messages in thread
From: Chris Wilson @ 2018-08-14 18:53 UTC (permalink / raw)
  To: Intel-gfx, Tvrtko Ursulin, Tvrtko Ursulin

Quoting Tvrtko Ursulin (2018-08-14 19:44:09)
> 
> On 14/08/2018 15:59, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-08-14 15:40:58)
> >> From: Chris Wilson <chris@chris-wilson.co.uk>
> >>
> >> We want to allow userspace to reconfigure the subslice configuration for
> >> its own use case. To do so, we expose a context parameter to allow
> >> adjustment of the RPCS register stored within the context image (and
> >> currently not accessible via LRI). If the context is adjusted before
> >> first use, the adjustment is for "free"; otherwise if the context is
> >> active we flush the context off the GPU (stalling all users) and force
> >> the GPU to save the context to memory where we can modify it and so
> >> ensure that the register is reloaded on next execution.
> >>
> >> The overhead of managing additional EU subslices can be significant,
> >> especially in multi-context workloads. Non-GPGPU contexts should
> >> preferably disable the subslices they are not using, and others should
> >> fine-tune the number to match their workload.
> >>
> >> We expose complete control over the RPCS register, allowing
> >> configuration of slice/subslice, via masks packed into a u64 for
> >> simplicity. For example,
> >>
> >>          struct drm_i915_gem_context_param arg;
> >>          struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
> >>                                                          .instance = 0, };
> >>
> >>          memset(&arg, 0, sizeof(arg));
> >>          arg.ctx_id = ctx;
> >>          arg.param = I915_CONTEXT_PARAM_SSEU;
> >>          arg.value = (uintptr_t) &sseu;
> >>          if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
> >>                  sseu.packed.subslice_mask = 0;
> >>
> >>                  drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
> >>          }
> >>
> >> could be used to disable all subslices where supported.
> >>
> >> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)
> >>
> >> v3: Add ability to program this per engine (Chris)
> >>
> >> v4: Move most get_sseu() into i915_gem_context.c (Lionel)
> >>
> >> v5: Validate sseu configuration against the device's capabilities (Lionel)
> >>
> >> v6: Change context powergating settings through MI_SDM on kernel context (Chris)
> >>
> >> v7: Synchronize the requests following a powergating setting change using a global
> >>      dependency (Chris)
> >>      Iterate timelines through dev_priv.gt.active_rings (Tvrtko)
> >>      Disable RPCS configuration setting for non capable users (Lionel/Tvrtko)
> >>
> >> v8: s/union intel_sseu/struct intel_sseu/ (Lionel)
> >>      s/dev_priv/i915/ (Tvrtko)
> >>      Change uapi class/instance fields to u16 (Tvrtko)
> >>      Bump mask fields to 64bits (Lionel)
> >>      Don't return EPERM when dynamic sseu is disabled (Tvrtko)
> >>
> >> v9: Import context image into kernel context's ppgtt only when
> >>      reconfiguring powergated slice/subslices (Chris)
> >>      Use aliasing ppgtt when needed (Michel)
> >>
> >> Tvrtko Ursulin:
> >>
> >> v10:
> >>   * Update for upstream changes.
> >>   * Request submit needs a RPM reference.
> >>   * Reject on !FULL_PPGTT for simplicity.
> >>   * Pull out get/set param to helpers for readability and less indent.
> >>   * Use i915_request_await_dma_fence in add_global_barrier to skip waits
> >>     on the same timeline and avoid GEM_BUG_ON.
> >>   * No need to explicitly assign a NULL pointer to engine in legacy mode.
> >>   * No need to move gen8_make_rpcs up.
> >>   * Factored out global barrier as prep patch.
> >>   * Allow to only CAP_SYS_ADMIN if !Gen11.
> >>
> >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
> >> Issue: https://github.com/intel/media-driver/issues/267
> >> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> >> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
> >> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >> Cc: Zhipeng Gong <zhipeng.gong@intel.com>
> >> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> >> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >> ---
> >>   drivers/gpu/drm/i915/i915_gem_context.c | 187 +++++++++++++++++++++++-
> >>   drivers/gpu/drm/i915/intel_lrc.c        |  55 +++++++
> >>   drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
> >>   include/uapi/drm/i915_drm.h             |  43 ++++++
> >>   4 files changed, 288 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> >> index 8a12984e7495..6d6220634e9e 100644
> >> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> >> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> >> @@ -773,6 +773,91 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
> >>          return 0;
> >>   }
> >>   
> >> +static int
> >> +i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
> >> +                                 struct intel_engine_cs *engine,
> >> +                                 struct intel_sseu sseu)
> >> +{
> >> +       struct drm_i915_private *i915 = ctx->i915;
> >> +       struct i915_request *rq;
> >> +       struct intel_ring *ring;
> >> +       int ret;
> >> +
> >> +       lockdep_assert_held(&i915->drm.struct_mutex);
> >> +
> >> +       /* Submitting requests etc needs the hw awake. */
> >> +       intel_runtime_pm_get(i915);
> >> +
> >> +       i915_retire_requests(i915);
> > 
> > ?
> 
> I wondered myself but did not make myself dig through all the history of 
> the series. Cannot think that it does anything useful in the current design.
> 
> >> +
> >> +       /* Now use the RCS to actually reconfigure. */
> >> +       engine = i915->engine[RCS];
> > 
> > ? Modifying registers stored in another engine's context image.
> 
> Well, I was hoping design was kind of agreed by now.

:-p

This wasn't about using requests per se, but raising the question of why
use rcs to adjust the vcs context image. If we used vcs, the
question of serialisation is then only on that engine.
 
> >> +       rq = i915_request_alloc(engine, i915->kernel_context);
> >> +       if (IS_ERR(rq)) {
> >> +               ret = PTR_ERR(rq);
> >> +               goto out_put;
> >> +       }
> >> +
> >> +       ret = engine->emit_rpcs_config(rq, ctx, sseu);
> > 
> > It's just an LRI, I'd rather we do it directly unless there's evidence
> > that there will be an explicit rpcs config instruction in future. It
> > just doesn't seem general enough.
> 
> No complaints, can do.
> 
> >> +       if (ret)
> >> +               goto out_add;
> >> +
> >> +       /* Queue this switch after all other activity */
> > 
> > Only needs to be after the target ctx.
> 
> True, so find just the timeline belonging to the target context. Some 
> backpointers would be needed to find it, or a walk comparing against 
> the target ce->ring->timeline->fence_context. Anything more than the 
> latter doesn't sound justified for this use case.

Right, we should be able to use ce->ring->timeline.

> >> +       list_for_each_entry(ring, &i915->gt.active_rings, active_link) {
> >> +               struct i915_request *prev;
> >> +
> >> +               prev = last_request_on_engine(ring->timeline, engine);
> > 
> > As constructed above you need target-engine + RCS.
> 
> Target engine is always RCS. Looks like the engine param to this 
> function is pointless.

Always always? At the beginning the speculation was that the other
engines were going to get their own subslice parameters. I guess that
was on the same design sheet as the VCS commands...

> >> +               if (prev)
> >> +                       i915_sw_fence_await_sw_fence_gfp(&rq->submit,
> >> +                                                        &prev->submit,
> >> +                                                        I915_FENCE_GFP);
> >> +       }
> >> +
> >> +       i915_gem_set_global_barrier(i915, rq);
> > 
> > This is just for a link from ctx-engine to this rq. Overkill much?
> > Presumably this stems from using the wrong engine.
> 
> AFAIU this is to ensure the target context cannot get ahead of this 
> request. Without it, it could overtake and then there is no guarantee 
> that execbufs following set param will run with the new SSEU configuration.

Right, but we only need a fence on the target context to this request.
The existing way would be to queue an empty request on the target with
the await on this rq, but if you feel that is one request too many we
can use a timeline barrier. Or if that is too narrow, an engine barrier.
Over time, I can see that we may want all 3 barrier levels. The cost
isn't too onerous.

> >> +
> >> +out_add:
> >> +       i915_request_add(rq);
> > 
> > And I'd still recommend not using indirect access if we can apply the
> > changes immediately.
> 
> Direct (CPU) access means blocking in set param until the context is 
> idle.

No, it doesn't. It is just a choice of whether you use a request to a
pinned context, or direct access if idle (literally ce->pin_count).
Direct access is going to be near zero overhead.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-08-14 14:59   ` Chris Wilson
  2018-08-14 15:11     ` Lionel Landwerlin
@ 2018-08-14 18:44     ` Tvrtko Ursulin
  2018-08-14 18:53       ` Chris Wilson
  1 sibling, 1 reply; 50+ messages in thread
From: Tvrtko Ursulin @ 2018-08-14 18:44 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx, Tvrtko Ursulin


On 14/08/2018 15:59, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-08-14 15:40:58)
>> From: Chris Wilson <chris@chris-wilson.co.uk>
>>
>> We want to allow userspace to reconfigure the subslice configuration for
>> its own use case. To do so, we expose a context parameter to allow
>> adjustment of the RPCS register stored within the context image (and
>> currently not accessible via LRI). If the context is adjusted before
>> first use, the adjustment is for "free"; otherwise if the context is
>> active we flush the context off the GPU (stalling all users) and force
>> the GPU to save the context to memory where we can modify it and so
>> ensure that the register is reloaded on next execution.
>>
>> The overhead of managing additional EU subslices can be significant,
>> especially in multi-context workloads. Non-GPGPU contexts should
>> preferably disable the subslices they are not using, and others should
>> fine-tune the number to match their workload.
>>
>> We expose complete control over the RPCS register, allowing
>> configuration of slice/subslice, via masks packed into a u64 for
>> simplicity. For example,
>>
>>          struct drm_i915_gem_context_param arg;
>>          struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
>>                                                          .instance = 0, };
>>
>>          memset(&arg, 0, sizeof(arg));
>>          arg.ctx_id = ctx;
>>          arg.param = I915_CONTEXT_PARAM_SSEU;
>>          arg.value = (uintptr_t) &sseu;
>>          if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
>>                  sseu.packed.subslice_mask = 0;
>>
>>                  drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
>>          }
>>
>> could be used to disable all subslices where supported.
>>
>> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)
>>
>> v3: Add ability to program this per engine (Chris)
>>
>> v4: Move most get_sseu() into i915_gem_context.c (Lionel)
>>
>> v5: Validate sseu configuration against the device's capabilities (Lionel)
>>
>> v6: Change context powergating settings through MI_SDM on kernel context (Chris)
>>
>> v7: Synchronize the requests following a powergating setting change using a global
>>      dependency (Chris)
>>      Iterate timelines through dev_priv.gt.active_rings (Tvrtko)
>>      Disable RPCS configuration setting for non capable users (Lionel/Tvrtko)
>>
>> v8: s/union intel_sseu/struct intel_sseu/ (Lionel)
>>      s/dev_priv/i915/ (Tvrtko)
>>      Change uapi class/instance fields to u16 (Tvrtko)
>>      Bump mask fields to 64bits (Lionel)
>>      Don't return EPERM when dynamic sseu is disabled (Tvrtko)
>>
>> v9: Import context image into kernel context's ppgtt only when
>>      reconfiguring powergated slice/subslices (Chris)
>>      Use aliasing ppgtt when needed (Michel)
>>
>> Tvrtko Ursulin:
>>
>> v10:
>>   * Update for upstream changes.
>>   * Request submit needs a RPM reference.
>>   * Reject on !FULL_PPGTT for simplicity.
>>   * Pull out get/set param to helpers for readability and less indent.
>>   * Use i915_request_await_dma_fence in add_global_barrier to skip waits
>>     on the same timeline and avoid GEM_BUG_ON.
>>   * No need to explicitly assign a NULL pointer to engine in legacy mode.
>>   * No need to move gen8_make_rpcs up.
>>   * Factored out global barrier as prep patch.
>>   * Allow to only CAP_SYS_ADMIN if !Gen11.
>>
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
>> Issue: https://github.com/intel/media-driver/issues/267
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Cc: Zhipeng Gong <zhipeng.gong@intel.com>
>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_gem_context.c | 187 +++++++++++++++++++++++-
>>   drivers/gpu/drm/i915/intel_lrc.c        |  55 +++++++
>>   drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
>>   include/uapi/drm/i915_drm.h             |  43 ++++++
>>   4 files changed, 288 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
>> index 8a12984e7495..6d6220634e9e 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_context.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
>> @@ -773,6 +773,91 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
>>          return 0;
>>   }
>>   
>> +static int
>> +i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
>> +                                 struct intel_engine_cs *engine,
>> +                                 struct intel_sseu sseu)
>> +{
>> +       struct drm_i915_private *i915 = ctx->i915;
>> +       struct i915_request *rq;
>> +       struct intel_ring *ring;
>> +       int ret;
>> +
>> +       lockdep_assert_held(&i915->drm.struct_mutex);
>> +
>> +       /* Submitting requests etc needs the hw awake. */
>> +       intel_runtime_pm_get(i915);
>> +
>> +       i915_retire_requests(i915);
> 
> ?

I wondered myself but did not make myself dig through all the history of 
the series. Cannot think that it does anything useful in the current design.

>> +
>> +       /* Now use the RCS to actually reconfigure. */
>> +       engine = i915->engine[RCS];
> 
> ? Modifying registers stored in another engine's context image.

Well, I was hoping design was kind of agreed by now.

>> +
>> +       rq = i915_request_alloc(engine, i915->kernel_context);
>> +       if (IS_ERR(rq)) {
>> +               ret = PTR_ERR(rq);
>> +               goto out_put;
>> +       }
>> +
>> +       ret = engine->emit_rpcs_config(rq, ctx, sseu);
> 
> It's just an LRI, I'd rather we do it directly unless there's evidence
> that there will be an explicit rpcs config instruction in future. It
> just doesn't seem general enough.

No complaints, can do.

>> +       if (ret)
>> +               goto out_add;
>> +
>> +       /* Queue this switch after all other activity */
> 
> Only needs to be after the target ctx.

True, so find just the timeline belonging to the target context. Some 
backpointers would be needed to find it, or a walk comparing against 
the target ce->ring->timeline->fence_context. Anything more than the 
latter doesn't sound justified for this use case.
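
That is, with ce being the target context's engine state, replace the 
active_rings walk with just (sketch):

        prev = last_request_on_engine(ce->ring->timeline, engine);
        if (prev)
                i915_sw_fence_await_sw_fence_gfp(&rq->submit,
                                                 &prev->submit,
                                                 I915_FENCE_GFP);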

> 
>> +       list_for_each_entry(ring, &i915->gt.active_rings, active_link) {
>> +               struct i915_request *prev;
>> +
>> +               prev = last_request_on_engine(ring->timeline, engine);
> 
> As constructed above you need target-engine + RCS.

Target engine is always RCS. Looks like the engine param to this 
function is pointless.

> 
>> +               if (prev)
>> +                       i915_sw_fence_await_sw_fence_gfp(&rq->submit,
>> +                                                        &prev->submit,
>> +                                                        I915_FENCE_GFP);
>> +       }
>> +
>> +       i915_gem_set_global_barrier(i915, rq);
> 
> This is just for a link from ctx-engine to this rq. Overkill much?
> Presumably this stems from using the wrong engine.

AFAIU this is to ensure the target context cannot get ahead of this 
request. Without it, it could overtake and then there is no guarantee 
that execbufs following set param will run with the new SSEU configuration.
> 
>> +
>> +out_add:
>> +       i915_request_add(rq);
> 
> And I'd still recommend not using indirect access if we can apply the
> changes immediately.

Direct (CPU) access means blocking in set param until the context is 
idle. AFAIR in one of the earlier versions a GPU-idling approach was 
used, but I thought that was a very dangerous API. Can we somehow take 
just one context out of circulation? We would still have to idle since it 
could be deep in the queue. The current approach is nicely async. And I 
don't like having two paths - direct if unpinned, or via GPU if pinned. 
In fact it would even be a third path, since normally this happens via 
init_reg_state of pristine contexts.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-08-14 16:05         ` Lionel Landwerlin
@ 2018-08-14 16:09           ` Lionel Landwerlin
  0 siblings, 0 replies; 50+ messages in thread
From: Lionel Landwerlin @ 2018-08-14 16:09 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx, Tvrtko Ursulin

On 14/08/18 17:05, Lionel Landwerlin wrote:
> On 14/08/18 16:18, Chris Wilson wrote:
>> Quoting Lionel Landwerlin (2018-08-14 16:11:36)
>>> On 14/08/18 15:59, Chris Wilson wrote:
>>>> And I'd still recommend not using indirect access if we can apply the
>>>> changes immediately.
>>>> -Chris
>>>>
>>> Hangs on Gen9 :(
>> How does modifying the context image of an idle (unpinned) context cause
>> a hang?
>> -Chris
>>
I thought you meant a context modifying its own RPCS register... no?
>
> -
> Lionel

Oh I get it now... Sorry, ignore me :)

-
Lionel
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-08-14 15:18       ` Chris Wilson
@ 2018-08-14 16:05         ` Lionel Landwerlin
  2018-08-14 16:09           ` Lionel Landwerlin
  0 siblings, 1 reply; 50+ messages in thread
From: Lionel Landwerlin @ 2018-08-14 16:05 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx, Tvrtko Ursulin

On 14/08/18 16:18, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2018-08-14 16:11:36)
>> On 14/08/18 15:59, Chris Wilson wrote:
>>> And I'd still recommend not using indirect access if we can apply the
>>> changes immediately.
>>> -Chris
>>>
>> Hangs on Gen9 :(
> How does modifying the context image of an idle (unpinned) context cause
> a hang?
> -Chris
>
I thought you meant a context modifying its own RPCS register... no?

-
Lionel
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-08-14 14:40 ` [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace Tvrtko Ursulin
  2018-08-14 14:59   ` Chris Wilson
@ 2018-08-14 15:22   ` Chris Wilson
  2018-08-15 11:51     ` Tvrtko Ursulin
  1 sibling, 1 reply; 50+ messages in thread
From: Chris Wilson @ 2018-08-14 15:22 UTC (permalink / raw)
  To: Intel-gfx, Tvrtko Ursulin

Quoting Tvrtko Ursulin (2018-08-14 15:40:58)
> +static int
> +i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
> +                                 struct intel_engine_cs *engine,
> +                                 struct intel_sseu sseu)
> +{
> +       struct drm_i915_private *i915 = ctx->i915;
> +       struct i915_request *rq;
> +       struct intel_ring *ring;
> +       int ret;
> +
> +       lockdep_assert_held(&i915->drm.struct_mutex);
> +
> +       /* Submitting requests etc needs the hw awake. */
> +       intel_runtime_pm_get(i915);
> +
> +       i915_retire_requests(i915);
> +
> +       /* Now use the RCS to actually reconfigure. */
> +       engine = i915->engine[RCS];
> +
> +       rq = i915_request_alloc(engine, i915->kernel_context);
> +       if (IS_ERR(rq)) {
> +               ret = PTR_ERR(rq);
> +               goto out_put;
> +       }
> +
> +       ret = engine->emit_rpcs_config(rq, ctx, sseu);
> +       if (ret)
> +               goto out_add;
> +
> +       /* Queue this switch after all other activity */
> +       list_for_each_entry(ring, &i915->gt.active_rings, active_link) {
> +               struct i915_request *prev;
> +
> +               prev = last_request_on_engine(ring->timeline, engine);
> +               if (prev)
> +                       i915_sw_fence_await_sw_fence_gfp(&rq->submit,
> +                                                        &prev->submit,
> +                                                        I915_FENCE_GFP);
> +       }
> +
> +       i915_gem_set_global_barrier(i915, rq);
> +
> +out_add:
> +       i915_request_add(rq);
> +out_put:
> +       intel_runtime_pm_put(i915);
> +
> +       return ret;

Looks like we should be able to hook this up to a selftest to confirm
the modification does land in the target context image, and a SRM to
confirm it loaded.
-Chris

* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-08-14 15:11     ` Lionel Landwerlin
@ 2018-08-14 15:18       ` Chris Wilson
  2018-08-14 16:05         ` Lionel Landwerlin
  0 siblings, 1 reply; 50+ messages in thread
From: Chris Wilson @ 2018-08-14 15:18 UTC (permalink / raw)
  To: Intel-gfx, Lionel Landwerlin, Tvrtko Ursulin

Quoting Lionel Landwerlin (2018-08-14 16:11:36)
> On 14/08/18 15:59, Chris Wilson wrote:
> > And I'd still recommend not using indirect access if we can apply the
> > changes immediately.
> > -Chris
> >
> 
> Hangs on Gen9 :(

How does modifying the context image of an idle (unpinned) context cause
a hang?
-Chris

* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-08-14 14:59   ` Chris Wilson
@ 2018-08-14 15:11     ` Lionel Landwerlin
  2018-08-14 15:18       ` Chris Wilson
  2018-08-14 18:44     ` Tvrtko Ursulin
  1 sibling, 1 reply; 50+ messages in thread
From: Lionel Landwerlin @ 2018-08-14 15:11 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx, Tvrtko Ursulin

On 14/08/18 15:59, Chris Wilson wrote:
> And I'd still recommend not using indirect access if we can apply the
> changes immediately.
> -Chris
>

Hangs on Gen9 :(

-
Lionel

* Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-08-14 14:40 ` [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace Tvrtko Ursulin
@ 2018-08-14 14:59   ` Chris Wilson
  2018-08-14 15:11     ` Lionel Landwerlin
  2018-08-14 18:44     ` Tvrtko Ursulin
  2018-08-14 15:22   ` Chris Wilson
  1 sibling, 2 replies; 50+ messages in thread
From: Chris Wilson @ 2018-08-14 14:59 UTC (permalink / raw)
  To: Intel-gfx, Tvrtko Ursulin

Quoting Tvrtko Ursulin (2018-08-14 15:40:58)
> From: Chris Wilson <chris@chris-wilson.co.uk>
> 
> We want to allow userspace to reconfigure the subslice configuration for
> its own use case. To do so, we expose a context parameter to allow
> adjustment of the RPCS register stored within the context image (and
> currently not accessible via LRI). If the context is adjusted before
> first use, the adjustment is for "free"; otherwise if the context is
> active we flush the context off the GPU (stalling all users) and force
> the GPU to save the context to memory where we can modify it and so
> ensure that the register is reloaded on next execution.
> 
> The overhead of managing additional EU subslices can be significant,
> especially in multi-context workloads. Non-GPGPU contexts should
> preferably disable the subslices they are not using, and others should
> fine-tune the number to match their workload.
> 
> We expose complete control over the RPCS register, allowing
> configuration of slice/subslice, via masks packed into a u64 for
> simplicity. For example,
> 
>         struct drm_i915_gem_context_param arg;
>         struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
>                                                         .instance = 0, };
> 
>         memset(&arg, 0, sizeof(arg));
>         arg.ctx_id = ctx;
>         arg.param = I915_CONTEXT_PARAM_SSEU;
>         arg.value = (uintptr_t) &sseu;
>         if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
>                 sseu.subslice_mask = 0;
> 
>                 drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
>         }
> 
> could be used to disable all subslices where supported.
> 
> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)
> 
> v3: Add ability to program this per engine (Chris)
> 
> v4: Move most get_sseu() into i915_gem_context.c (Lionel)
> 
> v5: Validate sseu configuration against the device's capabilities (Lionel)
> 
> v6: Change context powergating settings through MI_SDM on kernel context (Chris)
> 
> v7: Synchronize the requests following a powergating setting change using a global
>     dependency (Chris)
>     Iterate timelines through dev_priv.gt.active_rings (Tvrtko)
>     Disable RPCS configuration setting for non-capable users (Lionel/Tvrtko)
> 
> v8: s/union intel_sseu/struct intel_sseu/ (Lionel)
>     s/dev_priv/i915/ (Tvrtko)
>     Change uapi class/instance fields to u16 (Tvrtko)
>     Bump mask fields to 64bits (Lionel)
>     Don't return EPERM when dynamic sseu is disabled (Tvrtko)
> 
> v9: Import context image into kernel context's ppgtt only when
>     reconfiguring powergated slice/subslices (Chris)
>     Use aliasing ppgtt when needed (Michel)
> 
> Tvrtko Ursulin:
> 
> v10:
>  * Update for upstream changes.
>  * Request submit needs a RPM reference.
>  * Reject on !FULL_PPGTT for simplicity.
>  * Pull out get/set param to helpers for readability and less indent.
>  * Use i915_request_await_dma_fence in add_global_barrier to skip waits
>    on the same timeline and avoid GEM_BUG_ON.
>  * No need to explicitly assign a NULL pointer to engine in legacy mode.
>  * No need to move gen8_make_rpcs up.
>  * Factored out global barrier as prep patch.
>  * Allow only CAP_SYS_ADMIN users if !Gen11.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
> Issue: https://github.com/intel/media-driver/issues/267
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Zhipeng Gong <zhipeng.gong@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem_context.c | 187 +++++++++++++++++++++++-
>  drivers/gpu/drm/i915/intel_lrc.c        |  55 +++++++
>  drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
>  include/uapi/drm/i915_drm.h             |  43 ++++++
>  4 files changed, 288 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 8a12984e7495..6d6220634e9e 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -773,6 +773,91 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
>         return 0;
>  }
>  
> +static int
> +i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
> +                                 struct intel_engine_cs *engine,
> +                                 struct intel_sseu sseu)
> +{
> +       struct drm_i915_private *i915 = ctx->i915;
> +       struct i915_request *rq;
> +       struct intel_ring *ring;
> +       int ret;
> +
> +       lockdep_assert_held(&i915->drm.struct_mutex);
> +
> +       /* Submitting requests etc needs the hw awake. */
> +       intel_runtime_pm_get(i915);
> +
> +       i915_retire_requests(i915);

?

> +
> +       /* Now use the RCS to actually reconfigure. */
> +       engine = i915->engine[RCS];

? Modifying registers stored in another engine's context image.

> +
> +       rq = i915_request_alloc(engine, i915->kernel_context);
> +       if (IS_ERR(rq)) {
> +               ret = PTR_ERR(rq);
> +               goto out_put;
> +       }
> +
> +       ret = engine->emit_rpcs_config(rq, ctx, sseu);

It's just an LRI; I'd rather we do it directly unless there's evidence
that there will be an explicit rpcs config instruction in future. It
just doesn't seem general enough.
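
Inlined at the caller it would be just (a sketch only; `offset' is
computed exactly as in gen8_emit_rpcs_config() from this patch, and
`cs'/`rq'/`i915'/`sseu' are the locals of the surrounding function):

	cs = intel_ring_begin(rq, 4);
	if (IS_ERR(cs))
		return PTR_ERR(cs);

	/* Poke the new RPCS value straight into the target context image */
	*cs++ = MI_STORE_DWORD_IMM_GEN4;
	*cs++ = lower_32_bits(offset);
	*cs++ = upper_32_bits(offset);
	*cs++ = gen8_make_rpcs(&INTEL_INFO(i915)->sseu,
			       intel_engine_prepare_sseu(rq->engine, sseu));

	intel_ring_advance(rq, cs);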

> +       if (ret)
> +               goto out_add;
> +
> +       /* Queue this switch after all other activity */

Only needs to be after the target ctx.

> +       list_for_each_entry(ring, &i915->gt.active_rings, active_link) {
> +               struct i915_request *prev;
> +
> +               prev = last_request_on_engine(ring->timeline, engine);

As constructed above you need target-engine + RCS.
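
i.e. something like this helper, called once for the user's target
engine and once for the RCS doing the rewrite (an untested sketch
reusing last_request_on_engine() from this series; ce->ring is NULL
if the context never ran on that engine):

	static int await_ctx_engine(struct i915_request *rq,
				    struct i915_gem_context *ctx,
				    struct intel_engine_cs *engine)
	{
		struct intel_context *ce = to_intel_context(ctx, engine);
		struct i915_request *prev;

		if (!ce->ring) /* target never ran on this engine */
			return 0;

		prev = last_request_on_engine(ce->ring->timeline, engine);
		if (!prev)
			return 0;

		/* < 0 on error; 0/1 when already signaled or queued */
		return i915_sw_fence_await_sw_fence_gfp(&rq->submit,
							&prev->submit,
							I915_FENCE_GFP);
	}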

> +               if (prev)
> +                       i915_sw_fence_await_sw_fence_gfp(&rq->submit,
> +                                                        &prev->submit,
> +                                                        I915_FENCE_GFP);
> +       }
> +
> +       i915_gem_set_global_barrier(i915, rq);

This is just for a link from ctx-engine to this rq. Overkill much?
Presumably this stems from using the wrong engine.

> +
> +out_add:
> +       i915_request_add(rq);

And I'd still recommend not using indirect access if we can apply the
changes immediately.
-Chris

* [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-08-14 14:40 [PATCH v10 0/8] Per context dynamic (sub)slice power-gating Tvrtko Ursulin
@ 2018-08-14 14:40 ` Tvrtko Ursulin
  2018-08-14 14:59   ` Chris Wilson
  2018-08-14 15:22   ` Chris Wilson
  0 siblings, 2 replies; 50+ messages in thread
From: Tvrtko Ursulin @ 2018-08-14 14:40 UTC (permalink / raw)
  To: Intel-gfx

From: Chris Wilson <chris@chris-wilson.co.uk>

We want to allow userspace to reconfigure the subslice configuration for
its own use case. To do so, we expose a context parameter to allow
adjustment of the RPCS register stored within the context image (and
currently not accessible via LRI). If the context is adjusted before
first use, the adjustment is for "free"; otherwise if the context is
active we flush the context off the GPU (stalling all users) and force
the GPU to save the context to memory where we can modify it and so
ensure that the register is reloaded on next execution.

The overhead of managing additional EU subslices can be significant,
especially in multi-context workloads. Non-GPGPU contexts should
preferably disable the subslices they are not using, and others should
fine-tune the number to match their workload.

We expose complete control over the RPCS register, allowing
configuration of slice/subslice, via masks packed into a u64 for
simplicity. For example,

	struct drm_i915_gem_context_param arg;
	struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
	                                                .instance = 0, };

	memset(&arg, 0, sizeof(arg));
	arg.ctx_id = ctx;
	arg.param = I915_CONTEXT_PARAM_SSEU;
	arg.value = (uintptr_t) &sseu;
	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
		sseu.subslice_mask = 0;

		drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
	}

could be used to disable all subslices where supported.

v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)

v3: Add ability to program this per engine (Chris)

v4: Move most get_sseu() into i915_gem_context.c (Lionel)

v5: Validate sseu configuration against the device's capabilities (Lionel)

v6: Change context powergating settings through MI_SDM on kernel context (Chris)

v7: Synchronize the requests following a powergating setting change using a global
    dependency (Chris)
    Iterate timelines through dev_priv.gt.active_rings (Tvrtko)
    Disable RPCS configuration setting for non-capable users (Lionel/Tvrtko)

v8: s/union intel_sseu/struct intel_sseu/ (Lionel)
    s/dev_priv/i915/ (Tvrtko)
    Change uapi class/instance fields to u16 (Tvrtko)
    Bump mask fields to 64bits (Lionel)
    Don't return EPERM when dynamic sseu is disabled (Tvrtko)

v9: Import context image into kernel context's ppgtt only when
    reconfiguring powergated slice/subslices (Chris)
    Use aliasing ppgtt when needed (Michel)

Tvrtko Ursulin:

v10:
 * Update for upstream changes.
 * Request submit needs a RPM reference.
 * Reject on !FULL_PPGTT for simplicity.
 * Pull out get/set param to helpers for readability and less indent.
 * Use i915_request_await_dma_fence in add_global_barrier to skip waits
   on the same timeline and avoid GEM_BUG_ON.
 * No need to explicitly assign a NULL pointer to engine in legacy mode.
 * No need to move gen8_make_rpcs up.
 * Factored out global barrier as prep patch.
 * Allow only CAP_SYS_ADMIN users if !Gen11.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
Issue: https://github.com/intel/media-driver/issues/267
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Zhipeng Gong <zhipeng.gong@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 187 +++++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_lrc.c        |  55 +++++++
 drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
 include/uapi/drm/i915_drm.h             |  43 ++++++
 4 files changed, 288 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 8a12984e7495..6d6220634e9e 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -773,6 +773,91 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
 	return 0;
 }
 
+static int
+i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
+				  struct intel_engine_cs *engine,
+				  struct intel_sseu sseu)
+{
+	struct drm_i915_private *i915 = ctx->i915;
+	struct i915_request *rq;
+	struct intel_ring *ring;
+	int ret;
+
+	lockdep_assert_held(&i915->drm.struct_mutex);
+
+	/* Submitting requests etc needs the hw awake. */
+	intel_runtime_pm_get(i915);
+
+	i915_retire_requests(i915);
+
+	/* Now use the RCS to actually reconfigure. */
+	engine = i915->engine[RCS];
+
+	rq = i915_request_alloc(engine, i915->kernel_context);
+	if (IS_ERR(rq)) {
+		ret = PTR_ERR(rq);
+		goto out_put;
+	}
+
+	ret = engine->emit_rpcs_config(rq, ctx, sseu);
+	if (ret)
+		goto out_add;
+
+	/* Queue this switch after all other activity */
+	list_for_each_entry(ring, &i915->gt.active_rings, active_link) {
+		struct i915_request *prev;
+
+		prev = last_request_on_engine(ring->timeline, engine);
+		if (prev)
+			i915_sw_fence_await_sw_fence_gfp(&rq->submit,
+							 &prev->submit,
+							 I915_FENCE_GFP);
+	}
+
+	i915_gem_set_global_barrier(i915, rq);
+
+out_add:
+	i915_request_add(rq);
+out_put:
+	intel_runtime_pm_put(i915);
+
+	return ret;
+}
+
+static int get_sseu(struct i915_gem_context *ctx,
+		    struct drm_i915_gem_context_param *args)
+{
+	struct drm_i915_gem_context_param_sseu user_sseu;
+	struct intel_engine_cs *engine;
+	struct intel_context *ce;
+
+	if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
+			   sizeof(user_sseu)))
+		return -EFAULT;
+
+	if (user_sseu.rsvd1 || user_sseu.rsvd2)
+		return -EINVAL;
+
+	engine = intel_engine_lookup_user(ctx->i915,
+					  user_sseu.class,
+					  user_sseu.instance);
+	if (!engine)
+		return -EINVAL;
+
+	ce = to_intel_context(ctx, engine);
+
+	user_sseu.slice_mask = ce->sseu.slice_mask;
+	user_sseu.subslice_mask = ce->sseu.subslice_mask;
+	user_sseu.min_eus_per_subslice = ce->sseu.min_eus_per_subslice;
+	user_sseu.max_eus_per_subslice = ce->sseu.max_eus_per_subslice;
+
+	if (copy_to_user(u64_to_user_ptr(args->value), &user_sseu,
+			 sizeof(user_sseu)))
+		return -EFAULT;
+
+	return 0;
+}
+
 int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 				    struct drm_file *file)
 {
@@ -810,6 +895,9 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_CONTEXT_PARAM_PRIORITY:
 		args->value = ctx->sched.priority;
 		break;
+	case I915_CONTEXT_PARAM_SSEU:
+		ret = get_sseu(ctx, args);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
@@ -819,6 +907,101 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 	return ret;
 }
 
+static int
+__user_to_context_sseu(const struct sseu_dev_info *device,
+		       const struct drm_i915_gem_context_param_sseu *user,
+		       struct intel_sseu *context)
+{
+	/* No zeros in any field. */
+	if (!user->slice_mask || !user->subslice_mask ||
+	    !user->min_eus_per_subslice || !user->max_eus_per_subslice)
+		return -EINVAL;
+
+	/* Max must be >= min. */
+	if (user->max_eus_per_subslice < user->min_eus_per_subslice)
+		return -EINVAL;
+
+	/* Check validity against hardware. */
+	if (user->slice_mask & ~device->slice_mask)
+		return -EINVAL;
+
+	if (user->subslice_mask & ~device->subslice_mask[0])
+		return -EINVAL;
+
+	if (user->max_eus_per_subslice > device->max_eus_per_subslice)
+		return -EINVAL;
+
+	context->slice_mask = user->slice_mask;
+	context->subslice_mask = user->subslice_mask;
+	context->min_eus_per_subslice = user->min_eus_per_subslice;
+	context->max_eus_per_subslice = user->max_eus_per_subslice;
+
+	return 0;
+}
+
+static int set_sseu(struct i915_gem_context *ctx,
+		    struct drm_i915_gem_context_param *args)
+{
+	struct drm_i915_private *i915 = ctx->i915;
+	struct drm_i915_gem_context_param_sseu user_sseu;
+	struct intel_engine_cs *engine;
+	struct intel_sseu ctx_sseu;
+	struct intel_context *ce;
+	enum intel_engine_id id;
+	int ret;
+
+	if (args->size)
+		return -EINVAL;
+
+	if (!USES_FULL_PPGTT(i915))
+		return -ENODEV;
+
+	if (!IS_GEN11(i915) && !capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
+			   sizeof(user_sseu)))
+		return -EFAULT;
+
+	if (user_sseu.rsvd1 || user_sseu.rsvd2)
+		return -EINVAL;
+
+	engine = intel_engine_lookup_user(i915,
+					  user_sseu.class,
+					  user_sseu.instance);
+	if (!engine)
+		return -EINVAL;
+
+	if (!engine->emit_rpcs_config)
+		return -ENODEV;
+
+	ret = __user_to_context_sseu(&INTEL_INFO(i915)->sseu, &user_sseu,
+				     &ctx_sseu);
+	if (ret)
+		return ret;
+
+	ce = to_intel_context(ctx, engine);
+
+	/* Nothing to do if unmodified. */
+	if (!memcmp(&ce->sseu, &ctx_sseu, sizeof(ctx_sseu)))
+		return 0;
+
+	ret = i915_gem_context_reconfigure_sseu(ctx, engine, ctx_sseu);
+	if (ret)
+		return ret;
+
+	/*
+	 * Copy the configuration to all engines. Our hardware doesn't
+	 * currently support different configurations for each engine.
+	 */
+	for_each_engine(engine, i915, id) {
+		ce = to_intel_context(ctx, engine);
+		ce->sseu = ctx_sseu;
+	}
+
+	return 0;
+}
+
 int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 				    struct drm_file *file)
 {
@@ -884,7 +1067,9 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 				ctx->sched.priority = priority;
 		}
 		break;
-
+	case I915_CONTEXT_PARAM_SSEU:
+		ret = set_sseu(ctx, args);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 8a2997be7ef7..0f780c666e98 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2232,6 +2232,60 @@ static void gen8_emit_breadcrumb_rcs(struct i915_request *request, u32 *cs)
 }
 static const int gen8_emit_breadcrumb_rcs_sz = 8 + WA_TAIL_DWORDS;
 
+static int gen8_emit_rpcs_config(struct i915_request *rq,
+				 struct i915_gem_context *ctx,
+				 struct intel_sseu sseu)
+{
+	struct drm_i915_private *i915 = rq->i915;
+	struct intel_context *ce = to_intel_context(ctx, i915->engine[RCS]);
+	struct i915_vma *vma;
+	u64 offset;
+	u32 *cs;
+	int err;
+
+	/* Let the deferred state allocation take care of this. */
+	if (!ce->state)
+		return 0;
+
+	vma = i915_vma_instance(ce->state->obj,
+				&i915->kernel_context->ppgtt->vm,
+				NULL);
+	if (IS_ERR(vma))
+		return PTR_ERR(vma);
+
+	err = i915_vma_pin(vma, 0, 0, PIN_USER);
+	if (err) {
+		i915_vma_close(vma);
+		return err;
+	}
+
+	err = i915_vma_move_to_active(vma, rq, EXEC_OBJECT_WRITE);
+	if (unlikely(err)) {
+		i915_vma_close(vma);
+		return err;
+	}
+
+	i915_vma_unpin(vma);
+
+	cs = intel_ring_begin(rq, 4);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	offset = vma->node.start +
+		LRC_STATE_PN * PAGE_SIZE +
+		(CTX_R_PWR_CLK_STATE + 1) * 4;
+
+	*cs++ = MI_STORE_DWORD_IMM_GEN4;
+	*cs++ = lower_32_bits(offset);
+	*cs++ = upper_32_bits(offset);
+	*cs++ = gen8_make_rpcs(&INTEL_INFO(i915)->sseu,
+			       intel_engine_prepare_sseu(rq->engine, sseu));
+
+	intel_ring_advance(rq, cs);
+
+	return 0;
+}
+
 static int gen8_init_rcs_context(struct i915_request *rq)
 {
 	int ret;
@@ -2324,6 +2378,7 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 	engine->emit_flush = gen8_emit_flush;
 	engine->emit_breadcrumb = gen8_emit_breadcrumb;
 	engine->emit_breadcrumb_sz = gen8_emit_breadcrumb_sz;
+	engine->emit_rpcs_config = gen8_emit_rpcs_config;
 
 	engine->set_default_submission = intel_execlists_set_default_submission;
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 9090885d57de..acb8b6fe912a 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -477,6 +477,10 @@ struct intel_engine_cs {
 	void		(*emit_breadcrumb)(struct i915_request *rq, u32 *cs);
 	int		emit_breadcrumb_sz;
 
+	int		(*emit_rpcs_config)(struct i915_request *rq,
+					    struct i915_gem_context *ctx,
+					    struct intel_sseu sseu);
+
 	/* Pass the request to the hardware queue (e.g. directly into
 	 * the legacy ringbuffer or to the end of an execlist).
 	 *
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index a4446f452040..e195c38b15a6 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1478,9 +1478,52 @@ struct drm_i915_gem_context_param {
 #define   I915_CONTEXT_MAX_USER_PRIORITY	1023 /* inclusive */
 #define   I915_CONTEXT_DEFAULT_PRIORITY		0
 #define   I915_CONTEXT_MIN_USER_PRIORITY	-1023 /* inclusive */
+	/*
+	 * When using the following param, value should be a pointer to
+	 * drm_i915_gem_context_param_sseu.
+	 */
+#define I915_CONTEXT_PARAM_SSEU		0x7
 	__u64 value;
 };
 
+struct drm_i915_gem_context_param_sseu {
+	/*
+	 * Engine class & instance to be configured or queried.
+	 */
+	__u16 class;
+	__u16 instance;
+
+	/*
+	 * Unused for now. Must be cleared to zero.
+	 */
+	__u32 rsvd1;
+
+	/*
+	 * Mask of slices to enable for the context. Valid values are a subset
+	 * of the bitmask value returned for I915_PARAM_SLICE_MASK.
+	 */
+	__u64 slice_mask;
+
+	/*
+	 * Mask of subslices to enable for the context. Valid values are a
+	 * subset of the bitmask value returned by I915_PARAM_SUBSLICE_MASK.
+	 */
+	__u64 subslice_mask;
+
+	/*
+	 * Minimum/Maximum number of EUs to enable per subslice for the
+	 * context. min_eus_per_subslice must be less than or equal to
+	 * max_eus_per_subslice.
+	 */
+	__u16 min_eus_per_subslice;
+	__u16 max_eus_per_subslice;
+
+	/*
+	 * Unused for now. Must be cleared to zero.
+	 */
+	__u32 rsvd2;
+};
+
 enum drm_i915_oa_format {
 	I915_OA_FORMAT_A13 = 1,	    /* HSW only */
 	I915_OA_FORMAT_A29,	    /* HSW only */
-- 
2.17.1


end of thread (newest message: 2018-08-15 11:57 UTC)

Thread overview: 50+ messages
2018-04-25 11:45 [PATCH 0/8] drm/i915: per context slice/subslice powergating Lionel Landwerlin
2018-04-25 11:45 ` [PATCH 1/8] drm/i915: expose helper mapping exec flag engine to intel_engine_cs Lionel Landwerlin
2018-04-25 11:50   ` Chris Wilson
2018-04-30 14:37     ` Lionel Landwerlin
2018-05-01 11:13       ` Chris Wilson
2018-05-03 17:12       ` Tvrtko Ursulin
2018-05-03 17:31         ` Lionel Landwerlin
2018-05-03 18:00           ` Tvrtko Ursulin
2018-05-03 20:09             ` Lionel Landwerlin
2018-05-03 20:15               ` Chris Wilson
2018-04-25 11:45 ` [PATCH 2/8] drm/i915: Program RPCS for Broadwell Lionel Landwerlin
2018-04-25 11:45 ` [PATCH 3/8] drm/i915: don't specify pinned size for wa_bb pin/allocation Lionel Landwerlin
2018-04-25 11:52   ` Chris Wilson
2018-04-25 11:45 ` [PATCH 4/8] drm/i915: extract per-ctx/indirect bb programming Lionel Landwerlin
2018-04-25 11:45 ` [PATCH 5/8] drm/i915: pass wa_ctx as argument Lionel Landwerlin
2018-04-25 11:45 ` [PATCH 6/8] drm/i915: reprogram NOA muxes on context switch when using perf Lionel Landwerlin
2018-04-25 11:57   ` Chris Wilson
2018-04-25 13:23     ` Chris Wilson
2018-04-25 14:35     ` Lionel Landwerlin
2018-04-25 11:45 ` [PATCH 7/8] drm/i915: Record the sseu configuration per-context & engine Lionel Landwerlin
2018-04-25 11:45 ` [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace Lionel Landwerlin
2018-04-26 10:00   ` Joonas Lahtinen
2018-04-26 10:22     ` Lionel Landwerlin
2018-05-03 16:04       ` Joonas Lahtinen
2018-05-03 16:14         ` Chris Wilson
2018-05-03 16:25         ` Lionel Landwerlin
2018-05-03 16:30         ` Tvrtko Ursulin
2018-05-03 17:18   ` Tvrtko Ursulin
2018-05-04 16:25     ` Lionel Landwerlin
2018-05-08  4:04       ` Rogozhkin, Dmitry V
2018-05-08  8:24         ` Tvrtko Ursulin
2018-05-08 16:00           ` Rogozhkin, Dmitry V
2018-05-08 20:56     ` Chris Wilson
2018-05-09 15:35       ` Lionel Landwerlin
2018-04-25 12:34 ` ✗ Fi.CI.CHECKPATCH: warning for drm/i915: per context slice/subslice powergating Patchwork
2018-04-25 12:36 ` ✗ Fi.CI.SPARSE: " Patchwork
2018-04-25 12:49 ` ✓ Fi.CI.BAT: success " Patchwork
2018-04-25 15:39 ` ✗ Fi.CI.IGT: failure " Patchwork
2018-08-14 14:40 [PATCH v10 0/8] Per context dynamic (sub)slice power-gating Tvrtko Ursulin
2018-08-14 14:40 ` [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace Tvrtko Ursulin
2018-08-14 14:59   ` Chris Wilson
2018-08-14 15:11     ` Lionel Landwerlin
2018-08-14 15:18       ` Chris Wilson
2018-08-14 16:05         ` Lionel Landwerlin
2018-08-14 16:09           ` Lionel Landwerlin
2018-08-14 18:44     ` Tvrtko Ursulin
2018-08-14 18:53       ` Chris Wilson
2018-08-15  9:12         ` Tvrtko Ursulin
2018-08-14 15:22   ` Chris Wilson
2018-08-15 11:51     ` Tvrtko Ursulin
2018-08-15 11:56       ` Chris Wilson
