* [PATCH v4 0/8] drm/i915: per context slice/subslice powergating
@ 2018-05-09 17:48 Lionel Landwerlin
  2018-05-09 17:48 ` [PATCH v4 1/8] drm/i915: Program RPCS for Broadwell Lionel Landwerlin
                   ` (8 more replies)
  0 siblings, 9 replies; 11+ messages in thread
From: Lionel Landwerlin @ 2018-05-09 17:48 UTC (permalink / raw)
  To: intel-gfx

Hi all,

Another update following Chris' review (Thanks!).

A few more IGT tests are still to come, to verify the interaction with perf.

Cheers,

Chris Wilson (3):
  drm/i915: Program RPCS for Broadwell
  drm/i915: Record the sseu configuration per-context & engine
  drm/i915: Expose RPCS (SSEU) configuration to userspace

Lionel Landwerlin (5):
  drm/i915/perf: simplify configure all context function
  drm/i915: add new pipe control helper for mmio writes
  drm/i915: give engine to execlists cancel helper
  drm/i915: reprogram NOA muxes on context switch when using perf
  drm/i915: count powergating transitions per engine

 drivers/gpu/drm/i915/i915_gem_context.c     | 173 ++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_context.h     |   3 +
 drivers/gpu/drm/i915/i915_perf.c            | 146 ++++++++++++-
 drivers/gpu/drm/i915/i915_request.c         |   2 +
 drivers/gpu/drm/i915/i915_request.h         |  24 +++
 drivers/gpu/drm/i915/intel_engine_cs.c      |   5 +
 drivers/gpu/drm/i915/intel_guc_submission.c |   2 +-
 drivers/gpu/drm/i915/intel_lrc.c            | 216 +++++++++++++++-----
 drivers/gpu/drm/i915/intel_ringbuffer.c     |   2 +
 drivers/gpu/drm/i915/intel_ringbuffer.h     |  31 ++-
 include/uapi/drm/i915_drm.h                 |  38 ++++
 11 files changed, 583 insertions(+), 59 deletions(-)

--
2.17.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [PATCH v4 1/8] drm/i915: Program RPCS for Broadwell
  2018-05-09 17:48 [PATCH v4 0/8] drm/i915: per context slice/subslice powergating Lionel Landwerlin
@ 2018-05-09 17:48 ` Lionel Landwerlin
  2018-05-09 17:48 ` [PATCH v4 2/8] drm/i915: Record the sseu configuration per-context & engine Lionel Landwerlin
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Lionel Landwerlin @ 2018-05-09 17:48 UTC (permalink / raw)
  To: intel-gfx

From: Chris Wilson <chris@chris-wilson.co.uk>

Currently we only configure the power gating for Skylake and above, but
the configuration should equally apply to Broadwell and Braswell. Even
though there is not as much variation as for later generations, we want
to expose control over the configuration to userspace and may want to
opt out of the "always-enabled" setting.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 7f98dda3c929..1bc35de215ae 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2394,13 +2394,6 @@ make_rpcs(struct drm_i915_private *dev_priv)
 {
 	u32 rpcs = 0;
 
-	/*
-	 * No explicit RPCS request is needed to ensure full
-	 * slice/subslice/EU enablement prior to Gen9.
-	*/
-	if (INTEL_GEN(dev_priv) < 9)
-		return 0;
-
 	/*
 	 * Starting in Gen9, render power gating can leave
 	 * slice/subslice/EU in a partially enabled state. We
-- 
2.17.0


* [PATCH v4 2/8] drm/i915: Record the sseu configuration per-context & engine
  2018-05-09 17:48 [PATCH v4 0/8] drm/i915: per context slice/subslice powergating Lionel Landwerlin
  2018-05-09 17:48 ` [PATCH v4 1/8] drm/i915: Program RPCS for Broadwell Lionel Landwerlin
@ 2018-05-09 17:48 ` Lionel Landwerlin
  2018-05-09 17:48 ` [PATCH v4 3/8] drm/i915/perf: simplify configure all context function Lionel Landwerlin
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Lionel Landwerlin @ 2018-05-09 17:48 UTC (permalink / raw)
  To: intel-gfx

From: Chris Wilson <chris@chris-wilson.co.uk>

We want to expose the ability to reconfigure the slices, subslices and
EUs per context and per engine. To facilitate that, store the current
configuration on the context for each engine, which is initially set
to the device default upon creation.
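
As a rough illustration of what the stored configuration is used for, the
standalone userspace sketch below packs a per-context value into an RPCS
register word, following the structure of make_rpcs() in this patch. The
RPCS_* bit positions and every sketch_* name are assumptions made for this
example only; the authoritative definitions are the GEN8_RPCS_* macros in the
driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout, for illustration only (see the GEN8_RPCS_* macros). */
#define RPCS_ENABLE        (1u << 31)
#define RPCS_S_CNT_ENABLE  (1u << 18)
#define RPCS_S_CNT_SHIFT   15
#define RPCS_SS_CNT_ENABLE (1u << 11)
#define RPCS_SS_CNT_SHIFT  8
#define RPCS_EU_MAX_SHIFT  4
#define RPCS_EU_MIN_SHIFT  0

/* Hypothetical stand-in for the per-(context, engine) configuration. */
struct sketch_sseu {
	uint8_t slice_mask;
	uint8_t subslice_mask;
	uint8_t min_eus;
	uint8_t max_eus;
};

/* Hypothetical stand-in for the device powergating capabilities. */
struct sketch_caps {
	bool has_slice_pg;
	bool has_subslice_pg;
	bool has_eu_pg;
};

/* Count the bits set in an 8-bit mask. */
static unsigned int sketch_hweight8(uint8_t v)
{
	unsigned int n = 0;

	for (; v; v &= v - 1)
		n++;
	return n;
}

/* Build the register word from the device caps and the context request. */
static uint32_t sketch_make_rpcs(struct sketch_caps caps,
				 struct sketch_sseu cfg)
{
	uint32_t rpcs = 0;

	if (caps.has_slice_pg) {
		rpcs |= RPCS_S_CNT_ENABLE;
		rpcs |= sketch_hweight8(cfg.slice_mask) << RPCS_S_CNT_SHIFT;
		rpcs |= RPCS_ENABLE;
	}

	if (caps.has_subslice_pg) {
		rpcs |= RPCS_SS_CNT_ENABLE;
		rpcs |= sketch_hweight8(cfg.subslice_mask) << RPCS_SS_CNT_SHIFT;
		rpcs |= RPCS_ENABLE;
	}

	if (caps.has_eu_pg) {
		rpcs |= (uint32_t)cfg.min_eus << RPCS_EU_MIN_SHIFT;
		rpcs |= (uint32_t)cfg.max_eus << RPCS_EU_MAX_SHIFT;
		rpcs |= RPCS_ENABLE;
	}

	return rpcs;
}
```

The point of the split is that the capability flags still come from the
device, while the counts come from the context, so two contexts on the same
device can end up with different register values.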

v2: record sseu configuration per context & engine (Chris)

v3: introduce the i915_gem_context_sseu to store powergating
    programming, sseu_dev_info has grown quite a bit (Lionel)

v4: rename i915_gem_sseu into intel_sseu (Chris)
    use to_intel_context() (Chris)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 22 ++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_context.h |  3 +++
 drivers/gpu/drm/i915/i915_request.h     | 13 +++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c        | 24 ++++++++++++------------
 4 files changed, 50 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 33f8a4b3c981..a04f0329e85a 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -261,11 +261,26 @@ static u32 default_desc_template(const struct drm_i915_private *i915,
 	return desc;
 }
 
+static union intel_sseu
+intel_sseu_from_device_sseu(const struct sseu_dev_info *sseu)
+{
+	union intel_sseu value = {
+		.slice_mask = sseu->slice_mask,
+		.subslice_mask = sseu->subslice_mask[0],
+		.min_eus_per_subslice = sseu->max_eus_per_subslice,
+		.max_eus_per_subslice = sseu->max_eus_per_subslice,
+	};
+
+	return value;
+}
+
 static struct i915_gem_context *
 __create_hw_context(struct drm_i915_private *dev_priv,
 		    struct drm_i915_file_private *file_priv)
 {
 	struct i915_gem_context *ctx;
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
 	int ret;
 
 	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
@@ -314,6 +329,13 @@ __create_hw_context(struct drm_i915_private *dev_priv,
 	 * is no remap info, it will be a NOP. */
 	ctx->remap_slice = ALL_L3_SLICES(dev_priv);
 
+	/* On all engines, use the whole device by default */
+	for_each_engine(engine, dev_priv, id) {
+		struct intel_context *ce = to_intel_context(ctx, engine);
+
+		ce->sseu = intel_sseu_from_device_sseu(&INTEL_INFO(dev_priv)->sseu);
+	}
+
 	i915_gem_context_set_bannable(ctx);
 	ctx->ring_size = 4 * PAGE_SIZE;
 	ctx->desc_template =
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index ace3b129c189..cb9d93d29c64 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -30,6 +30,7 @@
 #include <linux/radix-tree.h>
 
 #include "i915_gem.h"
+#include "intel_device_info.h"
 
 struct pid;
 
@@ -149,6 +150,8 @@ struct i915_gem_context {
 		u32 *lrc_reg_state;
 		u64 lrc_desc;
 		int pin_count;
+		/** sseu: Control eu/slice partitioning */
+		union intel_sseu sseu;
 	} __engine[I915_NUM_ENGINES];
 
 	/** ring_size: size for allocating the per-engine ring buffer */
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index eddbd4245cb3..beb312ac9aa0 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -39,6 +39,19 @@ struct drm_i915_gem_object;
 struct i915_request;
 struct i915_timeline;
 
+/*
+ * Powergating configuration for a particular (context,engine).
+ */
+union intel_sseu {
+	struct {
+		u8 slice_mask;
+		u8 subslice_mask;
+		u8 min_eus_per_subslice;
+		u8 max_eus_per_subslice;
+	};
+	u64 value;
+};
+
 struct intel_wait {
 	struct rb_node node;
 	struct task_struct *tsk;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 1bc35de215ae..e754e9d112a5 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2389,8 +2389,8 @@ int logical_xcs_ring_init(struct intel_engine_cs *engine)
 	return logical_ring_init(engine);
 }
 
-static u32
-make_rpcs(struct drm_i915_private *dev_priv)
+static u32 make_rpcs(const struct sseu_dev_info *sseu,
+		     union intel_sseu ctx_sseu)
 {
 	u32 rpcs = 0;
 
@@ -2400,24 +2400,23 @@ make_rpcs(struct drm_i915_private *dev_priv)
 	 * must make an explicit request through RPCS for full
 	 * enablement.
 	*/
-	if (INTEL_INFO(dev_priv)->sseu.has_slice_pg) {
+	if (sseu->has_slice_pg) {
 		rpcs |= GEN8_RPCS_S_CNT_ENABLE;
-		rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask) <<
-			GEN8_RPCS_S_CNT_SHIFT;
+		rpcs |= hweight8(ctx_sseu.slice_mask) << GEN8_RPCS_S_CNT_SHIFT;
 		rpcs |= GEN8_RPCS_ENABLE;
 	}
 
-	if (INTEL_INFO(dev_priv)->sseu.has_subslice_pg) {
+	if (sseu->has_subslice_pg) {
 		rpcs |= GEN8_RPCS_SS_CNT_ENABLE;
-		rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.subslice_mask[0]) <<
-			GEN8_RPCS_SS_CNT_SHIFT;
+		rpcs |= hweight8(ctx_sseu.subslice_mask) <<
+			GEN8_RPCS_SS_CNT_SHIFT;
 		rpcs |= GEN8_RPCS_ENABLE;
 	}
 
-	if (INTEL_INFO(dev_priv)->sseu.has_eu_pg) {
-		rpcs |= INTEL_INFO(dev_priv)->sseu.eu_per_subslice <<
+	if (sseu->has_eu_pg) {
+		rpcs |= ctx_sseu.min_eus_per_subslice <<
 			GEN8_RPCS_EU_MIN_SHIFT;
-		rpcs |= INTEL_INFO(dev_priv)->sseu.eu_per_subslice <<
+		rpcs |= ctx_sseu.max_eus_per_subslice <<
 			GEN8_RPCS_EU_MAX_SHIFT;
 		rpcs |= GEN8_RPCS_ENABLE;
 	}
@@ -2541,7 +2540,8 @@ static void execlists_init_reg_state(u32 *regs,
 	if (rcs) {
 		regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
 		CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
-			make_rpcs(dev_priv));
+			make_rpcs(&INTEL_INFO(dev_priv)->sseu,
+				  ctx->__engine[engine->id].sseu));
 
 		i915_oa_init_reg_state(engine, ctx, regs);
 	}
-- 
2.17.0


* [PATCH v4 3/8] drm/i915/perf: simplify configure all context function
  2018-05-09 17:48 [PATCH v4 0/8] drm/i915: per context slice/subslice powergating Lionel Landwerlin
  2018-05-09 17:48 ` [PATCH v4 1/8] drm/i915: Program RPCS for Broadwell Lionel Landwerlin
  2018-05-09 17:48 ` [PATCH v4 2/8] drm/i915: Record the sseu configuration per-context & engine Lionel Landwerlin
@ 2018-05-09 17:48 ` Lionel Landwerlin
  2018-05-09 17:48 ` [PATCH v4 4/8] drm/i915: add new pipe control helper for mmio writes Lionel Landwerlin
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Lionel Landwerlin @ 2018-05-09 17:48 UTC (permalink / raw)
  To: intel-gfx

We don't need any special treatment on error so just return as soon as
possible.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index d9341415df40..5b279a82445a 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1765,7 +1765,7 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 	/* Switch away from any user context. */
 	ret = gen8_switch_to_updated_kernel_context(dev_priv, oa_config);
 	if (ret)
-		goto out;
+		return ret;
 
 	/*
 	 * The OA register config is setup through the context image. This image
@@ -1782,7 +1782,7 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 	 */
 	ret = i915_gem_wait_for_idle(dev_priv, wait_flags);
 	if (ret)
-		goto out;
+		return ret;
 
 	/* Update all contexts now that we've stalled the submission. */
 	list_for_each_entry(ctx, &dev_priv->contexts.list, link) {
@@ -1794,10 +1794,8 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 			continue;
 
 		regs = i915_gem_object_pin_map(ce->state->obj, I915_MAP_WB);
-		if (IS_ERR(regs)) {
-			ret = PTR_ERR(regs);
-			goto out;
-		}
+		if (IS_ERR(regs))
+			return PTR_ERR(regs);
 
 		ce->state->obj->mm.dirty = true;
 		regs += LRC_STATE_PN * PAGE_SIZE / sizeof(*regs);
@@ -1807,7 +1805,6 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 		i915_gem_object_unpin_map(ce->state->obj);
 	}
 
- out:
 	return ret;
 }
 
-- 
2.17.0


* [PATCH v4 4/8] drm/i915: add new pipe control helper for mmio writes
  2018-05-09 17:48 [PATCH v4 0/8] drm/i915: per context slice/subslice powergating Lionel Landwerlin
                   ` (2 preceding siblings ...)
  2018-05-09 17:48 ` [PATCH v4 3/8] drm/i915/perf: simplify configure all context function Lionel Landwerlin
@ 2018-05-09 17:48 ` Lionel Landwerlin
  2018-05-09 17:48 ` [PATCH v4 5/8] drm/i915: give engine to execlists cancel helper Lionel Landwerlin
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Lionel Landwerlin @ 2018-05-09 17:48 UTC (permalink / raw)
  To: intel-gfx

We'll use those helpers in the following commits. It's a good thing to
have them around as they need to apply a particular workaround on
Skylake.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c        | 34 +++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  5 ++++
 2 files changed, 39 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index e754e9d112a5..6fe0d668c023 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2151,6 +2151,40 @@ static void gen8_emit_breadcrumb_rcs(struct i915_request *request, u32 *cs)
 }
 static const int gen8_emit_breadcrumb_rcs_sz = 8 + WA_TAIL_DWORDS;
 
+u32 gen8_lri_pipe_control_len(struct drm_i915_private *dev_priv)
+{
+	return IS_SKYLAKE(dev_priv) ? 7 : 5;
+}
+
+u32 *gen8_emit_lri_pipe_control(struct drm_i915_private *dev_priv,
+				u32 *cs, u32 flags, u32 offset,
+				u32 value)
+{
+	/*
+	 * Project: SKL
+	 *
+	 *  "PIPECONTROL command with "Command Streamer Stall Enable" must be
+	 *  programmed prior to programming a PIPECONTROL command with LRI
+	 *  Post Sync Operation in GPGPU mode of operation (i.e when
+	 *  PIPELINE_SELECT command is set to GPGPU mode of operation)."
+	 *
+	 *  Since the mode of operation is selected from userspace, we apply
+	 *  this workaround all the time on SKL.
+	 */
+	if (IS_SKYLAKE(dev_priv)) {
+		*cs++ = GFX_OP_PIPE_CONTROL(2);
+		*cs++ = PIPE_CONTROL_CS_STALL;
+	}
+
+	*cs++ = GFX_OP_PIPE_CONTROL(5);
+	*cs++ = PIPE_CONTROL_MMIO_WRITE | flags;
+	*cs++ = offset;
+	*cs++ = 0;
+	*cs++ = value;
+
+	return cs;
+}
+
 static int gen8_init_rcs_context(struct i915_request *rq)
 {
 	int ret;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 010750e8ee44..aa643a1d69db 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -1042,6 +1042,11 @@ gen8_emit_ggtt_write(u32 *cs, u32 value, u32 gtt_offset)
 	return cs;
 }
 
+u32 gen8_lri_pipe_control_len(struct drm_i915_private *dev_priv);
+u32 *gen8_emit_lri_pipe_control(struct drm_i915_private *dev_priv,
+				u32 *cs, u32 flags, u32 offset,
+				u32 value);
+
 bool intel_engine_is_idle(struct intel_engine_cs *engine);
 bool intel_engines_are_idle(struct drm_i915_private *dev_priv);
 
-- 
2.17.0


* [PATCH v4 5/8] drm/i915: give engine to execlists cancel helper
  2018-05-09 17:48 [PATCH v4 0/8] drm/i915: per context slice/subslice powergating Lionel Landwerlin
                   ` (3 preceding siblings ...)
  2018-05-09 17:48 ` [PATCH v4 4/8] drm/i915: add new pipe control helper for mmio writes Lionel Landwerlin
@ 2018-05-09 17:48 ` Lionel Landwerlin
  2018-05-09 17:48 ` [PATCH v4 6/8] drm/i915: reprogram NOA muxes on context switch when using perf Lionel Landwerlin
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Lionel Landwerlin @ 2018-05-09 17:48 UTC (permalink / raw)
  To: intel-gfx

We would like to set a value on the associated engine in this helper
in a following commit.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/intel_guc_submission.c |  2 +-
 drivers/gpu/drm/i915/intel_lrc.c            | 10 +++++-----
 drivers/gpu/drm/i915/intel_ringbuffer.h     |  2 +-
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 2feb65096966..ef914fc926bb 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -794,7 +794,7 @@ static void guc_submission_tasklet(unsigned long data)
 	if (execlists_is_active(execlists, EXECLISTS_ACTIVE_PREEMPT) &&
 	    intel_read_status_page(engine, I915_GEM_HWS_PREEMPT_INDEX) ==
 	    GUC_PREEMPT_FINISHED) {
-		execlists_cancel_port_requests(&engine->execlists);
+		execlists_cancel_port_requests(engine);
 		execlists_unwind_incomplete_requests(execlists);
 
 		wait_for_guc_preempt_report(engine);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 6fe0d668c023..a608ff0f9e7a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -772,8 +772,9 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 }
 
 void
-execlists_cancel_port_requests(struct intel_engine_execlists * const execlists)
+execlists_cancel_port_requests(struct intel_engine_cs *engine)
 {
+	struct intel_engine_execlists * const execlists = &engine->execlists;
 	struct execlist_port *port = execlists->port;
 	unsigned int num_ports = execlists_num_ports(execlists);
 
@@ -904,7 +905,7 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
 	local_irq_save(flags);
 
 	/* Cancel the requests on the HW and clear the ELSP tracker. */
-	execlists_cancel_port_requests(execlists);
+	execlists_cancel_port_requests(engine);
 	reset_irq(engine);
 
 	spin_lock(&engine->timeline.lock);
@@ -1063,7 +1064,7 @@ static void execlists_submission_tasklet(unsigned long data)
 			    buf[2*head + 1] == execlists->preempt_complete_status) {
 				GEM_TRACE("%s preempt-idle\n", engine->name);
 
-				execlists_cancel_port_requests(execlists);
+				execlists_cancel_port_requests(engine);
 				execlists_unwind_incomplete_requests(execlists);
 
 				GEM_BUG_ON(!execlists_is_active(execlists,
@@ -1823,7 +1824,6 @@ static int gen9_init_render_ring(struct intel_engine_cs *engine)
 static void reset_common_ring(struct intel_engine_cs *engine,
 			      struct i915_request *request)
 {
-	struct intel_engine_execlists * const execlists = &engine->execlists;
 	unsigned long flags;
 	u32 *regs;
 
@@ -1843,7 +1843,7 @@ static void reset_common_ring(struct intel_engine_cs *engine,
 	 * guessing the missed context-switch events by looking at what
 	 * requests were completed.
 	 */
-	execlists_cancel_port_requests(execlists);
+	execlists_cancel_port_requests(engine);
 	reset_irq(engine);
 
 	/* Push back any incomplete requests for replay after the reset. */
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index aa643a1d69db..1d00cc3cc1a4 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -684,7 +684,7 @@ void execlists_user_begin(struct intel_engine_execlists *execlists,
 void execlists_user_end(struct intel_engine_execlists *execlists);
 
 void
-execlists_cancel_port_requests(struct intel_engine_execlists * const execlists);
+execlists_cancel_port_requests(struct intel_engine_cs *engine);
 
 void
 execlists_unwind_incomplete_requests(struct intel_engine_execlists *execlists);
-- 
2.17.0


* [PATCH v4 6/8] drm/i915: reprogram NOA muxes on context switch when using perf
  2018-05-09 17:48 [PATCH v4 0/8] drm/i915: per context slice/subslice powergating Lionel Landwerlin
                   ` (4 preceding siblings ...)
  2018-05-09 17:48 ` [PATCH v4 5/8] drm/i915: give engine to execlists cancel helper Lionel Landwerlin
@ 2018-05-09 17:48 ` Lionel Landwerlin
  2018-05-09 23:38   ` Lionel Landwerlin
  2018-05-09 17:48 ` [PATCH v4 7/8] drm/i915: count powergating transitions per engine Lionel Landwerlin
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 11+ messages in thread
From: Lionel Landwerlin @ 2018-05-09 17:48 UTC (permalink / raw)
  To: intel-gfx

If some of the contexts submitting workloads to the GPU have been
configured to shut down slices/subslices, we might lose the NOA
configurations written in the NOA muxes. We need to reprogram them
when we detect a powergating configuration change.

In this change i915/perf is responsible for setting up a reprogramming
batchbuffer which we execute just before the userspace submitted
batchbuffer. We do this while preemption is still disabled, only if
needed. The decision to execute this reprogramming batchbuffer is made
when we assign a request to an execlist port.
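
The decision described above can be modelled in isolation. The following is a
minimal userspace sketch, not the driver code: the sketch_* names are
hypothetical, the 64-bit value stands in for the packed sseu configuration,
and the boolean stands in for "a NOA reprogramming batch has been set up by
i915/perf".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the per-engine tracking state. */
struct sketch_engine {
	bool has_noa_batch;	/* a reprogramming batch is available */
	uint64_t last_sseu;	/* last submitted configuration, 0 = unknown */
};

/*
 * Called when a request is assigned to a port. Returns true when the
 * reserved no-op slots of the request should be turned into a jump to
 * the reprogramming batch, and records the new configuration.
 */
static bool sketch_needs_noa_reprogram(struct sketch_engine *engine,
				       uint64_t rq_sseu)
{
	/* Nothing to reprogram if perf did not install a batch. */
	if (!engine->has_noa_batch)
		return false;

	/* Same powergating configuration as last time: muxes are intact. */
	if (engine->last_sseu == rq_sseu)
		return false;

	engine->last_sseu = rq_sseu;
	return true;
}

/* After cancelling the ports the hardware state is no longer known. */
static void sketch_cancel_ports(struct sketch_engine *engine)
{
	engine->last_sseu = 0;
}
```

Resetting the tracked value to 0 on cancellation means the next request with
any non-zero configuration is treated as a change, which errs on the side of
reprogramming once too often rather than once too few.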

v2: Only reprogram when detecting configuration changes (Chris/Lionel)

v3: Clear engine sseu tracking on execlists cancel port (Chris)
    Store NOA reprogramming vma on the engine (Chris/Lionel)
    Use PIPECONTROL MMIO write correctly, on the last register write (Chris/Lionel)
    Pin NOA reprogramming vma with PIN_USER only (Chris)
    Program MI_BATCH_BUFFER_START into NOA reprogramming correctly (Chris)
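
Since the last register write goes through the PIPECONTROL rather than an
MI_LOAD_REGISTER_IMM, the size of the reprogramming batch can be estimated as
in the sketch below. This is a standalone illustration under stated
assumptions: the sketch_* names are hypothetical, each LRI packet is taken to
be one header dword plus two dwords per register, the PIPECONTROL length is
taken to be 5 dwords (7 on Skylake, as in the helper from patch 4), and the
batch ends with a single MI_BATCH_BUFFER_END dword.

```c
#include <stdbool.h>
#include <stdint.h>

/* Maximum number of registers assumed per MI_LOAD_REGISTER_IMM packet. */
#define SKETCH_MAX_LRI_REGS 125u

/* Estimated size in bytes of a NOA reprogramming batch (before alignment). */
static uint32_t sketch_noa_bb_bytes(uint32_t n_mux_regs, bool is_skl)
{
	uint32_t n_lri_regs, n_lri, dwords;

	/* No muxes to configure: no batch needed. */
	if (n_mux_regs == 0)
		return 0;

	/* All registers but the last one are written with LRI packets. */
	n_lri_regs = n_mux_regs - 1;

	/* Round up: one packet header per chunk of SKETCH_MAX_LRI_REGS. */
	n_lri = (n_lri_regs + SKETCH_MAX_LRI_REGS - 1) / SKETCH_MAX_LRI_REGS;

	dwords = n_lri +		/* LRI headers */
		 2 * n_lri_regs +	/* (offset, value) pairs */
		 (is_skl ? 7 : 5) +	/* PIPECONTROL with the last write */
		 1;			/* batch buffer end */

	return dwords * 4;
}
```

For example, 126 mux registers on a non-Skylake part give one LRI packet of
125 registers plus the final PIPECONTROL write, i.e. 257 dwords.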

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c        | 135 ++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_request.c     |   2 +
 drivers/gpu/drm/i915/i915_request.h     |  11 ++
 drivers/gpu/drm/i915/intel_engine_cs.c  |   2 +
 drivers/gpu/drm/i915/intel_lrc.c        |  57 +++++++++-
 drivers/gpu/drm/i915/intel_ringbuffer.h |  14 +++
 6 files changed, 220 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 5b279a82445a..66a8f296290a 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1691,6 +1691,122 @@ static int gen8_emit_oa_config(struct i915_request *rq,
 	return 0;
 }
 
+#define MAX_LRI_SIZE (125U)
+
+static u32 noa_reprogram_bb_size(struct drm_i915_private *dev_priv,
+				 const struct i915_oa_config *oa_config)
+{
+	u32 n_lri_mux_regs;
+	u32 n_lri;
+
+	/* Very unlikely but possible that we have no muxes to configure. */
+	if (!oa_config->mux_regs_len)
+		return 0;
+
+	n_lri_mux_regs = oa_config->mux_regs_len - 1;
+
+	n_lri = (n_lri_mux_regs / MAX_LRI_SIZE) +
+		((n_lri_mux_regs % MAX_LRI_SIZE) != 0);
+
+	return n_lri * 4 + n_lri_mux_regs * 8 + /* MI_LOAD_REGISTER_IMMs */
+		gen8_lri_pipe_control_len(dev_priv) * 4 + /* PIPE_CONTROL */
+		4; /* MI_BATCH_BUFFER_END */
+}
+
+static struct i915_vma *
+alloc_noa_reprogram_bo(struct drm_i915_private *dev_priv,
+		       const struct i915_oa_config *oa_config)
+{
+	struct drm_i915_gem_object *bo;
+	struct i915_vma *vma;
+	u32 buffer_size, pc_flags;
+	u32 *cs;
+	int i, ret, last_reg, n_loaded_regs;
+
+	buffer_size =
+		ALIGN(noa_reprogram_bb_size(dev_priv, oa_config), PAGE_SIZE);
+	if (buffer_size == 0)
+		return NULL;
+
+	bo = i915_gem_object_create(dev_priv, buffer_size);
+	if (IS_ERR(bo)) {
+		DRM_ERROR("Failed to allocate NOA reprogramming buffer\n");
+		return ERR_CAST(bo);
+	}
+
+	cs = i915_gem_object_pin_map(bo, I915_MAP_WB);
+	if (IS_ERR(cs)) {
+		ret = PTR_ERR(cs);
+		goto err_unref_bo;
+	}
+
+	n_loaded_regs = 0;
+	last_reg = oa_config->mux_regs_len - 1;
+	for (i = 0; i < last_reg; i++) {
+		if ((n_loaded_regs % MAX_LRI_SIZE) == 0) {
+			u32 n_lri = min_t(u32, last_reg - n_loaded_regs,
+					  MAX_LRI_SIZE);
+			*cs++ = MI_LOAD_REGISTER_IMM(n_lri);
+		}
+
+		*cs++ = i915_mmio_reg_offset(oa_config->mux_regs[i].addr);
+		*cs++ = oa_config->mux_regs[i].value;
+		n_loaded_regs++;
+	}
+
+	pc_flags = PIPE_CONTROL_CS_STALL;
+	/*
+	 * Project: PRE-SKL
+	 *
+	 *  Command Streamer Stall Enable:
+	 *
+	 *  "One of the following must also be set:
+	 *     - Render Target Cache Flush Enable
+	 *     - Depth Cache Flush Enable
+	 *     - Stall at Pixel Scoreboard
+	 *     - Depth Stall
+	 *     - Post-Sync Operation
+	 *     - DC FlushEnable"
+	 *
+	 *  Since we only do NOA reprogramming on Gen8+, this is the only Gen
+	 *  where we need to apply this.
+	 */
+	if (IS_GEN8(dev_priv))
+		pc_flags |= PIPE_CONTROL_STALL_AT_SCOREBOARD;
+
+	/* Serialize on the last MMIO write. */
+	cs = gen8_emit_lri_pipe_control(dev_priv, cs, pc_flags,
+					i915_mmio_reg_offset(oa_config->mux_regs[last_reg].addr),
+					oa_config->mux_regs[last_reg].value);
+
+	*cs++ = MI_BATCH_BUFFER_END;
+
+	i915_gem_object_unpin_map(bo);
+
+	ret = i915_gem_object_set_to_gtt_domain(bo, false);
+	if (ret)
+		goto err_unref_bo;
+
+	vma = i915_vma_instance(bo, &dev_priv->ggtt.base, NULL);
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
+		goto err_unref_bo;
+	}
+
+	ret = i915_vma_pin(vma, 0, 0, PIN_USER);
+	if (ret)
+		goto err_unref_vma;
+
+	return vma;
+
+err_unref_vma:
+	i915_vma_put(vma);
+err_unref_bo:
+	i915_gem_object_put(bo);
+
+	return ERR_PTR(ret);
+}
+
 static int gen8_switch_to_updated_kernel_context(struct drm_i915_private *dev_priv,
 						 const struct i915_oa_config *oa_config)
 {
@@ -1784,6 +1900,25 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 	if (ret)
 		return ret;
 
+	/*
+	 * Powergating configuration changes will lose some of the NOA
+	 * programming. Set a NOA reprogramming BO for the engine to execute
+	 * when a powergating configuration change is detected.
+	 */
+	if (oa_config) {
+		struct i915_vma *reprog_vma =
+			alloc_noa_reprogram_bo(dev_priv, oa_config);
+
+		if (IS_ERR(reprog_vma)) {
+			DRM_DEBUG("Unable to alloc NOA reprogramming BO\n");
+			return PTR_ERR(reprog_vma);
+		}
+		engine->noa_reprogram_vma = reprog_vma;
+	} else {
+		i915_vma_unpin_and_release(&engine->noa_reprogram_vma);
+		engine->noa_reprogram_vma = NULL;
+	}
+
 	/* Update all contexts now that we've stalled the submission. */
 	list_for_each_entry(ctx, &dev_priv->contexts.list, link) {
 		struct intel_context *ce = to_intel_context(ctx, engine);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 8928894dd9c7..dd0b37e0a85c 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -786,6 +786,8 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	rq->capture_list = NULL;
 	rq->waitboost = false;
 
+	rq->sseu = ctx->__engine[engine->id].sseu;
+
 	/*
 	 * Reserve space in the ring buffer for all the commands required to
 	 * eventually emit this request. This is to guarantee that the
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index beb312ac9aa0..b4191d382145 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -162,6 +162,17 @@ struct i915_request {
 	/** Preallocate space in the ring for the emitting the request */
 	u32 reserved_space;
 
+	/*
+	 * Position in the ring buffer where the i915/perf NOA
+	 * reprogramming can be inserted just before HW submission.
+	 */
+	u32 perf_prog;
+
+	/*
+	 * Powergating configuration associated with this request.
+	 */
+	union intel_sseu sseu;
+
 	/** Batch buffer related to this request if any (used for
 	 * error state dump only).
 	 */
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 70325e0824e3..1bab0447c9dc 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -504,6 +504,8 @@ void intel_engine_setup_common(struct intel_engine_cs *engine)
 {
 	i915_timeline_init(engine->i915, &engine->timeline, engine->name);
 
+	memset(&engine->last_sseu, 0, sizeof(engine->last_sseu));
+
 	intel_engine_init_execlist(engine);
 	intel_engine_init_hangcheck(engine);
 	intel_engine_init_batch_pool(engine);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index a608ff0f9e7a..c9a51185b7fe 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -513,6 +513,37 @@ static bool can_merge_ctx(const struct i915_gem_context *prev,
 	return true;
 }
 
+static void maybe_enable_noa_reprogram(struct i915_request *rq)
+{
+	struct intel_engine_cs *engine = rq->engine;
+	u32 *cs;
+
+	/* Slice/subslice/EU powergating only matters on the RCS. */
+	if (engine->id != RCS)
+		return;
+
+	/*
+	 * If the i915 perf stream is not enabled or it doesn't source any
+	 * data from the NOA muxes, we won't have anything to reconfigure.
+	 */
+	if (!engine->noa_reprogram_vma)
+		return;
+
+	/*
+	 * If the powergating configuration doesn't change, no need to
+	 * reprogram.
+	 */
+	if (engine->last_sseu.value == rq->sseu.value)
+		return;
+
+	cs = rq->ring->vaddr + rq->perf_prog;
+	*cs++ = MI_BATCH_BUFFER_START_GEN8;
+	*cs++ = lower_32_bits(engine->noa_reprogram_vma->node.start);
+	*cs++ = upper_32_bits(engine->noa_reprogram_vma->node.start);
+
+	engine->last_sseu = rq->sseu;
+}
+
 static void port_assign(struct execlist_port *port, struct i915_request *rq)
 {
 	GEM_BUG_ON(rq == port_request(port));
@@ -520,6 +551,8 @@ static void port_assign(struct execlist_port *port, struct i915_request *rq)
 	if (port_isset(port))
 		i915_request_put(port_request(port));
 
+	maybe_enable_noa_reprogram(rq);
+
 	port_set(port, port_pack(i915_request_get(rq), port_count(port)));
 }
 
@@ -801,6 +834,12 @@ execlists_cancel_port_requests(struct intel_engine_cs *engine)
 	}
 
 	execlists_user_end(execlists);
+
+	/*
+	 * Clear out the state of the sseu on the engine, as it's not clear
+	 * what it will be after preemption.
+	 */
+	engine->last_sseu.value = 0;
 }
 
 static void clear_gtiir(struct intel_engine_cs *engine)
@@ -1953,10 +1992,26 @@ static int gen8_emit_bb_start(struct i915_request *rq,
 		rq->ctx->ppgtt->pd_dirty_rings &= ~intel_engine_flag(rq->engine);
 	}
 
-	cs = intel_ring_begin(rq, 6);
+	cs = intel_ring_begin(rq, rq->engine->id == RCS ? 10 : 6);
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
 
+	if (rq->engine->id == RCS) {
+		/*
+		 * Leave some instructions to be written with an
+		 * MI_BATCH_BUFFER_START to the i915/perf NOA reprogramming
+		 * batchbuffer. We only turn those MI_NOOP into
+		 * MI_BATCH_BUFFER_START when we detect a SSEU powergating
+		 * configuration change that might affect NOA. This is only
+		 * for the RCS.
+		 */
+		rq->perf_prog = intel_ring_offset(rq, cs);
+		*cs++ = MI_NOOP;
+		*cs++ = MI_NOOP;
+		*cs++ = MI_NOOP;
+		*cs++ = MI_NOOP; /* Aligning to 2 dwords */
+	}
+
 	/*
 	 * WaDisableCtxRestoreArbitration:bdw,chv
 	 *
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 1d00cc3cc1a4..955518a5396f 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -343,6 +343,20 @@ struct intel_engine_cs {
 
 	struct drm_i915_gem_object *default_state;
 
+	/**
+	 * @noa_reprogram_vma: A batchbuffer reprogramming the NOA muxes, used
+	 * after switching powergating configurations. This field is only
+	 * assigned by i915/perf after calling i915_gem_wait_for_idle() and
+	 * while holding the device's lock.
+	 */
+	struct i915_vma *noa_reprogram_vma;
+
+	/**
+	 * @last_sseu: The last SSEU configuration submitted to the
+	 * hardware. Set to 0 if unknown.
+	 */
+	union intel_sseu last_sseu;
+
 	atomic_t irq_count;
 	unsigned long irq_posted;
 #define ENGINE_IRQ_BREADCRUMB 0
-- 
2.17.0

* [PATCH v4 7/8] drm/i915: count powergating transitions per engine
  2018-05-09 17:48 [PATCH v4 0/8] drm/i915: per context slice/subslice powergating Lionel Landwerlin
                   ` (5 preceding siblings ...)
  2018-05-09 17:48 ` [PATCH v4 6/8] drm/i915: reprogram NOA muxes on context switch when using perf Lionel Landwerlin
@ 2018-05-09 17:48 ` Lionel Landwerlin
  2018-05-09 17:48 ` [PATCH v4 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace Lionel Landwerlin
  2018-05-09 19:03 ` ✗ Fi.CI.BAT: failure for drm/i915: per context slice/subslice powergating (rev3) Patchwork
  8 siblings, 0 replies; 11+ messages in thread
From: Lionel Landwerlin @ 2018-05-09 17:48 UTC (permalink / raw)
  To: intel-gfx

This can be used to monitor the number of powergating transitions
triggered by a particular workload.
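
As an illustration of the counting scheme, here is a minimal userspace
sketch using C11 atomics. The engine_model type and the helper names are
hypothetical stand-ins, and the early return when the configuration is
unchanged is an assumption about maybe_enable_noa_reprogram(), whose full
body is not shown in this patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of the two engine fields touched by this series. */
struct engine_model {
	uint32_t last_sseu;           /* 0 means "unknown", as in the patch */
	atomic_uint sseu_transitions; /* number of configuration changes seen */
};

static void engine_model_init(struct engine_model *engine)
{
	assert(engine);
	engine->last_sseu = 0;
	atomic_init(&engine->sseu_transitions, 0);
}

/* Called for each submitted request; counts only real changes (assumed). */
static void engine_model_submit(struct engine_model *engine, uint32_t rq_sseu)
{
	if (engine->last_sseu == rq_sseu)
		return;

	engine->last_sseu = rq_sseu;
	atomic_fetch_add(&engine->sseu_transitions, 1);
}

/* Mirrors the reset done when port requests are cancelled. */
static void engine_model_cancel_ports(struct engine_model *engine)
{
	engine->last_sseu = 0;
}

static unsigned int engine_model_transitions(struct engine_model *engine)
{
	return atomic_load(&engine->sseu_transitions);
}
```

In this model the first submission after a reset always counts as a
transition, because the previous configuration is unknown.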

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/intel_engine_cs.c  | 3 +++
 drivers/gpu/drm/i915/intel_lrc.c        | 1 +
 drivers/gpu/drm/i915/intel_ringbuffer.h | 6 ++++++
 3 files changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 1bab0447c9dc..c795a674abf0 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -505,6 +505,7 @@ void intel_engine_setup_common(struct intel_engine_cs *engine)
 	i915_timeline_init(engine->i915, &engine->timeline, engine->name);
 
 	memset(&engine->last_sseu, 0, sizeof(engine->last_sseu));
+	atomic_set(&engine->sseu_transitions, 0);
 
 	intel_engine_init_execlist(engine);
 	intel_engine_init_hangcheck(engine);
@@ -1439,6 +1440,8 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 	hexdump(m, engine->status_page.page_addr, PAGE_SIZE);
 
 	drm_printf(m, "Idle? %s\n", yesno(intel_engine_is_idle(engine)));
+
+	drm_printf(m, "Powergating transitions: %u\n", atomic_read(&engine->sseu_transitions));
 }
 
 static u8 user_class_map[] = {
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index c9a51185b7fe..d0c429c4bd35 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -542,6 +542,7 @@ static void maybe_enable_noa_reprogram(struct i915_request *rq)
 	*cs++ = upper_32_bits(engine->noa_reprogram_vma->node.start);
 
 	engine->last_sseu = rq->sseu;
+	atomic_inc(&engine->sseu_transitions);
 }
 
 static void port_assign(struct execlist_port *port, struct i915_request *rq)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 955518a5396f..80819172619e 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -357,6 +357,12 @@ struct intel_engine_cs {
 	 */
 	union intel_sseu last_sseu;
 
+	/**
+	 * @sseu_transitions: A counter of the number of powergating
+	 * transitions this engine has gone through.
+	 */
+	atomic_t sseu_transitions;
+
 	atomic_t irq_count;
 	unsigned long irq_posted;
 #define ENGINE_IRQ_BREADCRUMB 0
-- 
2.17.0

* [PATCH v4 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-05-09 17:48 [PATCH v4 0/8] drm/i915: per context slice/subslice powergating Lionel Landwerlin
                   ` (6 preceding siblings ...)
  2018-05-09 17:48 ` [PATCH v4 7/8] drm/i915: count powergating transitions per engine Lionel Landwerlin
@ 2018-05-09 17:48 ` Lionel Landwerlin
  2018-05-09 19:03 ` ✗ Fi.CI.BAT: failure for drm/i915: per context slice/subslice powergating (rev3) Patchwork
  8 siblings, 0 replies; 11+ messages in thread
From: Lionel Landwerlin @ 2018-05-09 17:48 UTC (permalink / raw)
  To: intel-gfx

From: Chris Wilson <chris@chris-wilson.co.uk>

We want to allow userspace to reconfigure the subslice configuration for
its own use case. To do so, we expose a context parameter to allow
adjustment of the RPCS register stored within the context image (and
currently not accessible via LRI). If the context is adjusted before
first use, the adjustment is for "free"; otherwise, if the context is
active, we flush the context off the GPU (stalling all users) and force
the GPU to save the context to memory where we can modify it and so
ensure that the register is reloaded on next execution.

The overhead of managing additional EU subslices can be significant,
especially in multi-context workloads. Non-GPGPU contexts should
preferably disable the subslices they are not using, and others should
fine-tune the number to match their workload.

We expose complete control over the RPCS register, allowing
configuration of slice/subslice, via masks in a struct
drm_i915_gem_context_param_sseu whose address is passed in the u64
value of the context parameter. For example,

	struct drm_i915_gem_context_param arg;
	struct drm_i915_gem_context_param_sseu sseu = { .class = 0, .instance = 0, };

	memset(&arg, 0, sizeof(arg));
	arg.ctx_id = ctx;
	arg.param = I915_CONTEXT_PARAM_SSEU;
	arg.value = (uintptr_t) &sseu;
	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
		sseu.subslice_mask = 0;

		drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
	}

could be used to disable all subslices where supported.
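
Note that the validation added in intel_sseu_from_user_sseu() in this
patch rejects an empty mask, so a request with subslice_mask set to 0
would be expected to fail with -EINVAL. The following is a minimal
userspace sketch that mirrors those checks, so that a request can be
sanity-checked before calling SETPARAM. The dev_sseu and user_sseu types
and both helper names are hypothetical stand-ins, not kernel or libdrm
API, and only a single device subslice mask is modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the device capabilities (struct sseu_dev_info). */
struct dev_sseu {
	uint8_t slice_mask;
	uint8_t subslice_mask;
	uint8_t max_eus_per_subslice;
};

/* Hypothetical stand-in for the mask fields of the uAPI struct. */
struct user_sseu {
	uint8_t slice_mask;
	uint8_t subslice_mask;
	uint8_t min_eus_per_subslice;
	uint8_t max_eus_per_subslice;
};

/* Same checks as intel_sseu_from_user_sseu(): 0 if valid, -EINVAL if not. */
static int validate_user_sseu(const struct dev_sseu *dev,
			      const struct user_sseu *user)
{
	assert(dev && user);

	/* Masks must be non-empty subsets of what the device has. */
	if ((user->slice_mask & ~dev->slice_mask) != 0 ||
	    user->slice_mask == 0)
		return -EINVAL;

	if ((user->subslice_mask & ~dev->subslice_mask) != 0 ||
	    user->subslice_mask == 0)
		return -EINVAL;

	/* EU bounds must be ordered and within the device maximum. */
	if (user->min_eus_per_subslice > dev->max_eus_per_subslice)
		return -EINVAL;

	if (user->max_eus_per_subslice > dev->max_eus_per_subslice ||
	    user->max_eus_per_subslice < user->min_eus_per_subslice ||
	    user->max_eus_per_subslice == 0)
		return -EINVAL;

	return 0;
}

/* Keep only the `keep` lowest set bits of `mask`, clearing the rest. */
static uint8_t keep_lowest_bits(uint8_t mask, unsigned int keep)
{
	uint8_t out = 0;

	for (unsigned int bit = 0; bit < 8 && keep > 0; bit++) {
		if (mask & (1u << bit)) {
			out |= (uint8_t)(1u << bit);
			keep--;
		}
	}

	return out;
}
```

With these helpers, a context that wants the smallest accepted
configuration could use keep_lowest_bits(mask, 1) on the mask returned
by GETPARAM rather than writing 0.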

v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)

v3: Add ability to program this per engine (Chris)

v4: Move most get_sseu() into i915_gem_context.c (Lionel)

v5: Validate sseu configuration against the device's capabilities (Lionel)

v6: Change context powergating settings through MI_SDM on kernel context (Chris)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
CC: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
CC: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
CC: Zhipeng Gong <zhipeng.gong@intel.com>
CC: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 151 ++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c        | 103 ++++++++++------
 drivers/gpu/drm/i915/intel_ringbuffer.c |   2 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
 include/uapi/drm/i915_drm.h             |  38 ++++++
 5 files changed, 263 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index a04f0329e85a..6c67ef87b706 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -747,6 +747,92 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
 	return 0;
 }
 
+static int
+intel_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
+			  const struct drm_i915_gem_context_param_sseu *user_sseu,
+			  union intel_sseu *ctx_sseu)
+{
+	if ((user_sseu->slice_mask & ~sseu->slice_mask) != 0 ||
+	    user_sseu->slice_mask == 0)
+		return -EINVAL;
+
+	if ((user_sseu->subslice_mask & ~sseu->subslice_mask[0]) != 0 ||
+	    user_sseu->subslice_mask == 0)
+		return -EINVAL;
+
+	if (user_sseu->min_eus_per_subslice > sseu->max_eus_per_subslice)
+		return -EINVAL;
+
+	if (user_sseu->max_eus_per_subslice > sseu->max_eus_per_subslice ||
+	    user_sseu->max_eus_per_subslice < user_sseu->min_eus_per_subslice ||
+	    user_sseu->max_eus_per_subslice == 0)
+		return -EINVAL;
+
+	ctx_sseu->slice_mask = user_sseu->slice_mask;
+	ctx_sseu->subslice_mask = user_sseu->subslice_mask;
+	ctx_sseu->min_eus_per_subslice = user_sseu->min_eus_per_subslice;
+	ctx_sseu->max_eus_per_subslice = user_sseu->max_eus_per_subslice;
+
+	return 0;
+}
+
+static int
+i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
+				  struct intel_engine_cs *engine,
+				  union intel_sseu sseu)
+{
+	struct drm_i915_private *dev_priv = ctx->i915;
+	struct i915_timeline *timeline;
+	struct i915_request *rq;
+	enum intel_engine_id id;
+	int ret;
+
+	if (!engine->emit_rpcs_config)
+		return -ENODEV;
+
+	if (ctx->__engine[engine->id].sseu.value == sseu.value)
+		return 0;
+
+	lockdep_assert_held(&dev_priv->drm.struct_mutex);
+
+	i915_retire_requests(dev_priv);
+
+	/* Now use the RCS to actually reconfigure. */
+	engine = dev_priv->engine[RCS];
+
+	rq = i915_request_alloc(engine, dev_priv->kernel_context);
+	if (IS_ERR(rq))
+		return PTR_ERR(rq);
+
+	ret = engine->emit_rpcs_config(rq, ctx, sseu);
+	if (ret) {
+		__i915_request_add(rq, true);
+		return ret;
+	}
+
+	/* Queue this switch after all other activity */
+	list_for_each_entry(timeline, &dev_priv->gt.timelines, link) {
+		struct i915_request *prev;
+
+		prev = last_request_on_engine(timeline, engine);
+		if (prev)
+			i915_sw_fence_await_sw_fence_gfp(&rq->submit,
+							 &prev->submit,
+							 I915_FENCE_GFP);
+	}
+
+	__i915_request_add(rq, true);
+
+	/*
+	 * Apply the configuration to all engines. Our hardware doesn't
+	 * currently support different configurations for each engine.
+	 */
+	for_each_engine(engine, dev_priv, id)
+		ctx->__engine[id].sseu.value = sseu.value;
+
+	return 0;
+}
+
 int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 				    struct drm_file *file)
 {
@@ -784,6 +870,37 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_CONTEXT_PARAM_PRIORITY:
 		args->value = ctx->sched.priority;
 		break;
+	case I915_CONTEXT_PARAM_SSEU: {
+		struct drm_i915_gem_context_param_sseu param_sseu;
+		struct intel_engine_cs *engine;
+		struct intel_context *ce;
+
+		if (copy_from_user(&param_sseu, u64_to_user_ptr(args->value),
+				   sizeof(param_sseu))) {
+			ret = -EFAULT;
+			break;
+		}
+
+		engine = intel_engine_lookup_user(to_i915(dev),
+						  param_sseu.class,
+						  param_sseu.instance);
+		if (!engine) {
+			ret = -EINVAL;
+			break;
+		}
+
+		ce = &ctx->__engine[engine->id];
+
+		param_sseu.slice_mask = ce->sseu.slice_mask;
+		param_sseu.subslice_mask = ce->sseu.subslice_mask;
+		param_sseu.min_eus_per_subslice = ce->sseu.min_eus_per_subslice;
+		param_sseu.max_eus_per_subslice = ce->sseu.max_eus_per_subslice;
+
+		if (copy_to_user(u64_to_user_ptr(args->value), &param_sseu,
+				 sizeof(param_sseu)))
+			ret = -EFAULT;
+		break;
+	}
 	default:
 		ret = -EINVAL;
 		break;
@@ -858,7 +975,41 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 				ctx->sched.priority = priority;
 		}
 		break;
+	case I915_CONTEXT_PARAM_SSEU:
+		{
+			struct drm_i915_private *dev_priv = to_i915(dev);
+			struct drm_i915_gem_context_param_sseu user_sseu;
+			struct intel_engine_cs *engine;
+			union intel_sseu ctx_sseu;
+
+			if (args->size) {
+				ret = -EINVAL;
+				break;
+			}
+
+			if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
+					   sizeof(user_sseu))) {
+				ret = -EFAULT;
+				break;
+			}
+
+			engine = intel_engine_lookup_user(dev_priv,
+							  user_sseu.class,
+							  user_sseu.instance);
+			if (!engine) {
+				ret = -EINVAL;
+				break;
+			}
 
+			ret = intel_sseu_from_user_sseu(&INTEL_INFO(dev_priv)->sseu,
+							&user_sseu, &ctx_sseu);
+			if (ret)
+				break;
+
+			ret = i915_gem_context_reconfigure_sseu(ctx, engine,
+								ctx_sseu);
+		}
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index d0c429c4bd35..8882b159dafd 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2241,6 +2241,72 @@ u32 *gen8_emit_lri_pipe_control(struct drm_i915_private *dev_priv,
 	return cs;
 }
 
+static u32 make_rpcs(const struct sseu_dev_info *sseu,
+		     union intel_sseu ctx_sseu)
+{
+	u32 rpcs = 0;
+
+	/*
+	 * Starting in Gen9, render power gating can leave
+	 * slice/subslice/EU in a partially enabled state. We
+	 * must make an explicit request through RPCS for full
+	 * enablement.
+	*/
+	if (sseu->has_slice_pg) {
+		rpcs |= GEN8_RPCS_S_CNT_ENABLE;
+		rpcs |= hweight8(ctx_sseu.slice_mask) << GEN8_RPCS_S_CNT_SHIFT;
+		rpcs |= GEN8_RPCS_ENABLE;
+	}
+
+	if (sseu->has_subslice_pg) {
+		rpcs |= GEN8_RPCS_SS_CNT_ENABLE;
+		rpcs |= hweight8(ctx_sseu.subslice_mask) <<
+			GEN8_RPCS_SS_CNT_SHIFT;
+		rpcs |= GEN8_RPCS_ENABLE;
+	}
+
+	if (sseu->has_eu_pg) {
+		rpcs |= ctx_sseu.min_eus_per_subslice <<
+			GEN8_RPCS_EU_MIN_SHIFT;
+		rpcs |= ctx_sseu.max_eus_per_subslice <<
+			GEN8_RPCS_EU_MAX_SHIFT;
+		rpcs |= GEN8_RPCS_ENABLE;
+	}
+
+	return rpcs;
+}
+
+static int gen8_emit_rpcs_config(struct i915_request *rq,
+				 struct i915_gem_context *ctx,
+				 union intel_sseu sseu)
+{
+	struct drm_i915_private *dev_priv = rq->i915;
+	struct intel_context *ce = to_intel_context(ctx, dev_priv->engine[RCS]);
+	u64 offset;
+	u32 *cs;
+
+	/* Let the deferred state allocation take care of this. */
+	if (!ce->state)
+		return 0;
+
+	cs = intel_ring_begin(rq, 4);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	offset = ce->state->node.start +
+		LRC_STATE_PN * PAGE_SIZE +
+		(CTX_R_PWR_CLK_STATE + 1) * 4;
+
+	*cs++ = MI_STORE_DWORD_IMM_GEN4;
+	*cs++ = lower_32_bits(offset);
+	*cs++ = upper_32_bits(offset);
+	*cs++ = make_rpcs(&INTEL_INFO(dev_priv)->sseu, sseu);
+
+	intel_ring_advance(rq, cs);
+
+	return 0;
+}
+
 static int gen8_init_rcs_context(struct i915_request *rq)
 {
 	int ret;
@@ -2331,6 +2397,8 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 	engine->emit_breadcrumb = gen8_emit_breadcrumb;
 	engine->emit_breadcrumb_sz = gen8_emit_breadcrumb_sz;
 
+	engine->emit_rpcs_config = gen8_emit_rpcs_config;
+
 	engine->set_default_submission = execlists_set_default_submission;
 
 	if (INTEL_GEN(engine->i915) < 11) {
@@ -2479,41 +2547,6 @@ int logical_xcs_ring_init(struct intel_engine_cs *engine)
 	return logical_ring_init(engine);
 }
 
-static u32 make_rpcs(const struct sseu_dev_info *sseu,
-		     union intel_sseu ctx_sseu)
-{
-	u32 rpcs = 0;
-
-	/*
-	 * Starting in Gen9, render power gating can leave
-	 * slice/subslice/EU in a partially enabled state. We
-	 * must make an explicit request through RPCS for full
-	 * enablement.
-	*/
-	if (sseu->has_slice_pg) {
-		rpcs |= GEN8_RPCS_S_CNT_ENABLE;
-		rpcs |= hweight8(ctx_sseu.slice_mask) << GEN8_RPCS_S_CNT_SHIFT;
-		rpcs |= GEN8_RPCS_ENABLE;
-	}
-
-	if (sseu->has_subslice_pg) {
-		rpcs |= GEN8_RPCS_SS_CNT_ENABLE;
-		rpcs |= hweight8(ctx_sseu.subslice_mask) <<
-		        GEN8_RPCS_SS_CNT_SHIFT;
-		rpcs |= GEN8_RPCS_ENABLE;
-	}
-
-	if (sseu->has_eu_pg) {
-		rpcs |= ctx_sseu.min_eus_per_subslice <<
-			GEN8_RPCS_EU_MIN_SHIFT;
-		rpcs |= ctx_sseu.max_eus_per_subslice <<
-			GEN8_RPCS_EU_MAX_SHIFT;
-		rpcs |= GEN8_RPCS_ENABLE;
-	}
-
-	return rpcs;
-}
-
 static u32 intel_lr_indirect_ctx_offset(struct intel_engine_cs *engine)
 {
 	u32 indirect_ctx_offset;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 8f19349a6055..44fb3a1cf8f9 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2026,6 +2026,8 @@ static void intel_ring_default_vfuncs(struct drm_i915_private *dev_priv,
 			engine->emit_breadcrumb_sz++;
 	}
 
+	engine->emit_rpcs_config = NULL; /* Only supported on Gen8+ */
+
 	engine->set_default_submission = i9xx_set_default_submission;
 
 	if (INTEL_GEN(dev_priv) >= 6)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 80819172619e..79e820fa9838 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -471,6 +471,10 @@ struct intel_engine_cs {
 	void		(*emit_breadcrumb)(struct i915_request *rq, u32 *cs);
 	int		emit_breadcrumb_sz;
 
+	int		(*emit_rpcs_config)(struct i915_request *rq,
+					    struct i915_gem_context *ctx,
+					    union intel_sseu sseu);
+
 	/* Pass the request to the hardware queue (e.g. directly into
 	 * the legacy ringbuffer or to the end of an execlist).
 	 *
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 7f5634ce8e88..24b90836ce1d 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1456,9 +1456,47 @@ struct drm_i915_gem_context_param {
 #define   I915_CONTEXT_MAX_USER_PRIORITY	1023 /* inclusive */
 #define   I915_CONTEXT_DEFAULT_PRIORITY		0
 #define   I915_CONTEXT_MIN_USER_PRIORITY	-1023 /* inclusive */
+	/*
+	 * When using the following param, value should be a pointer to
+	 * drm_i915_gem_context_param_sseu.
+	 */
+#define I915_CONTEXT_PARAM_SSEU		0x7
 	__u64 value;
 };
 
+struct drm_i915_gem_context_param_sseu {
+	/*
+	 * Engine class & instance to be configured or queried.
+	 */
+	__u32 class;
+	__u32 instance;
+
+	/*
+	 * Mask of slices to enable for the context. Valid values are a subset
+	 * of the bitmask value returned for I915_PARAM_SLICE_MASK.
+	 */
+	__u8 slice_mask;
+
+	/*
+	 * Mask of subslices to enable for the context. Valid values are a
+	 * subset of the bitmask value returned by I915_PARAM_SUBSLICE_MASK.
+	 */
+	__u8 subslice_mask;
+
+	/*
+	 * Minimum/Maximum number of EUs to enable per subslice for the
+	 * context. min_eus_per_subslice must be less than or equal to
+	 * max_eus_per_subslice.
+	 */
+	__u8 min_eus_per_subslice;
+	__u8 max_eus_per_subslice;
+
+	/*
+	 * Unused for now. Must be cleared to zero.
+	 */
+	__u32 rsvd;
+};
+
 enum drm_i915_oa_format {
 	I915_OA_FORMAT_A13 = 1,	    /* HSW only */
 	I915_OA_FORMAT_A29,	    /* HSW only */
-- 
2.17.0

* ✗ Fi.CI.BAT: failure for drm/i915: per context slice/subslice powergating (rev3)
  2018-05-09 17:48 [PATCH v4 0/8] drm/i915: per context slice/subslice powergating Lionel Landwerlin
                   ` (7 preceding siblings ...)
  2018-05-09 17:48 ` [PATCH v4 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace Lionel Landwerlin
@ 2018-05-09 19:03 ` Patchwork
  8 siblings, 0 replies; 11+ messages in thread
From: Patchwork @ 2018-05-09 19:03 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: per context slice/subslice powergating (rev3)
URL   : https://patchwork.freedesktop.org/series/42285/
State : failure

== Summary ==

  CHK     include/config/kernel.release
  CHK     include/generated/uapi/linux/version.h
  CHK     include/generated/utsrelease.h
  CHK     include/generated/bounds.h
  CHK     include/generated/timeconst.h
  CHK     include/generated/asm-offsets.h
  CALL    scripts/checksyscalls.sh
  DESCEND  objtool
  CHK     scripts/mod/devicetable-offsets.h
  CHK     include/generated/compile.h
  CHK     kernel/config_data.h
  CC [M]  drivers/gpu/drm/i915/i915_perf.o
drivers/gpu/drm/i915/i915_perf.c: In function ‘alloc_noa_reprogram_bo’:
drivers/gpu/drm/i915/i915_perf.c:1774:25: error: macro "IS_GEN8" passed 2 arguments, but takes just 1
  if (IS_GEN8(dev_priv, 8))
                         ^
drivers/gpu/drm/i915/i915_perf.c:1774:6: error: ‘IS_GEN8’ undeclared (first use in this function); did you mean ‘IS_ERR’?
  if (IS_GEN8(dev_priv, 8))
      ^~~~~~~
      IS_ERR
drivers/gpu/drm/i915/i915_perf.c:1774:6: note: each undeclared identifier is reported only once for each function it appears in
scripts/Makefile.build:312: recipe for target 'drivers/gpu/drm/i915/i915_perf.o' failed
make[4]: *** [drivers/gpu/drm/i915/i915_perf.o] Error 1
scripts/Makefile.build:559: recipe for target 'drivers/gpu/drm/i915' failed
make[3]: *** [drivers/gpu/drm/i915] Error 2
scripts/Makefile.build:559: recipe for target 'drivers/gpu/drm' failed
make[2]: *** [drivers/gpu/drm] Error 2
scripts/Makefile.build:559: recipe for target 'drivers/gpu' failed
make[1]: *** [drivers/gpu] Error 2
Makefile:1060: recipe for target 'drivers' failed
make: *** [drivers] Error 2
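
The first error above is a macro arity mismatch: IS_GEN8() is invoked
with two arguments but is defined with one. The sketch below reproduces
the shape of the problem with hypothetical stand-in definitions (the
i915_model type, both macros and the helper are illustrative, not the
driver's real ones) and shows one plausible form of the call that
compiles. What the author intended at i915_perf.c:1774 is not stated in
the log.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified device description. */
struct i915_model {
	int gen;
};

/* Stand-in macros; each takes exactly one argument. */
#define INTEL_GEN(dev_priv) ((dev_priv)->gen)
#define IS_GEN8(dev_priv) (INTEL_GEN(dev_priv) == 8)

static bool uses_gen8_path(const struct i915_model *dev_priv)
{
	assert(dev_priv);

	/*
	 * Writing IS_GEN8(dev_priv, 8) here would fail to preprocess,
	 * as reported in the log above.
	 */
	return IS_GEN8(dev_priv);
}
```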

* Re: [PATCH v4 6/8] drm/i915: reprogram NOA muxes on context switch when using perf
  2018-05-09 17:48 ` [PATCH v4 6/8] drm/i915: reprogram NOA muxes on context switch when using perf Lionel Landwerlin
@ 2018-05-09 23:38   ` Lionel Landwerlin
  0 siblings, 0 replies; 11+ messages in thread
From: Lionel Landwerlin @ 2018-05-09 23:38 UTC (permalink / raw)
  To: intel-gfx

On 09/05/18 18:48, Lionel Landwerlin wrote:
> @@ -1953,10 +1992,26 @@ static int gen8_emit_bb_start(struct i915_request *rq,
>   		rq->ctx->ppgtt->pd_dirty_rings &= ~intel_engine_flag(rq->engine);
>   	}
>   
> -	cs = intel_ring_begin(rq, 6);
> +	cs = intel_ring_begin(rq, rq->engine->id == RCS ? 10 : 6);
>   	if (IS_ERR(cs))
>   		return PTR_ERR(cs);
>   
> +	if (rq->engine->id == RCS) {
> +		/*
> +		 * Leave some instructions to be written with an
> +		 * MI_BATCH_BUFFER_START to the i915/perf NOA reprogramming
> +		 * batchbuffer. We only turn those MI_NOOP into
> +		 * MI_BATCH_BUFFER_START when we detect a SSEU powergating
> +		 * configuration change that might affect NOA. This is only
> +		 * for the RCS.
> +		 */
> +		rq->perf_prog = intel_ring_offset(rq, cs);
> +		*cs++ = MI_NOOP;
> +		*cs++ = MI_NOOP;
> +		*cs++ = MI_NOOP;
> +		*cs++ = MI_NOOP; /* Aligning to 2 dwords */
> +	}
> +
I just realized that isn't going to work if a request is preempted, then 
later resubmitted after another context with a different powergating 
config...
This reprog bb won't be executed because the CS pointer should be past 
that point already.
It seems to make this approach unworkable?

Would a per-ctx-wa-bb call into the reprogramming buffer under 
MI_PREDICATE work?
LOAD rpcs into predicate_reg0
LOAD engine storage for last rpcs into predicate_reg1
PREDICATE reg0 == reg1
MI_LRI noa registers
PREDICATE unset
STORE rpcs into engine storage for last rpcs
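
The idea above can be modelled in plain C to check that it handles the
preemption case. This is a sketch under assumptions: engine_model and
ctx_restore() are hypothetical names, the NOA register writes are
reduced to a counter, and the reprogramming is assumed to be wanted when
the two values differ (the pseudocode does not spell out the polarity of
the predicate).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the per-engine storage for the last RPCS value. */
struct engine_model {
	uint32_t last_rpcs;
	unsigned int noa_reprograms; /* stands in for the MI_LRI writes */
};

/*
 * Models the per-context workaround batch, run every time a context is
 * restored, including when a preempted request is resubmitted.
 * Returns true if the NOA muxes were reprogrammed.
 */
static bool ctx_restore(struct engine_model *engine, uint32_t ctx_rpcs)
{
	bool changed;

	assert(engine);

	/* LOAD both values and compare them (the predicate). */
	changed = engine->last_rpcs != ctx_rpcs;

	/* Predicated MI_LRI of the NOA registers. */
	if (changed)
		engine->noa_reprograms++;

	/* STORE rpcs into the engine storage unconditionally. */
	engine->last_rpcs = ctx_rpcs;

	return changed;
}
```

Because the check runs at context restore rather than at a fixed point
in the ring, a request resumed after another context with a different
configuration would still trigger the reprogramming in this model.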

Thanks,

-
Lionel