[PATCH v3 0/6] drm/i915: per context slice/subslice powergating

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v3 0/6] drm/i915: per context slice/subslice powergating
@ 2018-05-08 18:03 Lionel Landwerlin
  2018-05-08 18:03 ` [PATCH v3 1/6] drm/i915: Program RPCS for Broadwell Lionel Landwerlin
                   ` (9 more replies)
  0 siblings, 10 replies; 16+ messages in thread
From: Lionel Landwerlin @ 2018-05-08 18:03 UTC (permalink / raw)
  To: intel-gfx

Hi all,

Another update, trying to minimize the number of reprogramming issued
due to powergating configuration changes. This is achieved by tracking
the last submitted powergating configuration in the engine submission
mechanism and reprogramming only on configuration changes.

Another improvement that could be applied to limit the number of
changes in the powergating configuration could be :

   - to align all contexts to a given configuration (implictly discard
     the userspace requests).

   - within the submission mechanism, have a round robin scheduling
     that switches between buckets of contexts of different
     powergating configuration

This series doesn't include such mechanisms, it's probably best for me
to leave that to people who are more experienced.

Cheers,

Chris Wilson (3):
  drm/i915: Program RPCS for Broadwell
  drm/i915: Record the sseu configuration per-context & engine
  drm/i915: Expose RPCS (SSEU) configuration to userspace

Lionel Landwerlin (3):
  drm/i915/perf: simplify configure all context function
  drm/i915: reprogram NOA muxes on context switch when using perf
  drm/i915: count powergating transitions per engine

 drivers/gpu/drm/i915/i915_drv.h           |   6 +
 drivers/gpu/drm/i915/i915_gem.h           |  13 ++
 drivers/gpu/drm/i915/i915_gem_context.c   | 117 ++++++++++++++++-
 drivers/gpu/drm/i915/i915_gem_context.h   |   3 +
 drivers/gpu/drm/i915/i915_perf.c          | 119 ++++++++++++++++--
 drivers/gpu/drm/i915/i915_request.h       |   6 +
 drivers/gpu/drm/i915/intel_engine_cs.c    |   2 +
 drivers/gpu/drm/i915/intel_gpu_commands.h |   1 +
 drivers/gpu/drm/i915/intel_lrc.c          | 146 +++++++++++++++++++---
 drivers/gpu/drm/i915/intel_lrc.h          |   4 +
 drivers/gpu/drm/i915/intel_ringbuffer.c   |   3 +
 drivers/gpu/drm/i915/intel_ringbuffer.h   |   3 +
 include/uapi/drm/i915_drm.h               |  38 ++++++
 13 files changed, 433 insertions(+), 28 deletions(-)

--
2.17.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v3 1/6] drm/i915: Program RPCS for Broadwell
  2018-05-08 18:03 [PATCH v3 0/6] drm/i915: per context slice/subslice powergating Lionel Landwerlin
@ 2018-05-08 18:03 ` Lionel Landwerlin
  2018-05-08 18:03 ` [PATCH v3 2/6] drm/i915: Record the sseu configuration per-context & engine Lionel Landwerlin
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Lionel Landwerlin @ 2018-05-08 18:03 UTC (permalink / raw)
  To: intel-gfx

From: Chris Wilson <chris@chris-wilson.co.uk>

Currently we only configure the power gating for Skylake and above, but
the configuration should equally apply to Broadwell and Braswell. Even
though, there is not as much variation as for later generations, we want
to expose control over the configuration to userspace and may want to
opt out of the "always-enabled" setting.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 911f288f78aa..474f1af5fbfc 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2378,13 +2378,6 @@ make_rpcs(struct drm_i915_private *dev_priv)
 {
 	u32 rpcs = 0;
 
-	/*
-	 * No explicit RPCS request is needed to ensure full
-	 * slice/subslice/EU enablement prior to Gen9.
-	*/
-	if (INTEL_GEN(dev_priv) < 9)
-		return 0;
-
 	/*
 	 * Starting in Gen9, render power gating can leave
 	 * slice/subslice/EU in a partially enabled state. We
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v3 2/6] drm/i915: Record the sseu configuration per-context & engine
  2018-05-08 18:03 [PATCH v3 0/6] drm/i915: per context slice/subslice powergating Lionel Landwerlin
  2018-05-08 18:03 ` [PATCH v3 1/6] drm/i915: Program RPCS for Broadwell Lionel Landwerlin
@ 2018-05-08 18:03 ` Lionel Landwerlin
  2018-05-08 18:03 ` [PATCH v3 3/6] drm/i915/perf: simplify configure all context function Lionel Landwerlin
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Lionel Landwerlin @ 2018-05-08 18:03 UTC (permalink / raw)
  To: intel-gfx

From: Chris Wilson <chris@chris-wilson.co.uk>

We want to expose the ability to reconfigure the slices, subslice and
eu per context and per engine. To facilitate that, store the current
configuration on the context for each engine, which is initially set
to the device default upon creation.

v2: record sseu configuration per context & engine (Chris)

v3: introduce the i915_gem_context_sseu to store powergating
    programming, sseu_dev_info has grown quite a bit (Lionel)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.h         | 13 +++++++++++++
 drivers/gpu/drm/i915/i915_gem_context.c | 23 +++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_context.h |  3 +++
 drivers/gpu/drm/i915/intel_lrc.c        | 24 ++++++++++++------------
 4 files changed, 51 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.h b/drivers/gpu/drm/i915/i915_gem.h
index 525920404ede..5a5315dc9c2c 100644
--- a/drivers/gpu/drm/i915/i915_gem.h
+++ b/drivers/gpu/drm/i915/i915_gem.h
@@ -69,6 +69,19 @@ struct drm_i915_private;
 
 #define I915_NUM_ENGINES 8
 
+/*
+ * Powergating configuration for a particular (context,engine).
+ */
+union i915_gem_sseu {
+	struct {
+		u8 slice_mask;
+		u8 subslice_mask;
+		u8 min_eus_per_subslice;
+		u8 max_eus_per_subslice;
+	};
+	u64 value;
+};
+
 void i915_gem_park(struct drm_i915_private *i915);
 void i915_gem_unpark(struct drm_i915_private *i915);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 33f8a4b3c981..2af071c02e74 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -261,11 +261,26 @@ static u32 default_desc_template(const struct drm_i915_private *i915,
 	return desc;
 }
 
+static union i915_gem_sseu
+i915_gem_sseu_from_device_sseu(const struct sseu_dev_info *sseu)
+{
+	union i915_gem_sseu value = {
+		.slice_mask = sseu->slice_mask,
+		.subslice_mask = sseu->subslice_mask[0],
+		.min_eus_per_subslice = sseu->max_eus_per_subslice,
+		.max_eus_per_subslice = sseu->max_eus_per_subslice,
+	};
+
+	return value;
+}
+
 static struct i915_gem_context *
 __create_hw_context(struct drm_i915_private *dev_priv,
 		    struct drm_i915_file_private *file_priv)
 {
 	struct i915_gem_context *ctx;
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
 	int ret;
 
 	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
@@ -314,6 +329,14 @@ __create_hw_context(struct drm_i915_private *dev_priv,
 	 * is no remap info, it will be a NOP. */
 	ctx->remap_slice = ALL_L3_SLICES(dev_priv);
 
+	/* On all engines, use the whole device by default */
+	for_each_engine(engine, dev_priv, id) {
+		struct intel_context *ce = &ctx->__engine[id];
+
+		ce->sseu =i915_gem_sseu_from_device_sseu(
+			&INTEL_INFO(dev_priv)->sseu);
+	}
+
 	i915_gem_context_set_bannable(ctx);
 	ctx->ring_size = 4 * PAGE_SIZE;
 	ctx->desc_template =
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index ace3b129c189..46e2166fc03b 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -30,6 +30,7 @@
 #include <linux/radix-tree.h>
 
 #include "i915_gem.h"
+#include "intel_device_info.h"
 
 struct pid;
 
@@ -149,6 +150,8 @@ struct i915_gem_context {
 		u32 *lrc_reg_state;
 		u64 lrc_desc;
 		int pin_count;
+		/** sseu: Control eu/slice partitioning */
+		union i915_gem_sseu sseu;
 	} __engine[I915_NUM_ENGINES];
 
 	/** ring_size: size for allocating the per-engine ring buffer */
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 474f1af5fbfc..1c08bd2be6c3 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2373,8 +2373,8 @@ int logical_xcs_ring_init(struct intel_engine_cs *engine)
 	return logical_ring_init(engine);
 }
 
-static u32
-make_rpcs(struct drm_i915_private *dev_priv)
+static u32 make_rpcs(const struct sseu_dev_info *sseu,
+		     union i915_gem_sseu ctx_sseu)
 {
 	u32 rpcs = 0;
 
@@ -2384,24 +2384,23 @@ make_rpcs(struct drm_i915_private *dev_priv)
 	 * must make an explicit request through RPCS for full
 	 * enablement.
 	*/
-	if (INTEL_INFO(dev_priv)->sseu.has_slice_pg) {
+	if (sseu->has_slice_pg) {
 		rpcs |= GEN8_RPCS_S_CNT_ENABLE;
-		rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask) <<
-			GEN8_RPCS_S_CNT_SHIFT;
+		rpcs |= hweight8(ctx_sseu.slice_mask) << GEN8_RPCS_S_CNT_SHIFT;
 		rpcs |= GEN8_RPCS_ENABLE;
 	}
 
-	if (INTEL_INFO(dev_priv)->sseu.has_subslice_pg) {
+	if (sseu->has_subslice_pg) {
 		rpcs |= GEN8_RPCS_SS_CNT_ENABLE;
-		rpcs |= hweight8(INTEL_INFO(dev_priv)->sseu.subslice_mask[0]) <<
-			GEN8_RPCS_SS_CNT_SHIFT;
+		rpcs |= hweight8(ctx_sseu.subslice_mask) <<
+		        GEN8_RPCS_SS_CNT_SHIFT;
 		rpcs |= GEN8_RPCS_ENABLE;
 	}
 
-	if (INTEL_INFO(dev_priv)->sseu.has_eu_pg) {
-		rpcs |= INTEL_INFO(dev_priv)->sseu.eu_per_subslice <<
+	if (sseu->has_eu_pg) {
+		rpcs |= ctx_sseu.min_eus_per_subslice <<
 			GEN8_RPCS_EU_MIN_SHIFT;
-		rpcs |= INTEL_INFO(dev_priv)->sseu.eu_per_subslice <<
+		rpcs |= ctx_sseu.max_eus_per_subslice <<
 			GEN8_RPCS_EU_MAX_SHIFT;
 		rpcs |= GEN8_RPCS_ENABLE;
 	}
@@ -2525,7 +2524,8 @@ static void execlists_init_reg_state(u32 *regs,
 	if (rcs) {
 		regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
 		CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
-			make_rpcs(dev_priv));
+			make_rpcs(&INTEL_INFO(dev_priv)->sseu,
+				  ctx->__engine[engine->id].sseu));
 
 		i915_oa_init_reg_state(engine, ctx, regs);
 	}
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v3 3/6] drm/i915/perf: simplify configure all context function
  2018-05-08 18:03 [PATCH v3 0/6] drm/i915: per context slice/subslice powergating Lionel Landwerlin
  2018-05-08 18:03 ` [PATCH v3 1/6] drm/i915: Program RPCS for Broadwell Lionel Landwerlin
  2018-05-08 18:03 ` [PATCH v3 2/6] drm/i915: Record the sseu configuration per-context & engine Lionel Landwerlin
@ 2018-05-08 18:03 ` Lionel Landwerlin
  2018-05-08 18:03 ` [PATCH v3 4/6] drm/i915: reprogram NOA muxes on context switch when using perf Lionel Landwerlin
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Lionel Landwerlin @ 2018-05-08 18:03 UTC (permalink / raw)
  To: intel-gfx

We don't need any special treatment on error so just return as soon as
possible.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index d9341415df40..5b279a82445a 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1765,7 +1765,7 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 	/* Switch away from any user context. */
 	ret = gen8_switch_to_updated_kernel_context(dev_priv, oa_config);
 	if (ret)
-		goto out;
+		return ret;
 
 	/*
 	 * The OA register config is setup through the context image. This image
@@ -1782,7 +1782,7 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 	 */
 	ret = i915_gem_wait_for_idle(dev_priv, wait_flags);
 	if (ret)
-		goto out;
+		return ret;
 
 	/* Update all contexts now that we've stalled the submission. */
 	list_for_each_entry(ctx, &dev_priv->contexts.list, link) {
@@ -1794,10 +1794,8 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 			continue;
 
 		regs = i915_gem_object_pin_map(ce->state->obj, I915_MAP_WB);
-		if (IS_ERR(regs)) {
-			ret = PTR_ERR(regs);
-			goto out;
-		}
+		if (IS_ERR(regs))
+			return PTR_ERR(regs);
 
 		ce->state->obj->mm.dirty = true;
 		regs += LRC_STATE_PN * PAGE_SIZE / sizeof(*regs);
@@ -1807,7 +1805,6 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 		i915_gem_object_unpin_map(ce->state->obj);
 	}
 
- out:
 	return ret;
 }
 
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v3 4/6] drm/i915: reprogram NOA muxes on context switch when using perf
  2018-05-08 18:03 [PATCH v3 0/6] drm/i915: per context slice/subslice powergating Lionel Landwerlin
                   ` (2 preceding siblings ...)
  2018-05-08 18:03 ` [PATCH v3 3/6] drm/i915/perf: simplify configure all context function Lionel Landwerlin
@ 2018-05-08 18:03 ` Lionel Landwerlin
  2018-05-09  8:59   ` Chris Wilson
  2018-05-08 18:03 ` [PATCH v3 5/6] drm/i915: count powergating transitions per engine Lionel Landwerlin
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 16+ messages in thread
From: Lionel Landwerlin @ 2018-05-08 18:03 UTC (permalink / raw)
  To: intel-gfx

If some of the contexts submitting workloads to the GPU have been
configured to shutdown slices/subslices, we might loose the NOA
configurations written in the NOA muxes. We need to reprogram them
when we detect a powergating configuration change.

In this change i915/perf is responsible for setting up a reprogramming
batchbuffer which we execute just before the userspace submitted
batchbuffer. We do this while preemption is still disable, only if
needed. The decision to execute this reprogramming batchbuffer is made
when we assign a request to an execlist port.

v2: Only reprogram when detecting configuration changes (Chris/Lionel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h           |   6 ++
 drivers/gpu/drm/i915/i915_perf.c          | 108 ++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_request.h       |   6 ++
 drivers/gpu/drm/i915/intel_gpu_commands.h |   1 +
 drivers/gpu/drm/i915/intel_lrc.c          |  59 +++++++++++-
 drivers/gpu/drm/i915/intel_ringbuffer.c   |   2 +
 drivers/gpu/drm/i915/intel_ringbuffer.h   |   2 +
 7 files changed, 183 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 04e27806e581..07cdbfb4ad7a 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1461,6 +1461,12 @@ struct i915_perf_stream {
 	 * @oa_config: The OA configuration used by the stream.
 	 */
 	struct i915_oa_config *oa_config;
+
+	/**
+	 * @noa_reprogram_vma: A batchbuffer reprogramming the NOA muxes, used
+	 * after switching powergating configurations.
+	 */
+	struct i915_vma *noa_reprogram_vma;
 };
 
 /**
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 5b279a82445a..1b9e3d6a53a2 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1691,6 +1691,96 @@ static int gen8_emit_oa_config(struct i915_request *rq,
 	return 0;
 }
 
+#define MAX_LRI_SIZE (125U)
+
+static u32 noa_reprogram_bb_size(struct i915_oa_config *oa_config)
+{
+	u32 n_lri;
+
+	/* Very unlikely but possible that we have no muxes to configure. */
+	if (!oa_config->mux_regs_len)
+		return 0;
+
+	n_lri = (oa_config->mux_regs_len / MAX_LRI_SIZE) +
+		(oa_config->mux_regs_len % MAX_LRI_SIZE) != 0;
+
+	return n_lri * 4 + oa_config->mux_regs_len * 8 + /* MI_LOAD_REGISTER_IMMs */
+		6 * 4 + /* PIPE_CONTROL */
+		4; /* MI_BATCH_BUFFER_END */
+}
+
+static int
+alloc_noa_reprogram_bo(struct i915_perf_stream *stream)
+{
+	struct drm_i915_private *dev_priv = stream->dev_priv;
+	struct i915_oa_config *oa_config = stream->oa_config;
+	struct drm_i915_gem_object *bo;
+	struct i915_vma *vma;
+	u32 buffer_size;
+	u32 *cs;
+	int i, ret, n_loaded_regs;
+
+	buffer_size = ALIGN(noa_reprogram_bb_size(oa_config), PAGE_SIZE);
+	if (buffer_size == 0)
+		return 0;
+
+	bo = i915_gem_object_create(dev_priv, buffer_size);
+	if (IS_ERR(bo)) {
+		DRM_ERROR("Failed to allocate NOA reprogramming buffer\n");
+		return PTR_ERR(bo);
+	}
+
+	cs = i915_gem_object_pin_map(bo, I915_MAP_WB);
+	if (IS_ERR(cs)) {
+		ret = PTR_ERR(cs);
+		goto err_unref_bo;
+	}
+
+	n_loaded_regs = 0;
+	for (i = 0; i < oa_config->mux_regs_len; i++) {
+		if ((n_loaded_regs % MAX_LRI_SIZE) == 0) {
+			u32 n_lri = min(oa_config->mux_regs_len - n_loaded_regs,
+					MAX_LRI_SIZE);
+			*cs++ = MI_LOAD_REGISTER_IMM(n_lri);
+		}
+
+		*cs++ = i915_mmio_reg_offset(oa_config->mux_regs[i].addr);
+		*cs++ = oa_config->mux_regs[i].value;
+		n_loaded_regs++;
+	}
+
+	cs = gen8_emit_pipe_control(cs, PIPE_CONTROL_MMIO_WRITE, 0);
+
+	*cs++ = MI_BATCH_BUFFER_END;
+
+	i915_gem_object_unpin_map(bo);
+
+	ret = i915_gem_object_set_to_gtt_domain(bo, false);
+	if (ret)
+		goto err_unref_bo;
+
+	vma = i915_vma_instance(bo, &dev_priv->ggtt.base, NULL);
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
+		goto err_unref_bo;
+	}
+
+	ret = i915_vma_pin(vma, 0, 0, PIN_USER | PIN_GLOBAL);
+	if (ret)
+		goto err_unref_vma;
+
+	stream->noa_reprogram_vma = vma;
+
+	return 0;
+
+err_unref_vma:
+	i915_vma_put(vma);
+err_unref_bo:
+	i915_gem_object_put(bo);
+
+	return ret;
+}
+
 static int gen8_switch_to_updated_kernel_context(struct drm_i915_private *dev_priv,
 						 const struct i915_oa_config *oa_config)
 {
@@ -2102,6 +2192,12 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 		goto err_config;
 	}
 
+	ret = alloc_noa_reprogram_bo(stream);
+	if (ret) {
+		DRM_DEBUG("Enable to alloc NOA reprogramming BO\n");
+		goto err_config;
+	}
+
 	/* PRM - observability performance counters:
 	 *
 	 *   OACONTROL, performance counter enable, note:
@@ -2136,6 +2232,13 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 
 	dev_priv->perf.oa.exclusive_stream = stream;
 
+	ret = dev_priv->perf.oa.ops.enable_metric_set(dev_priv,
+						      stream->oa_config);
+	if (ret)
+		goto err_enable;
+
+	stream->ops = &i915_oa_stream_ops;
+
 	mutex_unlock(&dev_priv->drm.struct_mutex);
 
 	return 0;
@@ -2497,6 +2600,11 @@ static void i915_perf_destroy_locked(struct i915_perf_stream *stream)
 	if (stream->ctx)
 		i915_gem_context_put(stream->ctx);
 
+	if (stream->noa_reprogram_vma) {
+		i915_vma_unpin(stream->noa_reprogram_vma);
+		i915_gem_object_put(stream->noa_reprogram_vma->obj);
+	}
+
 	kfree(stream);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index eddbd4245cb3..3357ceb4bdb1 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -149,6 +149,12 @@ struct i915_request {
 	/** Preallocate space in the ring for the emitting the request */
 	u32 reserved_space;
 
+	/*
+	 * Pointer in the batchbuffer to where the i915/perf NOA reprogramming
+	 * can be inserted just before HW submission.
+	 */
+	u32 *perf_prog_start;
+
 	/** Batch buffer related to this request if any (used for
 	 * error state dump only).
 	 */
diff --git a/drivers/gpu/drm/i915/intel_gpu_commands.h b/drivers/gpu/drm/i915/intel_gpu_commands.h
index 105e2a9e874a..09b0fded8b17 100644
--- a/drivers/gpu/drm/i915/intel_gpu_commands.h
+++ b/drivers/gpu/drm/i915/intel_gpu_commands.h
@@ -146,6 +146,7 @@
 #define   MI_BATCH_GTT		    (2<<6) /* aliased with (1<<7) on gen4 */
 #define MI_BATCH_BUFFER_START_GEN8	MI_INSTR(0x31, 1)
 #define   MI_BATCH_RESOURCE_STREAMER (1<<10)
+#define   MI_BATCH_SECOND_LEVEL      (1<<22)
 
 /*
  * 3D instructions used by the kernel
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 1c08bd2be6c3..df4a7bb07eae 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -510,6 +510,38 @@ static bool can_merge_ctx(const struct i915_gem_context *prev,
 	return true;
 }
 
+static void maybe_enable_noa_repgoram(struct i915_request *rq)
+{
+	struct intel_engine_cs *engine = rq->engine;
+	struct drm_i915_private *dev_priv = engine->i915;
+	struct i915_perf_stream *stream = dev_priv->perf.oa.exclusive_stream;
+	struct intel_context *ce;
+	u32 *cs = rq->perf_prog_start;
+
+	/* Slice/subslice/EU powergating only matters on the RCS. */
+	if (engine->id != RCS)
+		return;
+
+	/* Nothing need to reprogram when perf isn't enabled. */
+	if (!stream)
+		return;
+
+	ce = &rq->ctx->__engine[engine->id];
+
+	/*
+	 * If the powergating configuration doesn't change, no need to
+	 * reprogram.
+	 */
+	if (engine->last_sseu.value == ce->sseu.value)
+		return;
+
+	*cs++ = MI_BATCH_BUFFER_START_GEN8 | MI_BATCH_SECOND_LEVEL;
+	*cs++ = 0;
+	*cs++ = i915_ggtt_offset(stream->noa_reprogram_vma);
+
+	engine->last_sseu = ce->sseu;
+}
+
 static void port_assign(struct execlist_port *port, struct i915_request *rq)
 {
 	GEM_BUG_ON(rq == port_request(port));
@@ -517,6 +549,8 @@ static void port_assign(struct execlist_port *port, struct i915_request *rq)
 	if (port_isset(port))
 		i915_request_put(port_request(port));
 
+	maybe_enable_noa_repgoram(rq);
+
 	port_set(port, port_pack(i915_request_get(rq), port_count(port)));
 }
 
@@ -1047,6 +1081,13 @@ static void execlists_submission_tasklet(unsigned long data)
 			    buf[2*head + 1] == execlists->preempt_complete_status) {
 				GEM_TRACE("%s preempt-idle\n", engine->name);
 
+				/*
+				 * Clear out the state of the sseu on the
+				 * engine, as it's not clear what it will be
+				 * after preemption.
+				 */
+				memset(&engine->last_sseu, 0, sizeof(engine->last_sseu));
+
 				execlists_cancel_port_requests(execlists);
 				execlists_unwind_incomplete_requests(execlists);
 
@@ -1937,10 +1978,26 @@ static int gen8_emit_bb_start(struct i915_request *rq,
 		rq->ctx->ppgtt->pd_dirty_rings &= ~intel_engine_flag(rq->engine);
 	}
 
-	cs = intel_ring_begin(rq, 6);
+	cs = intel_ring_begin(rq, rq->engine->id == RCS ? 10 : 6);
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
 
+	if (rq->engine->id == RCS) {
+		/*
+		 * Leave some instructions to be written with an
+		 * MI_BATCH_BUFFER_START to the i915/perf NOA reprogramming
+		 * batchbuffer. We only turn those MI_NOOP into
+		 * MI_BATCH_BUFFER_START when we detect a SSEU powergating
+		 * configuration change that might affect NOA. This is only
+		 * for the RCS.
+		 */
+		rq->perf_prog_start = cs;
+		*cs++ = MI_NOOP;
+		*cs++ = MI_NOOP;
+		*cs++ = MI_NOOP;
+		*cs++ = MI_NOOP; /* Aligning to 2 dwords */
+	}
+
 	/*
 	 * WaDisableCtxRestoreArbitration:bdw,chv
 	 *
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 8f19349a6055..cde791234397 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -525,6 +525,8 @@ static int init_ring_common(struct intel_engine_cs *engine)
 	if (INTEL_GEN(dev_priv) > 2)
 		I915_WRITE_MODE(engine, _MASKED_BIT_DISABLE(STOP_RING));
 
+	memset(&engine->last_sseu, 0, sizeof(engine->last_sseu));
+
 out:
 	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 010750e8ee44..2f8518e1a49f 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -343,6 +343,8 @@ struct intel_engine_cs {
 
 	struct drm_i915_gem_object *default_state;
 
+	union i915_gem_sseu last_sseu;
+
 	atomic_t irq_count;
 	unsigned long irq_posted;
 #define ENGINE_IRQ_BREADCRUMB 0
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v3 5/6] drm/i915: count powergating transitions per engine
  2018-05-08 18:03 [PATCH v3 0/6] drm/i915: per context slice/subslice powergating Lionel Landwerlin
                   ` (3 preceding siblings ...)
  2018-05-08 18:03 ` [PATCH v3 4/6] drm/i915: reprogram NOA muxes on context switch when using perf Lionel Landwerlin
@ 2018-05-08 18:03 ` Lionel Landwerlin
  2018-05-08 18:03 ` [PATCH v3 6/6] drm/i915: Expose RPCS (SSEU) configuration to userspace Lionel Landwerlin
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Lionel Landwerlin @ 2018-05-08 18:03 UTC (permalink / raw)
  To: intel-gfx

This can be used to monitor the number of powergating transition
changes for a particular workload.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/intel_engine_cs.c  | 2 ++
 drivers/gpu/drm/i915/intel_lrc.c        | 1 +
 drivers/gpu/drm/i915/intel_ringbuffer.c | 1 +
 drivers/gpu/drm/i915/intel_ringbuffer.h | 1 +
 4 files changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 70325e0824e3..63348c20c6ed 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1437,6 +1437,8 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 	hexdump(m, engine->status_page.page_addr, PAGE_SIZE);
 
 	drm_printf(m, "Idle? %s\n", yesno(intel_engine_is_idle(engine)));
+
+	drm_printf(m, "Powergating transitions: %u\n", atomic_read(&engine->sseu_transitions));
 }
 
 static u8 user_class_map[] = {
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index df4a7bb07eae..5b089a8439ac 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -540,6 +540,7 @@ static void maybe_enable_noa_repgoram(struct i915_request *rq)
 	*cs++ = i915_ggtt_offset(stream->noa_reprogram_vma);
 
 	engine->last_sseu = ce->sseu;
+	atomic_inc(&engine->sseu_transitions);
 }
 
 static void port_assign(struct execlist_port *port, struct i915_request *rq)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index cde791234397..8e9aa31e9b2d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -526,6 +526,7 @@ static int init_ring_common(struct intel_engine_cs *engine)
 		I915_WRITE_MODE(engine, _MASKED_BIT_DISABLE(STOP_RING));
 
 	memset(&engine->last_sseu, 0, sizeof(engine->last_sseu));
+	atomic_set(&engine->sseu_transitions, 0);
 
 out:
 	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 2f8518e1a49f..ad269f09af5e 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -344,6 +344,7 @@ struct intel_engine_cs {
 	struct drm_i915_gem_object *default_state;
 
 	union i915_gem_sseu last_sseu;
+	atomic_t sseu_transitions;
 
 	atomic_t irq_count;
 	unsigned long irq_posted;
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v3 6/6] drm/i915: Expose RPCS (SSEU) configuration to userspace
  2018-05-08 18:03 [PATCH v3 0/6] drm/i915: per context slice/subslice powergating Lionel Landwerlin
                   ` (4 preceding siblings ...)
  2018-05-08 18:03 ` [PATCH v3 5/6] drm/i915: count powergating transitions per engine Lionel Landwerlin
@ 2018-05-08 18:03 ` Lionel Landwerlin
  2018-05-08 18:33 ` ✗ Fi.CI.CHECKPATCH: warning for drm/i915: per context slice/subslice powergating (rev2) Patchwork
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Lionel Landwerlin @ 2018-05-08 18:03 UTC (permalink / raw)
  To: intel-gfx

From: Chris Wilson <chris@chris-wilson.co.uk>

We want to allow userspace to reconfigure the subslice configuration for
its own use case. To do so, we expose a context parameter to allow
adjustment of the RPCS register stored within the context image (and
currently not accessible via LRI). If the context is adjusted before
first use, the adjustment is for "free"; otherwise if the context is
active we flush the context off the GPU (stalling all users) and forcing
the GPU to save the context to memory where we can modify it and so
ensure that the register is reloaded on next execution.

The overhead of managing additional EU subslices can be significant,
especially in multi-context workloads. Non-GPGPU contexts should
preferably disable the subslices it is not using, and others should
fine-tune the number to match their workload.

We expose complete control over the RPCS register, allowing
configuration of slice/subslice, via masks packed into a u64 for
simplicity. For example,

	struct drm_i915_gem_context_param arg;
	struct drm_i915_gem_context_param_sseu sseu = { .class = 0, instance = 0, };

	memset(&arg, 0, sizeof(arg));
	arg.ctx_id = ctx;
	arg.param = I915_CONTEXT_PARAM_SSEU;
	arg.value = (uintptr_t) &sseu;
	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
		sseu.packed.subslice_mask = 0;

		drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
	}

could be used to disable all subslices where supported.

v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)

v3: Add ability to program this per engine (Chris)

v4: Move most get_sseu() into i915_gem_context.c (Lionel)

v5: Validate sseu configuration against the device's capabilities (Lionel)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
c: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
CC: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
CC: Zhipeng Gong <zhipeng.gong@intel.com>
CC: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 94 ++++++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_lrc.c        | 55 +++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.h        |  4 ++
 include/uapi/drm/i915_drm.h             | 38 ++++++++++
 4 files changed, 190 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 2af071c02e74..6a93c73f368e 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -748,6 +748,35 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
 	return 0;
 }
 
+static int
+i915_gem_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
+			     const struct drm_i915_gem_context_param_sseu *user_sseu,
+			     union i915_gem_sseu *ctx_sseu)
+{
+	if ((user_sseu->slice_mask & ~sseu->slice_mask) != 0 ||
+	    user_sseu->slice_mask == 0)
+		return -EINVAL;
+
+	if ((user_sseu->subslice_mask & ~sseu->subslice_mask[0]) != 0 ||
+	    user_sseu->subslice_mask == 0)
+		return -EINVAL;
+
+	if (user_sseu->min_eus_per_subslice > sseu->max_eus_per_subslice)
+		return -EINVAL;
+
+	if (user_sseu->max_eus_per_subslice > sseu->max_eus_per_subslice ||
+	    user_sseu->max_eus_per_subslice < user_sseu->min_eus_per_subslice ||
+	    user_sseu->max_eus_per_subslice == 0)
+		return -EINVAL;
+
+	ctx_sseu->slice_mask = user_sseu->slice_mask;
+	ctx_sseu->subslice_mask = user_sseu->subslice_mask;
+	ctx_sseu->min_eus_per_subslice = user_sseu->min_eus_per_subslice;
+	ctx_sseu->max_eus_per_subslice = user_sseu->max_eus_per_subslice;
+
+	return 0;
+}
+
 int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 				    struct drm_file *file)
 {
@@ -785,6 +814,37 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_CONTEXT_PARAM_PRIORITY:
 		args->value = ctx->sched.priority;
 		break;
+	case I915_CONTEXT_PARAM_SSEU: {
+		struct drm_i915_gem_context_param_sseu param_sseu;
+		struct intel_engine_cs *engine;
+		struct intel_context *ce;
+
+		if (copy_from_user(&param_sseu, u64_to_user_ptr(args->value),
+				   sizeof(param_sseu))) {
+			ret = -EFAULT;
+			break;
+		}
+
+		engine = intel_engine_lookup_user(to_i915(dev),
+						  param_sseu.class,
+						  param_sseu.instance);
+		if (!engine) {
+			ret = -EINVAL;
+			break;
+		}
+
+		ce = &ctx->__engine[engine->id];
+
+		param_sseu.slice_mask = ce->sseu.slice_mask;
+		param_sseu.subslice_mask = ce->sseu.subslice_mask;
+		param_sseu.min_eus_per_subslice = ce->sseu.min_eus_per_subslice;
+		param_sseu.max_eus_per_subslice = ce->sseu.max_eus_per_subslice;
+
+		if (copy_to_user(u64_to_user_ptr(args->value), &param_sseu,
+				 sizeof(param_sseu)))
+			ret = -EFAULT;
+		break;
+	}
 	default:
 		ret = -EINVAL;
 		break;
@@ -840,7 +900,6 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 		else
 			i915_gem_context_clear_bannable(ctx);
 		break;
-
 	case I915_CONTEXT_PARAM_PRIORITY:
 		{
 			s64 priority = args->value;
@@ -859,7 +918,40 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 				ctx->sched.priority = priority;
 		}
 		break;
+	case I915_CONTEXT_PARAM_SSEU:
+		if (args->size) {
+			ret = -EINVAL;
+		} else if (!HAS_EXECLISTS(ctx->i915)) {
+			ret = -ENODEV;
+		} else {
+			struct drm_i915_private *dev_priv = to_i915(dev);
+			struct drm_i915_gem_context_param_sseu user_sseu;
+			union i915_gem_sseu ctx_sseu;
+			struct intel_engine_cs *engine;
+
+			if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
+					   sizeof(user_sseu))) {
+				ret = -EFAULT;
+				break;
+			}
+
+			engine = intel_engine_lookup_user(dev_priv,
+							  user_sseu.class,
+							  user_sseu.instance);
+			if (!engine) {
+				ret = -EINVAL;
+				break;
+			}
 
+			ret = i915_gem_sseu_from_user_sseu(
+				&INTEL_INFO(dev_priv)->sseu,
+				&user_sseu, &ctx_sseu);
+			if (ret)
+				break;
+
+			ret = intel_lr_context_set_sseu(ctx, engine, ctx_sseu);
+		}
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 5b089a8439ac..ecf0a3dabdec 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2757,6 +2757,61 @@ void intel_lr_context_resume(struct drm_i915_private *dev_priv)
 	}
 }
 
+int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
+			      struct intel_engine_cs *engine,
+			      union i915_gem_sseu sseu)
+{
+	struct drm_i915_private *dev_priv = ctx->i915;
+	struct intel_context *ce;
+	enum intel_engine_id id;
+	int ret;
+
+	lockdep_assert_held(&dev_priv->drm.struct_mutex);
+
+	if (ctx->__engine[engine->id].sseu.value == sseu.value)
+		return 0;
+
+	/*
+	 * We can only program this on render ring.
+	 */
+	ce = &ctx->__engine[RCS];
+
+	if (ce->pin_count) { /* Assume that the context is active! */
+		ret = i915_gem_switch_to_kernel_context(dev_priv);
+		if (ret)
+			return ret;
+
+		ret = i915_gem_wait_for_idle(dev_priv,
+					     I915_WAIT_INTERRUPTIBLE |
+					     I915_WAIT_LOCKED);
+		if (ret)
+			return ret;
+	}
+
+	if (ce->state) {
+		void *vaddr = i915_gem_object_pin_map(ce->state->obj, I915_MAP_WB);
+		u32 *regs;
+
+		if (IS_ERR(vaddr))
+			return PTR_ERR(vaddr);
+
+		regs = vaddr + LRC_STATE_PN * PAGE_SIZE;
+
+		regs[CTX_R_PWR_CLK_STATE + 1] =
+			make_rpcs(&INTEL_INFO(dev_priv)->sseu, sseu);
+		i915_gem_object_unpin_map(ce->state->obj);
+	}
+
+	/*
+	 * Apply the configuration to all engine. Our hardware doesn't
+	 * currently support different configurations for each engine.
+	 */
+	for_each_engine(engine, dev_priv, id)
+		ctx->__engine[id].sseu.value = sseu.value;
+
+	return 0;
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftests/intel_lrc.c"
 #endif
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 4ec7d8dd13c8..9cf7ec4caa0a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -111,4 +111,8 @@ intel_lr_context_descriptor(struct i915_gem_context *ctx,
 	return to_intel_context(ctx, engine)->lrc_desc;
 }
 
+int intel_lr_context_set_sseu(struct i915_gem_context *ctx,
+			      struct intel_engine_cs *engine,
+			      union i915_gem_sseu sseu);
+
 #endif /* _INTEL_LRC_H_ */
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 7f5634ce8e88..24b90836ce1d 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1456,9 +1456,47 @@ struct drm_i915_gem_context_param {
 #define   I915_CONTEXT_MAX_USER_PRIORITY	1023 /* inclusive */
 #define   I915_CONTEXT_DEFAULT_PRIORITY		0
 #define   I915_CONTEXT_MIN_USER_PRIORITY	-1023 /* inclusive */
+	/*
+	 * When using the following param, value should be a pointer to
+	 * drm_i915_gem_context_param_sseu.
+	 */
+#define I915_CONTEXT_PARAM_SSEU		0x7
 	__u64 value;
 };
 
+struct drm_i915_gem_context_param_sseu {
+	/*
+	 * Engine class & instance to be configured or queried.
+	 */
+	__u32 class;
+	__u32 instance;
+
+	/*
+	 * Mask of slices to enable for the context. Valid values are a subset
+	 * of the bitmask value returned for I915_PARAM_SLICE_MASK.
+	 */
+	__u8 slice_mask;
+
+	/*
+	 * Mask of subslices to enable for the context. Valid values are a
+	 * subset of the bitmask value return by I915_PARAM_SUBSLICE_MASK.
+	 */
+	__u8 subslice_mask;
+
+	/*
+	 * Minimum/Maximum number of EUs to enable per subslice for the
+	 * context. min_eus_per_subslice must be inferior or equal to
+	 * max_eus_per_subslice.
+	 */
+	__u8 min_eus_per_subslice;
+	__u8 max_eus_per_subslice;
+
+	/*
+	 * Unused for now. Must be cleared to zero.
+	 */
+	__u32 rsvd;
+};
+
 enum drm_i915_oa_format {
 	I915_OA_FORMAT_A13 = 1,	    /* HSW only */
 	I915_OA_FORMAT_A29,	    /* HSW only */
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* ✗ Fi.CI.CHECKPATCH: warning for drm/i915: per context slice/subslice powergating (rev2)
  2018-05-08 18:03 [PATCH v3 0/6] drm/i915: per context slice/subslice powergating Lionel Landwerlin
                   ` (5 preceding siblings ...)
  2018-05-08 18:03 ` [PATCH v3 6/6] drm/i915: Expose RPCS (SSEU) configuration to userspace Lionel Landwerlin
@ 2018-05-08 18:33 ` Patchwork
  2018-05-08 18:35 ` ✗ Fi.CI.SPARSE: " Patchwork
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Patchwork @ 2018-05-08 18:33 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: per context slice/subslice powergating (rev2)
URL   : https://patchwork.freedesktop.org/series/42285/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
adb11ec2e377 drm/i915: Program RPCS for Broadwell
48dc54799480 drm/i915: Record the sseu configuration per-context & engine
-:82: CHECK:OPEN_ENDED_LINE: Lines should not end with a '('
#82: FILE: drivers/gpu/drm/i915/i915_gem_context.c:336:
+		ce->sseu =i915_gem_sseu_from_device_sseu(

-:82: ERROR:SPACING: spaces required around that '=' (ctx:WxV)
#82: FILE: drivers/gpu/drm/i915/i915_gem_context.c:336:
+		ce->sseu =i915_gem_sseu_from_device_sseu(
 		         ^

-:144: ERROR:CODE_INDENT: code indent should use tabs where possible
#144: FILE: drivers/gpu/drm/i915/intel_lrc.c:2396:
+^I^I        GEN8_RPCS_SS_CNT_SHIFT;$

total: 2 errors, 0 warnings, 1 checks, 125 lines checked
5942e5428072 drm/i915/perf: simplify configure all context function
e4d6f1ff98c7 drm/i915: reprogram NOA muxes on context switch when using perf
-:204: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#204: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:149:
+#define   MI_BATCH_SECOND_LEVEL      (1<<22)
                                        ^

total: 0 errors, 0 warnings, 1 checks, 265 lines checked
43e11cb225e3 drm/i915: count powergating transitions per engine
6601a0636fd2 drm/i915: Expose RPCS (SSEU) configuration to userspace
-:25: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#25: 
	struct drm_i915_gem_context_param_sseu sseu = { .class = 0, instance = 0, };

-:170: CHECK:OPEN_ENDED_LINE: Lines should not end with a '('
#170: FILE: drivers/gpu/drm/i915/i915_gem_context.c:946:
+			ret = i915_gem_sseu_from_user_sseu(

total: 0 errors, 1 warnings, 1 checks, 235 lines checked

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* ✗ Fi.CI.SPARSE: warning for drm/i915: per context slice/subslice powergating (rev2)
  2018-05-08 18:03 [PATCH v3 0/6] drm/i915: per context slice/subslice powergating Lionel Landwerlin
                   ` (6 preceding siblings ...)
  2018-05-08 18:33 ` ✗ Fi.CI.CHECKPATCH: warning for drm/i915: per context slice/subslice powergating (rev2) Patchwork
@ 2018-05-08 18:35 ` Patchwork
  2018-05-08 18:51 ` ✓ Fi.CI.BAT: success " Patchwork
  2018-05-08 22:34 ` ✗ Fi.CI.IGT: failure " Patchwork
  9 siblings, 0 replies; 16+ messages in thread
From: Patchwork @ 2018-05-08 18:35 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: per context slice/subslice powergating (rev2)
URL   : https://patchwork.freedesktop.org/series/42285/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Commit: drm/i915: Program RPCS for Broadwell
Okay!

Commit: drm/i915: Record the sseu configuration per-context & engine
Okay!

Commit: drm/i915/perf: simplify configure all context function
Okay!

Commit: drm/i915: reprogram NOA muxes on context switch when using perf
+drivers/gpu/drm/i915/i915_perf.c:1742:37: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3653:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3659:16: warning: expression using sizeof(void)

Commit: drm/i915: count powergating transitions per engine
Okay!

Commit: drm/i915: Expose RPCS (SSEU) configuration to userspace
Okay!

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* ✓ Fi.CI.BAT: success for drm/i915: per context slice/subslice powergating (rev2)
  2018-05-08 18:03 [PATCH v3 0/6] drm/i915: per context slice/subslice powergating Lionel Landwerlin
                   ` (7 preceding siblings ...)
  2018-05-08 18:35 ` ✗ Fi.CI.SPARSE: " Patchwork
@ 2018-05-08 18:51 ` Patchwork
  2018-05-08 22:34 ` ✗ Fi.CI.IGT: failure " Patchwork
  9 siblings, 0 replies; 16+ messages in thread
From: Patchwork @ 2018-05-08 18:51 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: per context slice/subslice powergating (rev2)
URL   : https://patchwork.freedesktop.org/series/42285/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4158 -> Patchwork_8949 =

== Summary - SUCCESS ==

  No regressions found.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/42285/revisions/2/mbox/

== Known issues ==

  Here are the changes found in Patchwork_8949 that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@kms_frontbuffer_tracking@basic:
      {fi-hsw-peppy}:     PASS -> DMESG-FAIL (fdo#102614, fdo#106103)

    igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a:
      fi-cnl-psr:         PASS -> DMESG-WARN (fdo#104951)

    
    ==== Possible fixes ====

    igt@gem_exec_suspend@basic-s3:
      fi-skl-guc:         FAIL (fdo#104699, fdo#105900) -> PASS

    igt@gem_exec_suspend@basic-s4-devices:
      fi-kbl-7500u:       DMESG-WARN (fdo#105128) -> PASS

    igt@gem_mmap_gtt@basic-small-bo-tiledx:
      fi-gdg-551:         FAIL (fdo#102575) -> PASS

    igt@kms_frontbuffer_tracking@basic:
      fi-hsw-4200u:       DMESG-FAIL (fdo#102614, fdo#106103) -> PASS

    
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  fdo#102575 https://bugs.freedesktop.org/show_bug.cgi?id=102575
  fdo#102614 https://bugs.freedesktop.org/show_bug.cgi?id=102614
  fdo#104699 https://bugs.freedesktop.org/show_bug.cgi?id=104699
  fdo#104951 https://bugs.freedesktop.org/show_bug.cgi?id=104951
  fdo#105128 https://bugs.freedesktop.org/show_bug.cgi?id=105128
  fdo#105900 https://bugs.freedesktop.org/show_bug.cgi?id=105900
  fdo#106103 https://bugs.freedesktop.org/show_bug.cgi?id=106103


== Participating hosts (40 -> 37) ==

  Additional (1): fi-byt-j1900 
  Missing    (4): fi-ctg-p8600 fi-ilk-m540 fi-byt-squawks fi-skl-6700hq 


== Build changes ==

    * Linux: CI_DRM_4158 -> Patchwork_8949

  CI_DRM_4158: b4cf5831333d423c2420f167111c03e4c1729672 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4465: 41eb85f918b02918787fc59d9cb5aab93b81f323 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_8949: 6601a0636fd2acc9eeb2b504f78aa0ce507544ca @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4465: 33e58d5583eb7ed3966a1b905f875a1dfa959f6b @ git://anongit.freedesktop.org/piglit


== Linux commits ==

6601a0636fd2 drm/i915: Expose RPCS (SSEU) configuration to userspace
43e11cb225e3 drm/i915: count powergating transitions per engine
e4d6f1ff98c7 drm/i915: reprogram NOA muxes on context switch when using perf
5942e5428072 drm/i915/perf: simplify configure all context function
48dc54799480 drm/i915: Record the sseu configuration per-context & engine
adb11ec2e377 drm/i915: Program RPCS for Broadwell

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_8949/issues.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* ✗ Fi.CI.IGT: failure for drm/i915: per context slice/subslice powergating (rev2)
  2018-05-08 18:03 [PATCH v3 0/6] drm/i915: per context slice/subslice powergating Lionel Landwerlin
                   ` (8 preceding siblings ...)
  2018-05-08 18:51 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2018-05-08 22:34 ` Patchwork
  9 siblings, 0 replies; 16+ messages in thread
From: Patchwork @ 2018-05-08 22:34 UTC (permalink / raw)
  To: Lionel Landwerlin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: per context slice/subslice powergating (rev2)
URL   : https://patchwork.freedesktop.org/series/42285/
State : failure

== Summary ==

= CI Bug Log - changes from CI_DRM_4158_full -> Patchwork_8949_full =

== Summary - FAILURE ==

  Serious unknown changes coming with Patchwork_8949_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_8949_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/42285/revisions/2/mbox/

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_8949_full:

  === IGT changes ===

    ==== Possible regressions ====

    igt@gem_ctx_param@invalid-param-set:
      shard-kbl:          PASS -> FAIL
      shard-hsw:          PASS -> FAIL +1
      shard-snb:          PASS -> FAIL +1
      shard-glk:          PASS -> FAIL
      shard-apl:          PASS -> FAIL

    
    ==== Warnings ====

    igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-spr-indfb-onoff:
      shard-hsw:          SKIP -> PASS

    
== Known issues ==

  Here are the changes found in Patchwork_8949_full that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@gem_ppgtt@blt-vs-render-ctxn:
      shard-kbl:          PASS -> INCOMPLETE (fdo#106023, fdo#103665)

    igt@kms_flip@basic-flip-vs-wf_vblank:
      shard-hsw:          PASS -> FAIL (fdo#100368)

    igt@perf@create-destroy-userspace-config:
      shard-glk:          PASS -> INCOMPLETE (k.org#198133, fdo#103359) +13

    igt@perf@invalid-oa-metric-set-id:
      shard-apl:          PASS -> INCOMPLETE (fdo#103927) +12

    igt@perf@polling:
      shard-kbl:          PASS -> INCOMPLETE (fdo#103665) +13

    igt@prime_vgem@basic-fence-flip:
      shard-apl:          PASS -> FAIL (fdo#104008)

    
    ==== Possible fixes ====

    igt@kms_cursor_legacy@2x-long-flip-vs-cursor-legacy:
      shard-hsw:          FAIL (fdo#104873) -> PASS

    igt@kms_flip@2x-plain-flip-ts-check:
      shard-hsw:          FAIL (fdo#103928) -> PASS

    igt@kms_flip@absolute-wf_vblank-interruptible:
      shard-glk:          FAIL (fdo#106087) -> PASS

    igt@kms_flip@plain-flip-ts-check-interruptible:
      shard-glk:          FAIL (fdo#105312) -> PASS

    igt@kms_flip@wf_vblank-ts-check-interruptible:
      shard-kbl:          DMESG-WARN (fdo#105602, fdo#103558) -> PASS +4

    igt@kms_rotation_crc@primary-rotation-270:
      shard-apl:          FAIL (fdo#104724, fdo#103925) -> PASS +1

    igt@kms_setmode@basic:
      shard-apl:          FAIL (fdo#99912) -> PASS

    
    ==== Warnings ====

    igt@drv_selftest@live_gtt:
      shard-apl:          DMESG-FAIL -> INCOMPLETE (fdo#103927)

    
  fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368
  fdo#103359 https://bugs.freedesktop.org/show_bug.cgi?id=103359
  fdo#103558 https://bugs.freedesktop.org/show_bug.cgi?id=103558
  fdo#103665 https://bugs.freedesktop.org/show_bug.cgi?id=103665
  fdo#103925 https://bugs.freedesktop.org/show_bug.cgi?id=103925
  fdo#103927 https://bugs.freedesktop.org/show_bug.cgi?id=103927
  fdo#103928 https://bugs.freedesktop.org/show_bug.cgi?id=103928
  fdo#104008 https://bugs.freedesktop.org/show_bug.cgi?id=104008
  fdo#104724 https://bugs.freedesktop.org/show_bug.cgi?id=104724
  fdo#104873 https://bugs.freedesktop.org/show_bug.cgi?id=104873
  fdo#105312 https://bugs.freedesktop.org/show_bug.cgi?id=105312
  fdo#105602 https://bugs.freedesktop.org/show_bug.cgi?id=105602
  fdo#106023 https://bugs.freedesktop.org/show_bug.cgi?id=106023
  fdo#106087 https://bugs.freedesktop.org/show_bug.cgi?id=106087
  fdo#99912 https://bugs.freedesktop.org/show_bug.cgi?id=99912
  k.org#198133 https://bugzilla.kernel.org/show_bug.cgi?id=198133


== Participating hosts (5 -> 5) ==

  No changes in participating hosts


== Build changes ==

    * Linux: CI_DRM_4158 -> Patchwork_8949

  CI_DRM_4158: b4cf5831333d423c2420f167111c03e4c1729672 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4465: 41eb85f918b02918787fc59d9cb5aab93b81f323 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_8949: 6601a0636fd2acc9eeb2b504f78aa0ce507544ca @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4465: 33e58d5583eb7ed3966a1b905f875a1dfa959f6b @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_8949/shards.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 4/6] drm/i915: reprogram NOA muxes on context switch when using perf
  2018-05-08 18:03 ` [PATCH v3 4/6] drm/i915: reprogram NOA muxes on context switch when using perf Lionel Landwerlin
@ 2018-05-09  8:59   ` Chris Wilson
  2018-05-09  9:05     ` Chris Wilson
                       ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Chris Wilson @ 2018-05-09  8:59 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx

Quoting Lionel Landwerlin (2018-05-08 19:03:45)
> If some of the contexts submitting workloads to the GPU have been
> configured to shutdown slices/subslices, we might loose the NOA
> configurations written in the NOA muxes. We need to reprogram them
> when we detect a powergating configuration change.
> 
> In this change i915/perf is responsible for setting up a reprogramming
> batchbuffer which we execute just before the userspace submitted
> batchbuffer. We do this while preemption is still disable, only if
> needed. The decision to execute this reprogramming batchbuffer is made
> when we assign a request to an execlist port.
> 
> v2: Only reprogram when detecting configuration changes (Chris/Lionel)
> 
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h           |   6 ++
>  drivers/gpu/drm/i915/i915_perf.c          | 108 ++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_request.h       |   6 ++
>  drivers/gpu/drm/i915/intel_gpu_commands.h |   1 +
>  drivers/gpu/drm/i915/intel_lrc.c          |  59 +++++++++++-
>  drivers/gpu/drm/i915/intel_ringbuffer.c   |   2 +
>  drivers/gpu/drm/i915/intel_ringbuffer.h   |   2 +
>  7 files changed, 183 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 04e27806e581..07cdbfb4ad7a 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1461,6 +1461,12 @@ struct i915_perf_stream {
>          * @oa_config: The OA configuration used by the stream.
>          */
>         struct i915_oa_config *oa_config;
> +
> +       /**
> +        * @noa_reprogram_vma: A batchbuffer reprogramming the NOA muxes, used
> +        * after switching powergating configurations.
> +        */
> +       struct i915_vma *noa_reprogram_vma;
>  };
>  
>  /**
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 5b279a82445a..1b9e3d6a53a2 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -1691,6 +1691,96 @@ static int gen8_emit_oa_config(struct i915_request *rq,
>         return 0;
>  }
>  
> +#define MAX_LRI_SIZE (125U)
> +
> +static u32 noa_reprogram_bb_size(struct i915_oa_config *oa_config)
> +{
> +       u32 n_lri;
> +
> +       /* Very unlikely but possible that we have no muxes to configure. */
> +       if (!oa_config->mux_regs_len)
> +               return 0;
> +
> +       n_lri = (oa_config->mux_regs_len / MAX_LRI_SIZE) +
> +               (oa_config->mux_regs_len % MAX_LRI_SIZE) != 0;
> +
> +       return n_lri * 4 + oa_config->mux_regs_len * 8 + /* MI_LOAD_REGISTER_IMMs */
> +               6 * 4 + /* PIPE_CONTROL */
> +               4; /* MI_BATCH_BUFFER_END */
> +}
> +
> +static int
> +alloc_noa_reprogram_bo(struct i915_perf_stream *stream)
> +{
> +       struct drm_i915_private *dev_priv = stream->dev_priv;
> +       struct i915_oa_config *oa_config = stream->oa_config;
> +       struct drm_i915_gem_object *bo;
> +       struct i915_vma *vma;
> +       u32 buffer_size;
> +       u32 *cs;
> +       int i, ret, n_loaded_regs;
> +
> +       buffer_size = ALIGN(noa_reprogram_bb_size(oa_config), PAGE_SIZE);
> +       if (buffer_size == 0)
> +               return 0;
> +
> +       bo = i915_gem_object_create(dev_priv, buffer_size);
> +       if (IS_ERR(bo)) {
> +               DRM_ERROR("Failed to allocate NOA reprogramming buffer\n");
> +               return PTR_ERR(bo);
> +       }
> +
> +       cs = i915_gem_object_pin_map(bo, I915_MAP_WB);
> +       if (IS_ERR(cs)) {
> +               ret = PTR_ERR(cs);
> +               goto err_unref_bo;
> +       }
> +
> +       n_loaded_regs = 0;
> +       for (i = 0; i < oa_config->mux_regs_len; i++) {
> +               if ((n_loaded_regs % MAX_LRI_SIZE) == 0) {
> +                       u32 n_lri = min(oa_config->mux_regs_len - n_loaded_regs,
> +                                       MAX_LRI_SIZE);
> +                       *cs++ = MI_LOAD_REGISTER_IMM(n_lri);
> +               }
> +
> +               *cs++ = i915_mmio_reg_offset(oa_config->mux_regs[i].addr);
> +               *cs++ = oa_config->mux_regs[i].value;
> +               n_loaded_regs++;
> +       }
> +
> +       cs = gen8_emit_pipe_control(cs, PIPE_CONTROL_MMIO_WRITE, 0);

What's the pc for? Isn't it a bit dangerous to request a mmio write to
to reg 0?

> +
> +       *cs++ = MI_BATCH_BUFFER_END;
> +
> +       i915_gem_object_unpin_map(bo);
> +
> +       ret = i915_gem_object_set_to_gtt_domain(bo, false);
> +       if (ret)
> +               goto err_unref_bo;
> +
> +       vma = i915_vma_instance(bo, &dev_priv->ggtt.base, NULL);
> +       if (IS_ERR(vma)) {
> +               ret = PTR_ERR(vma);
> +               goto err_unref_bo;
> +       }
> +
> +       ret = i915_vma_pin(vma, 0, 0, PIN_USER | PIN_GLOBAL);
> +       if (ret)
> +               goto err_unref_vma;

No GGTT access, so just PIN_USER for GPU access.

> +       stream->noa_reprogram_vma = vma;
> +
> +       return 0;
> +
> +err_unref_vma:
> +       i915_vma_put(vma);
> +err_unref_bo:
> +       i915_gem_object_put(bo);
> +
> +       return ret;
> +}
> +
>  static int gen8_switch_to_updated_kernel_context(struct drm_i915_private *dev_priv,
>                                                  const struct i915_oa_config *oa_config)
>  {
> @@ -2102,6 +2192,12 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
>                 goto err_config;
>         }
>  
> +       ret = alloc_noa_reprogram_bo(stream);
> +       if (ret) {
> +               DRM_DEBUG("Enable to alloc NOA reprogramming BO\n");
> +               goto err_config;
> +       }
> +
>         /* PRM - observability performance counters:
>          *
>          *   OACONTROL, performance counter enable, note:
> @@ -2136,6 +2232,13 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
>  
>         dev_priv->perf.oa.exclusive_stream = stream;
>  
> +       ret = dev_priv->perf.oa.ops.enable_metric_set(dev_priv,
> +                                                     stream->oa_config);
> +       if (ret)
> +               goto err_enable;
> +
> +       stream->ops = &i915_oa_stream_ops;
> +
>         mutex_unlock(&dev_priv->drm.struct_mutex);
>  
>         return 0;
> @@ -2497,6 +2600,11 @@ static void i915_perf_destroy_locked(struct i915_perf_stream *stream)
>         if (stream->ctx)
>                 i915_gem_context_put(stream->ctx);
>  
> +       if (stream->noa_reprogram_vma) {
> +               i915_vma_unpin(stream->noa_reprogram_vma);
> +               i915_gem_object_put(stream->noa_reprogram_vma->obj);

This will be unhappy due to the lack of active tracking.
i915_vma_unpin_and_release(&stream->noa_reprogram_vma).

Hmm, worse than that. It's not being tracked at all, so expect it to be
freed while still in use.

> +       }
> +
>         kfree(stream);
>  }
>  
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index eddbd4245cb3..3357ceb4bdb1 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -149,6 +149,12 @@ struct i915_request {
>         /** Preallocate space in the ring for the emitting the request */
>         u32 reserved_space;
>  
> +       /*
> +        * Pointer in the batchbuffer to where the i915/perf NOA reprogramming
> +        * can be inserted just before HW submission.
> +        */
> +       u32 *perf_prog_start;

Just u32 offset, no need for the extra space of a pointer here.
(Probably should just limit rings to 64k and pack the offsets into u16.)
Or have them as offset from head, and u8 dwords? Hmm.

I have cold shivers from the thought of more special locations inside
the batch. Not the first, and not the last, so I feel like there should
be a better way (or at least more consistent).

> +
>         /** Batch buffer related to this request if any (used for
>          * error state dump only).
>          */
> diff --git a/drivers/gpu/drm/i915/intel_gpu_commands.h b/drivers/gpu/drm/i915/intel_gpu_commands.h
> index 105e2a9e874a..09b0fded8b17 100644
> --- a/drivers/gpu/drm/i915/intel_gpu_commands.h
> +++ b/drivers/gpu/drm/i915/intel_gpu_commands.h
> @@ -146,6 +146,7 @@
>  #define   MI_BATCH_GTT             (2<<6) /* aliased with (1<<7) on gen4 */
>  #define MI_BATCH_BUFFER_START_GEN8     MI_INSTR(0x31, 1)
>  #define   MI_BATCH_RESOURCE_STREAMER (1<<10)
> +#define   MI_BATCH_SECOND_LEVEL      (1<<22)
>  
>  /*
>   * 3D instructions used by the kernel
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 1c08bd2be6c3..df4a7bb07eae 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -510,6 +510,38 @@ static bool can_merge_ctx(const struct i915_gem_context *prev,
>         return true;
>  }
>  
> +static void maybe_enable_noa_repgoram(struct i915_request *rq)
> +{
> +       struct intel_engine_cs *engine = rq->engine;
> +       struct drm_i915_private *dev_priv = engine->i915;
> +       struct i915_perf_stream *stream = dev_priv->perf.oa.exclusive_stream;

Could there be any more pointer chasing? </chandler>

> +       struct intel_context *ce;
> +       u32 *cs = rq->perf_prog_start;
> +
> +       /* Slice/subslice/EU powergating only matters on the RCS. */
> +       if (engine->id != RCS)
> +               return;
> +
> +       /* Nothing need to reprogram when perf isn't enabled. */
> +       if (!stream)
> +               return;
> +
> +       ce = &rq->ctx->__engine[engine->id];

rq->hw_context; but to_intel_context() until that patch lands.

> +
> +       /*
> +        * If the powergating configuration doesn't change, no need to
> +        * reprogram.
> +        */
> +       if (engine->last_sseu.value == ce->sseu.value)
> +               return;
> +
> +       *cs++ = MI_BATCH_BUFFER_START_GEN8 | MI_BATCH_SECOND_LEVEL;

You are not a second level batch. You are calling from the ring to a
global address of _0_.

> +       *cs++ = 0;

low 32bits = 0

> +       *cs++ = i915_ggtt_offset(stream->noa_reprogram_vma);

high 32bits = address you wanted. (order reversed)

> +
> +       engine->last_sseu = ce->sseu;
> +}
> +
>  static void port_assign(struct execlist_port *port, struct i915_request *rq)
>  {
>         GEM_BUG_ON(rq == port_request(port));
> @@ -517,6 +549,8 @@ static void port_assign(struct execlist_port *port, struct i915_request *rq)
>         if (port_isset(port))
>                 i915_request_put(port_request(port));
>  
> +       maybe_enable_noa_repgoram(rq);

How should this work for the guc? How should this work for amalgamated
contexts? Should this only be done for the first request in the sequence
and definitely not done for lite-restore.

> +
>         port_set(port, port_pack(i915_request_get(rq), port_count(port)));
>  }
>  
> @@ -1047,6 +1081,13 @@ static void execlists_submission_tasklet(unsigned long data)
>                             buf[2*head + 1] == execlists->preempt_complete_status) {
>                                 GEM_TRACE("%s preempt-idle\n", engine->name);
>  
> +                               /*
> +                                * Clear out the state of the sseu on the
> +                                * engine, as it's not clear what it will be
> +                                * after preemption.
> +                                */
> +                               memset(&engine->last_sseu, 0, sizeof(engine->last_sseu));

Oh, you are missing complete_preempt_context. But you also need to reset
after reset, so cancel_port_requests is also sensisble.

> +
>                                 execlists_cancel_port_requests(execlists);
>                                 execlists_unwind_incomplete_requests(execlists);
>  
> @@ -1937,10 +1978,26 @@ static int gen8_emit_bb_start(struct i915_request *rq,
>                 rq->ctx->ppgtt->pd_dirty_rings &= ~intel_engine_flag(rq->engine);
>         }
>  
> -       cs = intel_ring_begin(rq, 6);
> +       cs = intel_ring_begin(rq, rq->engine->id == RCS ? 10 : 6);
>         if (IS_ERR(cs))
>                 return PTR_ERR(cs);
>  
> +       if (rq->engine->id == RCS) {
> +               /*
> +                * Leave some instructions to be written with an
> +                * MI_BATCH_BUFFER_START to the i915/perf NOA reprogramming
> +                * batchbuffer. We only turn those MI_NOOP into
> +                * MI_BATCH_BUFFER_START when we detect a SSEU powergating
> +                * configuration change that might affect NOA. This is only
> +                * for the RCS.
> +                */
> +               rq->perf_prog_start = cs;
> +               *cs++ = MI_NOOP;
> +               *cs++ = MI_NOOP;
> +               *cs++ = MI_NOOP;
> +               *cs++ = MI_NOOP; /* Aligning to 2 dwords */
> +       }
> +
>         /*
>          * WaDisableCtxRestoreArbitration:bdw,chv
>          *
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 8f19349a6055..cde791234397 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -525,6 +525,8 @@ static int init_ring_common(struct intel_engine_cs *engine)
>         if (INTEL_GEN(dev_priv) > 2)
>                 I915_WRITE_MODE(engine, _MASKED_BIT_DISABLE(STOP_RING));
>  
> +       memset(&engine->last_sseu, 0, sizeof(engine->last_sseu));

Why only legacy submission?

> +
>  out:
>         intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
>  
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 010750e8ee44..2f8518e1a49f 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -343,6 +343,8 @@ struct intel_engine_cs {
>  
>         struct drm_i915_gem_object *default_state;
>  
> +       union i915_gem_sseu last_sseu;

The struct here is internal; debatable if it wants the GEM moniker for
being part of the GEM user interface.

Looks like this wanted a sefltest.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 4/6] drm/i915: reprogram NOA muxes on context switch when using perf
  2018-05-09  8:59   ` Chris Wilson
@ 2018-05-09  9:05     ` Chris Wilson
  2018-05-09  9:23       ` Lionel Landwerlin
  2018-05-09  9:12     ` Lionel Landwerlin
  2018-05-09 15:31     ` Lionel Landwerlin
  2 siblings, 1 reply; 16+ messages in thread
From: Chris Wilson @ 2018-05-09  9:05 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx

Quoting Chris Wilson (2018-05-09 09:59:09)
> Quoting Lionel Landwerlin (2018-05-08 19:03:45)
> > If some of the contexts submitting workloads to the GPU have been
> > configured to shutdown slices/subslices, we might loose the NOA
> > configurations written in the NOA muxes. We need to reprogram them
> > when we detect a powergating configuration change.
> > 
> > In this change i915/perf is responsible for setting up a reprogramming
> > batchbuffer which we execute just before the userspace submitted
> > batchbuffer. We do this while preemption is still disable, only if
> > needed. The decision to execute this reprogramming batchbuffer is made
> > when we assign a request to an execlist port.
> > 
> > v2: Only reprogram when detecting configuration changes (Chris/Lionel)
> > 
> > Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h           |   6 ++
> >  drivers/gpu/drm/i915/i915_perf.c          | 108 ++++++++++++++++++++++
> >  drivers/gpu/drm/i915/i915_request.h       |   6 ++
> >  drivers/gpu/drm/i915/intel_gpu_commands.h |   1 +
> >  drivers/gpu/drm/i915/intel_lrc.c          |  59 +++++++++++-
> >  drivers/gpu/drm/i915/intel_ringbuffer.c   |   2 +
> >  drivers/gpu/drm/i915/intel_ringbuffer.h   |   2 +
> >  7 files changed, 183 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index 04e27806e581..07cdbfb4ad7a 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -1461,6 +1461,12 @@ struct i915_perf_stream {
> >          * @oa_config: The OA configuration used by the stream.
> >          */
> >         struct i915_oa_config *oa_config;
> > +
> > +       /**
> > +        * @noa_reprogram_vma: A batchbuffer reprogramming the NOA muxes, used
> > +        * after switching powergating configurations.
> > +        */
> > +       struct i915_vma *noa_reprogram_vma;
> >  };
> >  
> >  /**
> > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> > index 5b279a82445a..1b9e3d6a53a2 100644
> > --- a/drivers/gpu/drm/i915/i915_perf.c
> > +++ b/drivers/gpu/drm/i915/i915_perf.c
> > @@ -1691,6 +1691,96 @@ static int gen8_emit_oa_config(struct i915_request *rq,
> >         return 0;
> >  }
> >  
> > +#define MAX_LRI_SIZE (125U)
> > +
> > +static u32 noa_reprogram_bb_size(struct i915_oa_config *oa_config)
> > +{
> > +       u32 n_lri;
> > +
> > +       /* Very unlikely but possible that we have no muxes to configure. */
> > +       if (!oa_config->mux_regs_len)
> > +               return 0;
> > +
> > +       n_lri = (oa_config->mux_regs_len / MAX_LRI_SIZE) +
> > +               (oa_config->mux_regs_len % MAX_LRI_SIZE) != 0;
> > +
> > +       return n_lri * 4 + oa_config->mux_regs_len * 8 + /* MI_LOAD_REGISTER_IMMs */
> > +               6 * 4 + /* PIPE_CONTROL */
> > +               4; /* MI_BATCH_BUFFER_END */
> > +}
> > +
> > +static int
> > +alloc_noa_reprogram_bo(struct i915_perf_stream *stream)
> > +{
> > +       struct drm_i915_private *dev_priv = stream->dev_priv;
> > +       struct i915_oa_config *oa_config = stream->oa_config;
> > +       struct drm_i915_gem_object *bo;
> > +       struct i915_vma *vma;
> > +       u32 buffer_size;
> > +       u32 *cs;
> > +       int i, ret, n_loaded_regs;
> > +
> > +       buffer_size = ALIGN(noa_reprogram_bb_size(oa_config), PAGE_SIZE);
> > +       if (buffer_size == 0)
> > +               return 0;
> > +
> > +       bo = i915_gem_object_create(dev_priv, buffer_size);
> > +       if (IS_ERR(bo)) {
> > +               DRM_ERROR("Failed to allocate NOA reprogramming buffer\n");
> > +               return PTR_ERR(bo);
> > +       }
> > +
> > +       cs = i915_gem_object_pin_map(bo, I915_MAP_WB);
> > +       if (IS_ERR(cs)) {
> > +               ret = PTR_ERR(cs);
> > +               goto err_unref_bo;
> > +       }
> > +
> > +       n_loaded_regs = 0;
> > +       for (i = 0; i < oa_config->mux_regs_len; i++) {
> > +               if ((n_loaded_regs % MAX_LRI_SIZE) == 0) {
> > +                       u32 n_lri = min(oa_config->mux_regs_len - n_loaded_regs,
> > +                                       MAX_LRI_SIZE);
> > +                       *cs++ = MI_LOAD_REGISTER_IMM(n_lri);
> > +               }
> > +
> > +               *cs++ = i915_mmio_reg_offset(oa_config->mux_regs[i].addr);
> > +               *cs++ = oa_config->mux_regs[i].value;
> > +               n_loaded_regs++;
> > +       }
> > +
> > +       cs = gen8_emit_pipe_control(cs, PIPE_CONTROL_MMIO_WRITE, 0);
> 
> What's the pc for? Isn't it a bit dangerous to request a mmio write to
> to reg 0?
> 
> > +
> > +       *cs++ = MI_BATCH_BUFFER_END;
> > +
> > +       i915_gem_object_unpin_map(bo);
> > +
> > +       ret = i915_gem_object_set_to_gtt_domain(bo, false);
> > +       if (ret)
> > +               goto err_unref_bo;
> > +
> > +       vma = i915_vma_instance(bo, &dev_priv->ggtt.base, NULL);
> > +       if (IS_ERR(vma)) {
> > +               ret = PTR_ERR(vma);
> > +               goto err_unref_bo;
> > +       }
> > +
> > +       ret = i915_vma_pin(vma, 0, 0, PIN_USER | PIN_GLOBAL);
> > +       if (ret)
> > +               goto err_unref_vma;
> 
> No GGTT access, so just PIN_USER for GPU access.
> 
> > +       stream->noa_reprogram_vma = vma;
> > +
> > +       return 0;
> > +
> > +err_unref_vma:
> > +       i915_vma_put(vma);
> > +err_unref_bo:
> > +       i915_gem_object_put(bo);
> > +
> > +       return ret;
> > +}
> > +
> >  static int gen8_switch_to_updated_kernel_context(struct drm_i915_private *dev_priv,
> >                                                  const struct i915_oa_config *oa_config)
> >  {
> > @@ -2102,6 +2192,12 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
> >                 goto err_config;
> >         }
> >  
> > +       ret = alloc_noa_reprogram_bo(stream);
> > +       if (ret) {
> > +               DRM_DEBUG("Enable to alloc NOA reprogramming BO\n");
> > +               goto err_config;
> > +       }
> > +
> >         /* PRM - observability performance counters:
> >          *
> >          *   OACONTROL, performance counter enable, note:
> > @@ -2136,6 +2232,13 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
> >  
> >         dev_priv->perf.oa.exclusive_stream = stream;
> >  
> > +       ret = dev_priv->perf.oa.ops.enable_metric_set(dev_priv,
> > +                                                     stream->oa_config);
> > +       if (ret)
> > +               goto err_enable;
> > +
> > +       stream->ops = &i915_oa_stream_ops;
> > +
> >         mutex_unlock(&dev_priv->drm.struct_mutex);
> >  
> >         return 0;
> > @@ -2497,6 +2600,11 @@ static void i915_perf_destroy_locked(struct i915_perf_stream *stream)
> >         if (stream->ctx)
> >                 i915_gem_context_put(stream->ctx);
> >  
> > +       if (stream->noa_reprogram_vma) {
> > +               i915_vma_unpin(stream->noa_reprogram_vma);
> > +               i915_gem_object_put(stream->noa_reprogram_vma->obj);
> 
> This will be unhappy due to the lack of active tracking.
> i915_vma_unpin_and_release(&stream->noa_reprogram_vma).
> 
> Hmm, worse than that. It's not being tracked at all, so expect it to be
> freed while still in use.
> 
> > +       }
> > +
> >         kfree(stream);
> >  }
> >  
> > diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> > index eddbd4245cb3..3357ceb4bdb1 100644
> > --- a/drivers/gpu/drm/i915/i915_request.h
> > +++ b/drivers/gpu/drm/i915/i915_request.h
> > @@ -149,6 +149,12 @@ struct i915_request {
> >         /** Preallocate space in the ring for the emitting the request */
> >         u32 reserved_space;
> >  
> > +       /*
> > +        * Pointer in the batchbuffer to where the i915/perf NOA reprogramming
> > +        * can be inserted just before HW submission.
> > +        */
> > +       u32 *perf_prog_start;
> 
> Just u32 offset, no need for the extra space of a pointer here.
> (Probably should just limit rings to 64k and pack the offsets into u16.)
> Or have them as offset from head, and u8 dwords? Hmm.
> 
> I have cold shivers from the thought of more special locations inside
> the batch. Not the first, and not the last, so I feel like there should
> be a better way (or at least more consistent).
> 
> > +
> >         /** Batch buffer related to this request if any (used for
> >          * error state dump only).
> >          */
> > diff --git a/drivers/gpu/drm/i915/intel_gpu_commands.h b/drivers/gpu/drm/i915/intel_gpu_commands.h
> > index 105e2a9e874a..09b0fded8b17 100644
> > --- a/drivers/gpu/drm/i915/intel_gpu_commands.h
> > +++ b/drivers/gpu/drm/i915/intel_gpu_commands.h
> > @@ -146,6 +146,7 @@
> >  #define   MI_BATCH_GTT             (2<<6) /* aliased with (1<<7) on gen4 */
> >  #define MI_BATCH_BUFFER_START_GEN8     MI_INSTR(0x31, 1)
> >  #define   MI_BATCH_RESOURCE_STREAMER (1<<10)
> > +#define   MI_BATCH_SECOND_LEVEL      (1<<22)
> >  
> >  /*
> >   * 3D instructions used by the kernel
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index 1c08bd2be6c3..df4a7bb07eae 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -510,6 +510,38 @@ static bool can_merge_ctx(const struct i915_gem_context *prev,
> >         return true;
> >  }
> >  
> > +static void maybe_enable_noa_repgoram(struct i915_request *rq)
> > +{
> > +       struct intel_engine_cs *engine = rq->engine;
> > +       struct drm_i915_private *dev_priv = engine->i915;
> > +       struct i915_perf_stream *stream = dev_priv->perf.oa.exclusive_stream;
> 
> Could there be any more pointer chasing? </chandler>

Thinking about more about how to make this part cleaner, could we not
store the engine->noa_batch and then this all becomes

vma = engine->noa_batch;
if (vma)
	return;

Locking! Missed it in the first pass, but we are unlocked here... That
also ties into lifetimes.

I'm thinking more like ctx->noa_batch with that being pinned along with
the context. We need something along those lines to track the
locking/lifetime.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 4/6] drm/i915: reprogram NOA muxes on context switch when using perf
  2018-05-09  8:59   ` Chris Wilson
  2018-05-09  9:05     ` Chris Wilson
@ 2018-05-09  9:12     ` Lionel Landwerlin
  2018-05-09 15:31     ` Lionel Landwerlin
  2 siblings, 0 replies; 16+ messages in thread
From: Lionel Landwerlin @ 2018-05-09  9:12 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 428 bytes --]

On 09/05/18 09:59, Chris Wilson wrote:
>> +
>> +       *cs++ = MI_BATCH_BUFFER_START_GEN8 | MI_BATCH_SECOND_LEVEL;
> You are not a second level batch. You are calling from the ring to a
> global address of _0_.
>
>> +       *cs++ = 0;
> low 32bits = 0
>
>> +       *cs++ = i915_ggtt_offset(stream->noa_reprogram_vma);
> high 32bits = address you wanted. (order reversed)
>
How is this not hanging my machine is beyond me... :(


[-- Attachment #1.2: Type: text/html, Size: 1143 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 4/6] drm/i915: reprogram NOA muxes on context switch when using perf
  2018-05-09  9:05     ` Chris Wilson
@ 2018-05-09  9:23       ` Lionel Landwerlin
  0 siblings, 0 replies; 16+ messages in thread
From: Lionel Landwerlin @ 2018-05-09  9:23 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 763 bytes --]

On 09/05/18 10:05, Chris Wilson wrote:
>> Could there be any more pointer chasing? </chandler>
> Thinking about more about how to make this part cleaner, could we not
> store the engine->noa_batch and then this all becomes
>
> vma = engine->noa_batch;
> if (vma)
> 	return;
>
> Locking! Missed it in the first pass, but we are unlocked here... That
> also ties into lifetimes.
>
> I'm thinking more like ctx->noa_batch with that being pinned along with
> the context. We need something along those lines to track the
> locking/lifetime.
> -Chris
>
In another series tracking pid of submitted requests, I used "empty" 
requests to change the state of the engine.
I could use that approach to message-pass what needs to be used in the 
engine submission.

-
Lionel

[-- Attachment #1.2: Type: text/html, Size: 1253 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 4/6] drm/i915: reprogram NOA muxes on context switch when using perf
  2018-05-09  8:59   ` Chris Wilson
  2018-05-09  9:05     ` Chris Wilson
  2018-05-09  9:12     ` Lionel Landwerlin
@ 2018-05-09 15:31     ` Lionel Landwerlin
  2 siblings, 0 replies; 16+ messages in thread
From: Lionel Landwerlin @ 2018-05-09 15:31 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 09/05/18 09:59, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2018-05-08 19:03:45)
>> If some of the contexts submitting workloads to the GPU have been
>> configured to shutdown slices/subslices, we might loose the NOA
>> configurations written in the NOA muxes. We need to reprogram them
>> when we detect a powergating configuration change.
>>
>> In this change i915/perf is responsible for setting up a reprogramming
>> batchbuffer which we execute just before the userspace submitted
>> batchbuffer. We do this while preemption is still disable, only if
>> needed. The decision to execute this reprogramming batchbuffer is made
>> when we assign a request to an execlist port.
>>
>> v2: Only reprogram when detecting configuration changes (Chris/Lionel)
>>
>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_drv.h           |   6 ++
>>   drivers/gpu/drm/i915/i915_perf.c          | 108 ++++++++++++++++++++++
>>   drivers/gpu/drm/i915/i915_request.h       |   6 ++
>>   drivers/gpu/drm/i915/intel_gpu_commands.h |   1 +
>>   drivers/gpu/drm/i915/intel_lrc.c          |  59 +++++++++++-
>>   drivers/gpu/drm/i915/intel_ringbuffer.c   |   2 +
>>   drivers/gpu/drm/i915/intel_ringbuffer.h   |   2 +
>>   7 files changed, 183 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index 04e27806e581..07cdbfb4ad7a 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -1461,6 +1461,12 @@ struct i915_perf_stream {
>>           * @oa_config: The OA configuration used by the stream.
>>           */
>>          struct i915_oa_config *oa_config;
>> +
>> +       /**
>> +        * @noa_reprogram_vma: A batchbuffer reprogramming the NOA muxes, used
>> +        * after switching powergating configurations.
>> +        */
>> +       struct i915_vma *noa_reprogram_vma;
>>   };
>>   
>>   /**
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>> index 5b279a82445a..1b9e3d6a53a2 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -1691,6 +1691,96 @@ static int gen8_emit_oa_config(struct i915_request *rq,
>>          return 0;
>>   }
>>   
>> +#define MAX_LRI_SIZE (125U)
>> +
>> +static u32 noa_reprogram_bb_size(struct i915_oa_config *oa_config)
>> +{
>> +       u32 n_lri;
>> +
>> +       /* Very unlikely but possible that we have no muxes to configure. */
>> +       if (!oa_config->mux_regs_len)
>> +               return 0;
>> +
>> +       n_lri = (oa_config->mux_regs_len / MAX_LRI_SIZE) +
>> +               (oa_config->mux_regs_len % MAX_LRI_SIZE) != 0;
>> +
>> +       return n_lri * 4 + oa_config->mux_regs_len * 8 + /* MI_LOAD_REGISTER_IMMs */
>> +               6 * 4 + /* PIPE_CONTROL */
>> +               4; /* MI_BATCH_BUFFER_END */
>> +}
>> +
>> +static int
>> +alloc_noa_reprogram_bo(struct i915_perf_stream *stream)
>> +{
>> +       struct drm_i915_private *dev_priv = stream->dev_priv;
>> +       struct i915_oa_config *oa_config = stream->oa_config;
>> +       struct drm_i915_gem_object *bo;
>> +       struct i915_vma *vma;
>> +       u32 buffer_size;
>> +       u32 *cs;
>> +       int i, ret, n_loaded_regs;
>> +
>> +       buffer_size = ALIGN(noa_reprogram_bb_size(oa_config), PAGE_SIZE);
>> +       if (buffer_size == 0)
>> +               return 0;
>> +
>> +       bo = i915_gem_object_create(dev_priv, buffer_size);
>> +       if (IS_ERR(bo)) {
>> +               DRM_ERROR("Failed to allocate NOA reprogramming buffer\n");
>> +               return PTR_ERR(bo);
>> +       }
>> +
>> +       cs = i915_gem_object_pin_map(bo, I915_MAP_WB);
>> +       if (IS_ERR(cs)) {
>> +               ret = PTR_ERR(cs);
>> +               goto err_unref_bo;
>> +       }
>> +
>> +       n_loaded_regs = 0;
>> +       for (i = 0; i < oa_config->mux_regs_len; i++) {
>> +               if ((n_loaded_regs % MAX_LRI_SIZE) == 0) {
>> +                       u32 n_lri = min(oa_config->mux_regs_len - n_loaded_regs,
>> +                                       MAX_LRI_SIZE);
>> +                       *cs++ = MI_LOAD_REGISTER_IMM(n_lri);
>> +               }
>> +
>> +               *cs++ = i915_mmio_reg_offset(oa_config->mux_regs[i].addr);
>> +               *cs++ = oa_config->mux_regs[i].value;
>> +               n_loaded_regs++;
>> +       }
>> +
>> +       cs = gen8_emit_pipe_control(cs, PIPE_CONTROL_MMIO_WRITE, 0);
> What's the pc for? Isn't it a bit dangerous to request a mmio write to
> to reg 0?

Oops, I've been using this wrong :(
I thought it would flush pending MMIO writes.

>
>> +
>> +       *cs++ = MI_BATCH_BUFFER_END;
>> +
>> +       i915_gem_object_unpin_map(bo);
>> +
>> +       ret = i915_gem_object_set_to_gtt_domain(bo, false);
>> +       if (ret)
>> +               goto err_unref_bo;
>> +
>> +       vma = i915_vma_instance(bo, &dev_priv->ggtt.base, NULL);
>> +       if (IS_ERR(vma)) {
>> +               ret = PTR_ERR(vma);
>> +               goto err_unref_bo;
>> +       }
>> +
>> +       ret = i915_vma_pin(vma, 0, 0, PIN_USER | PIN_GLOBAL);
>> +       if (ret)
>> +               goto err_unref_vma;
> No GGTT access, so just PIN_USER for GPU access.

Okay. I just don't seem to understand any of the vma stuff...

>
>> +       stream->noa_reprogram_vma = vma;
>> +
>> +       return 0;
>> +
>> +err_unref_vma:
>> +       i915_vma_put(vma);
>> +err_unref_bo:
>> +       i915_gem_object_put(bo);
>> +
>> +       return ret;
>> +}
>> +
>>   static int gen8_switch_to_updated_kernel_context(struct drm_i915_private *dev_priv,
>>                                                   const struct i915_oa_config *oa_config)
>>   {
>> @@ -2102,6 +2192,12 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
>>                  goto err_config;
>>          }
>>   
>> +       ret = alloc_noa_reprogram_bo(stream);
>> +       if (ret) {
>> +               DRM_DEBUG("Enable to alloc NOA reprogramming BO\n");
>> +               goto err_config;
>> +       }
>> +
>>          /* PRM - observability performance counters:
>>           *
>>           *   OACONTROL, performance counter enable, note:
>> @@ -2136,6 +2232,13 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
>>   
>>          dev_priv->perf.oa.exclusive_stream = stream;
>>   
>> +       ret = dev_priv->perf.oa.ops.enable_metric_set(dev_priv,
>> +                                                     stream->oa_config);
>> +       if (ret)
>> +               goto err_enable;
>> +
>> +       stream->ops = &i915_oa_stream_ops;
>> +
>>          mutex_unlock(&dev_priv->drm.struct_mutex);
>>   
>>          return 0;
>> @@ -2497,6 +2600,11 @@ static void i915_perf_destroy_locked(struct i915_perf_stream *stream)
>>          if (stream->ctx)
>>                  i915_gem_context_put(stream->ctx);
>>   
>> +       if (stream->noa_reprogram_vma) {
>> +               i915_vma_unpin(stream->noa_reprogram_vma);
>> +               i915_gem_object_put(stream->noa_reprogram_vma->obj);
> This will be unhappy due to the lack of active tracking.
> i915_vma_unpin_and_release(&stream->noa_reprogram_vma).
>
> Hmm, worse than that. It's not being tracked at all, so expect it to be
> freed while still in use.

Moving to engine->noa_reprogram_vma.

I should be able to set this from i915/perf as there is a window where 
it idle the GPU before editing the window

>
>> +       }
>> +
>>          kfree(stream);
>>   }
>>   
>> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
>> index eddbd4245cb3..3357ceb4bdb1 100644
>> --- a/drivers/gpu/drm/i915/i915_request.h
>> +++ b/drivers/gpu/drm/i915/i915_request.h
>> @@ -149,6 +149,12 @@ struct i915_request {
>>          /** Preallocate space in the ring for the emitting the request */
>>          u32 reserved_space;
>>   
>> +       /*
>> +        * Pointer in the batchbuffer to where the i915/perf NOA reprogramming
>> +        * can be inserted just before HW submission.
>> +        */
>> +       u32 *perf_prog_start;
> Just u32 offset, no need for the extra space of a pointer here.
> (Probably should just limit rings to 64k and pack the offsets into u16.)
> Or have them as offset from head, and u8 dwords? Hmm.
>
> I have cold shivers from the thought of more special locations inside
> the batch. Not the first, and not the last, so I feel like there should
> be a better way (or at least more consistent).

Switching to an offset.

>
>> +
>>          /** Batch buffer related to this request if any (used for
>>           * error state dump only).
>>           */
>> diff --git a/drivers/gpu/drm/i915/intel_gpu_commands.h b/drivers/gpu/drm/i915/intel_gpu_commands.h
>> index 105e2a9e874a..09b0fded8b17 100644
>> --- a/drivers/gpu/drm/i915/intel_gpu_commands.h
>> +++ b/drivers/gpu/drm/i915/intel_gpu_commands.h
>> @@ -146,6 +146,7 @@
>>   #define   MI_BATCH_GTT             (2<<6) /* aliased with (1<<7) on gen4 */
>>   #define MI_BATCH_BUFFER_START_GEN8     MI_INSTR(0x31, 1)
>>   #define   MI_BATCH_RESOURCE_STREAMER (1<<10)
>> +#define   MI_BATCH_SECOND_LEVEL      (1<<22)
>>   
>>   /*
>>    * 3D instructions used by the kernel
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> index 1c08bd2be6c3..df4a7bb07eae 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -510,6 +510,38 @@ static bool can_merge_ctx(const struct i915_gem_context *prev,
>>          return true;
>>   }
>>   
>> +static void maybe_enable_noa_repgoram(struct i915_request *rq)
>> +{
>> +       struct intel_engine_cs *engine = rq->engine;
>> +       struct drm_i915_private *dev_priv = engine->i915;
>> +       struct i915_perf_stream *stream = dev_priv->perf.oa.exclusive_stream;
> Could there be any more pointer chasing? </chandler>

Putting this into the engine.

>
>> +       struct intel_context *ce;
>> +       u32 *cs = rq->perf_prog_start;
>> +
>> +       /* Slice/subslice/EU powergating only matters on the RCS. */
>> +       if (engine->id != RCS)
>> +               return;
>> +
>> +       /* Nothing need to reprogram when perf isn't enabled. */
>> +       if (!stream)
>> +               return;
>> +
>> +       ce = &rq->ctx->__engine[engine->id];
> rq->hw_context; but to_intel_context() until that patch lands.

Moving sseu to the request so I can do the rpcs config through MI_LRI.

>
>> +
>> +       /*
>> +        * If the powergating configuration doesn't change, no need to
>> +        * reprogram.
>> +        */
>> +       if (engine->last_sseu.value == ce->sseu.value)
>> +               return;
>> +
>> +       *cs++ = MI_BATCH_BUFFER_START_GEN8 | MI_BATCH_SECOND_LEVEL;
> You are not a second level batch. You are calling from the ring to a
> global address of _0_.
>
>> +       *cs++ = 0;
> low 32bits = 0
>
>> +       *cs++ = i915_ggtt_offset(stream->noa_reprogram_vma);
> high 32bits = address you wanted. (order reversed)

Fixing....

>
>> +
>> +       engine->last_sseu = ce->sseu;
>> +}
>> +
>>   static void port_assign(struct execlist_port *port, struct i915_request *rq)
>>   {
>>          GEM_BUG_ON(rq == port_request(port));
>> @@ -517,6 +549,8 @@ static void port_assign(struct execlist_port *port, struct i915_request *rq)
>>          if (port_isset(port))
>>                  i915_request_put(port_request(port));
>>   
>> +       maybe_enable_noa_repgoram(rq);
> How should this work for the guc? How should this work for amalgamated
> contexts? Should this only be done for the first request in the sequence
> and definitely not done for lite-restore.

I'm assuming that GuC does not reorder requests submitted by i915.

>> +
>>          port_set(port, port_pack(i915_request_get(rq), port_count(port)));
>>   }
>>   
>> @@ -1047,6 +1081,13 @@ static void execlists_submission_tasklet(unsigned long data)
>>                              buf[2*head + 1] == execlists->preempt_complete_status) {
>>                                  GEM_TRACE("%s preempt-idle\n", engine->name);
>>   
>> +                               /*
>> +                                * Clear out the state of the sseu on the
>> +                                * engine, as it's not clear what it will be
>> +                                * after preemption.
>> +                                */
>> +                               memset(&engine->last_sseu, 0, sizeof(engine->last_sseu));
> Oh, you are missing complete_preempt_context. But you also need to reset
> after reset, so cancel_port_requests is also sensisble.

Thanks, putting it there.

>
>> +
>>                                  execlists_cancel_port_requests(execlists);
>>                                  execlists_unwind_incomplete_requests(execlists);
>>   
>> @@ -1937,10 +1978,26 @@ static int gen8_emit_bb_start(struct i915_request *rq,
>>                  rq->ctx->ppgtt->pd_dirty_rings &= ~intel_engine_flag(rq->engine);
>>          }
>>   
>> -       cs = intel_ring_begin(rq, 6);
>> +       cs = intel_ring_begin(rq, rq->engine->id == RCS ? 10 : 6);
>>          if (IS_ERR(cs))
>>                  return PTR_ERR(cs);
>>   
>> +       if (rq->engine->id == RCS) {
>> +               /*
>> +                * Leave some instructions to be written with an
>> +                * MI_BATCH_BUFFER_START to the i915/perf NOA reprogramming
>> +                * batchbuffer. We only turn those MI_NOOP into
>> +                * MI_BATCH_BUFFER_START when we detect a SSEU powergating
>> +                * configuration change that might affect NOA. This is only
>> +                * for the RCS.
>> +                */
>> +               rq->perf_prog_start = cs;
>> +               *cs++ = MI_NOOP;
>> +               *cs++ = MI_NOOP;
>> +               *cs++ = MI_NOOP;
>> +               *cs++ = MI_NOOP; /* Aligning to 2 dwords */
>> +       }
>> +
>>          /*
>>           * WaDisableCtxRestoreArbitration:bdw,chv
>>           *
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> index 8f19349a6055..cde791234397 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> @@ -525,6 +525,8 @@ static int init_ring_common(struct intel_engine_cs *engine)
>>          if (INTEL_GEN(dev_priv) > 2)
>>                  I915_WRITE_MODE(engine, _MASKED_BIT_DISABLE(STOP_RING));
>>   
>> +       memset(&engine->last_sseu, 0, sizeof(engine->last_sseu));
> Why only legacy submission?

Fixing...

>
>> +
>>   out:
>>          intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
>>   
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> index 010750e8ee44..2f8518e1a49f 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> @@ -343,6 +343,8 @@ struct intel_engine_cs {
>>   
>>          struct drm_i915_gem_object *default_state;
>>   
>> +       union i915_gem_sseu last_sseu;
> The struct here is internal; debatable if it wants the GEM moniker for
> being part of the GEM user interface.
>
> Looks like this wanted a sefltest.
> -Chris
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2018-05-09 15:31 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-08 18:03 [PATCH v3 0/6] drm/i915: per context slice/subslice powergating Lionel Landwerlin
2018-05-08 18:03 ` [PATCH v3 1/6] drm/i915: Program RPCS for Broadwell Lionel Landwerlin
2018-05-08 18:03 ` [PATCH v3 2/6] drm/i915: Record the sseu configuration per-context & engine Lionel Landwerlin
2018-05-08 18:03 ` [PATCH v3 3/6] drm/i915/perf: simplify configure all context function Lionel Landwerlin
2018-05-08 18:03 ` [PATCH v3 4/6] drm/i915: reprogram NOA muxes on context switch when using perf Lionel Landwerlin
2018-05-09  8:59   ` Chris Wilson
2018-05-09  9:05     ` Chris Wilson
2018-05-09  9:23       ` Lionel Landwerlin
2018-05-09  9:12     ` Lionel Landwerlin
2018-05-09 15:31     ` Lionel Landwerlin
2018-05-08 18:03 ` [PATCH v3 5/6] drm/i915: count powergating transitions per engine Lionel Landwerlin
2018-05-08 18:03 ` [PATCH v3 6/6] drm/i915: Expose RPCS (SSEU) configuration to userspace Lionel Landwerlin
2018-05-08 18:33 ` ✗ Fi.CI.CHECKPATCH: warning for drm/i915: per context slice/subslice powergating (rev2) Patchwork
2018-05-08 18:35 ` ✗ Fi.CI.SPARSE: " Patchwork
2018-05-08 18:51 ` ✓ Fi.CI.BAT: success " Patchwork
2018-05-08 22:34 ` ✗ Fi.CI.IGT: failure " Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.