All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] drm/i915: Engine relative MMIO
@ 2019-09-23 23:51 John.C.Harrison
  2019-09-23 23:51 ` [PATCH 1/2] " John.C.Harrison
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: John.C.Harrison @ 2019-09-23 23:51 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Newer hardware has support for making LRI accesses to MMIO registers
relative to the engine that is executing the LRI instruction. This is
required for things like hardware based load balancing across engines.

John Harrison (2):
  drm/i915: Engine relative MMIO
  drm/i915: Engine relative MMIO for Gen12

 drivers/gpu/drm/i915/gem/i915_gem_context.c  |  7 ++--
 drivers/gpu/drm/i915/gt/intel_engine_cs.c    | 43 ++++++++++++++++++++
 drivers/gpu/drm/i915/gt/intel_engine_types.h |  4 ++
 drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  8 +++-
 drivers/gpu/drm/i915/gt/intel_lrc.c          | 40 +++++++++---------
 drivers/gpu/drm/i915/gt/intel_mocs.c         | 12 +++---
 drivers/gpu/drm/i915/gt/intel_ringbuffer.c   | 12 +++---
 drivers/gpu/drm/i915/i915_perf.c             |  9 +++-
 drivers/gpu/drm/i915/intel_device_info.c     | 14 +++++++
 drivers/gpu/drm/i915/intel_device_info.h     |  1 +
 10 files changed, 113 insertions(+), 37 deletions(-)

-- 
2.21.0.5.gaeb582a983

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/2] drm/i915: Engine relative MMIO
  2019-09-23 23:51 [PATCH 0/2] drm/i915: Engine relative MMIO John.C.Harrison
@ 2019-09-23 23:51 ` John.C.Harrison
  2019-09-24  8:45   ` Tvrtko Ursulin
  2019-09-24  9:16   ` Tvrtko Ursulin
  2019-09-23 23:51 ` [PATCH 2/2] drm/i915: Engine relative MMIO for Gen12 John.C.Harrison
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 11+ messages in thread
From: John.C.Harrison @ 2019-09-23 23:51 UTC (permalink / raw)
  To: Intel-GFX; +Cc: Chris P Wilson

From: John Harrison <John.C.Harrison@Intel.com>

With virtual engines, it is no longer possible to know which specific
physical engine a given request will be executed on at the time that
request is generated. This means that the request itself must be engine
agnostic - any direct register writes must be relative to the engine
and not absolute addresses.

The LRI command has support for engine relative addressing. However,
the mechanism is not transparent to the driver. The scheme for Gen11
(MI_LRI_ADD_CS_MMIO_START) requires the LRI address to have no
absolute engine base component. The hardware then adds on the correct
engine offset at execution time.

Due to the non-trivial and differing schemes on different hardware, it
is not possible to simply update the code that creates the LRI
commands to set a remap flag and let the hardware get on with it.
Instead, this patch adds function wrappers for generating the LRI
command itself and then for constructing the correct address to use
with the LRI.

v2: Fix build break in GVT. Remove flags parameter [review feedback
from Chris W].

v3: Fix build break in selftest. Rebase to newer base tree and fix
merge conflict.

v4: More rebasing. Rmoved relative addressing support from Gen7-9 only
code paths [review feedback from Chris W].

v5: More rebasing (new 'gt' directory). Fix white space issue. Use
COPY class rather than BCS0 id for checking against BCS engine.

v6: Change to MI_LRI for generic code and __MI_LRI for legacy code.
MI_LRI is now a macro wrapper around an engine function pointer to
reduce runtime checks on LRI relative support. [review feedback from
Tvrtko]

v7: Replace engine function pointer with a per engine flags field that
is added by a common function. [Daniele]

v8: Re-write on top of patch series by Mika.

v9: Fix workaround IGT failure. Not sure why but my local test runs
were passing by claiming no workarounds whereas the pre-commit check
failed. As all the workaround registers are currently engine
independent, it is safe to simply not add the engine relative flag
(which is what the selftest was expecting).

Bspec: 45606
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
CC: Rodrigo Vivi <rodrigo.vivi@intel.com>
CC: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
CC: Chris P Wilson <chris.p.wilson@intel.com>
CC: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c  |  7 ++--
 drivers/gpu/drm/i915/gt/intel_engine_cs.c    | 24 ++++++++++++
 drivers/gpu/drm/i915/gt/intel_engine_types.h |  3 ++
 drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  7 +++-
 drivers/gpu/drm/i915/gt/intel_lrc.c          | 40 ++++++++++----------
 drivers/gpu/drm/i915/gt/intel_mocs.c         |  6 +--
 drivers/gpu/drm/i915/gt/intel_ringbuffer.c   | 12 +++---
 drivers/gpu/drm/i915/i915_perf.c             |  8 +++-
 8 files changed, 73 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 4a34c4f62065..c8d7075a481d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -980,7 +980,7 @@ static int emit_ppgtt_update(struct i915_request *rq, void *data)
 {
 	struct i915_address_space *vm = rq->hw_context->vm;
 	struct intel_engine_cs *engine = rq->engine;
-	u32 base = engine->mmio_base;
+	u32 base = engine->lri_mmio_base;
 	u32 *cs;
 	int i;
 
@@ -992,7 +992,7 @@ static int emit_ppgtt_update(struct i915_request *rq, void *data)
 		if (IS_ERR(cs))
 			return PTR_ERR(cs);
 
-		*cs++ = MI_LOAD_REGISTER_IMM(2);
+		*cs++ = MI_LOAD_REGISTER_IMM(2) | engine->lri_cmd_mode;
 
 		*cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(base, 0));
 		*cs++ = upper_32_bits(pd_daddr);
@@ -1014,7 +1014,8 @@ static int emit_ppgtt_update(struct i915_request *rq, void *data)
 		if (IS_ERR(cs))
 			return PTR_ERR(cs);
 
-		*cs++ = MI_LOAD_REGISTER_IMM(2 * GEN8_3LVL_PDPES) | MI_LRI_FORCE_POSTED;
+		*cs++ = MI_LOAD_REGISTER_IMM(2 * GEN8_3LVL_PDPES) |
+			MI_LRI_FORCE_POSTED | engine->lri_cmd_mode;
 		for (i = GEN8_3LVL_PDPES; i--; ) {
 			const dma_addr_t pd_daddr = i915_page_dir_dma_addr(ppgtt, i);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index f451d5076bde..5aa58e069806 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -236,6 +236,28 @@ static u32 __engine_mmio_base(struct drm_i915_private *i915,
 	return bases[i].base;
 }
 
+static bool i915_engine_has_relative_lri(const struct intel_engine_cs *engine)
+{
+	if (INTEL_GEN(engine->i915) < 11)
+		return false;
+
+	if (engine->class == COPY_ENGINE_CLASS)
+		return false;
+
+	return true;
+}
+
+static void lri_init(struct intel_engine_cs *engine)
+{
+	if (i915_engine_has_relative_lri(engine)) {
+		engine->lri_cmd_mode = MI_LRI_ADD_CS_MMIO_START_GEN11;
+		engine->lri_mmio_base = 0;
+	} else {
+		engine->lri_cmd_mode = 0;
+		engine->lri_mmio_base = engine->mmio_base;
+	}
+}
+
 static void __sprint_engine_name(struct intel_engine_cs *engine)
 {
 	/*
@@ -320,6 +342,8 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
 	/* Nothing to do here, execute in order of dependencies */
 	engine->schedule = NULL;
 
+	lri_init(engine);
+
 	seqlock_init(&engine->stats.lock);
 
 	ATOMIC_INIT_NOTIFIER_HEAD(&engine->context_status_notifier);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 943f0663837e..38f486f7f7e3 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -306,6 +306,9 @@ struct intel_engine_cs {
 	u32 context_size;
 	u32 mmio_base;
 
+	u32 lri_mmio_base;
+	u32 lri_cmd_mode;
+
 	u32 uabi_capabilities;
 
 	struct rb_node uabi_node;
diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
index f78b13d74e17..bfb51d8d7ce2 100644
--- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
+++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
@@ -133,11 +133,16 @@
  *   simply ignores the register load under certain conditions.
  * - One can actually load arbitrary many arbitrary registers: Simply issue x
  *   address/value pairs. Don't overdue it, though, x <= 2^4 must hold!
+ * - Newer hardware supports engine relative addressing but older hardware does
+ *   not. This is required for hw engine load balancing. Hence the MI_LRI
+ *   instruction itself is prefixed with '__' and should only be used on
+ *   legacy hardware code paths. Generic code must always use the MI_LRI
+ *   and i915_get_lri_reg() helper functions instead.
  */
 #define MI_LOAD_REGISTER_IMM(x)	MI_INSTR(0x22, 2*(x)-1)
 /* Gen11+. addr = base + (ctx_restore ? offset & GENMASK(12,2) : offset) */
-#define   MI_LRI_CS_MMIO		(1<<19)
 #define   MI_LRI_FORCE_POSTED		(1<<12)
+#define   MI_LRI_ADD_CS_MMIO_START_GEN11	BIT(19)
 #define MI_STORE_REGISTER_MEM        MI_INSTR(0x24, 1)
 #define MI_STORE_REGISTER_MEM_GEN8   MI_INSTR(0x24, 2)
 #define   MI_SRM_LRM_GLOBAL_GTT		(1<<22)
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 6cfdc0f9f2b9..7cd59d29c4e4 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -2010,11 +2010,12 @@ struct lri {
 	u32 value;
 };
 
-static u32 *emit_lri(u32 *batch, const struct lri *lri, unsigned int count)
+static u32 *emit_lri(struct intel_engine_cs *engine, u32 *batch,
+		     const struct lri *lri, unsigned int count)
 {
 	GEM_BUG_ON(!count || count > 63);
 
-	*batch++ = MI_LOAD_REGISTER_IMM(count);
+	*batch++ = MI_LOAD_REGISTER_IMM(count) | engine->lri_cmd_mode;
 	do {
 		*batch++ = i915_mmio_reg_offset(lri->reg);
 		*batch++ = lri->value;
@@ -2054,7 +2055,7 @@ static u32 *gen9_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch)
 	/* WaFlushCoherentL3CacheLinesAtContextSwitch:skl,bxt,glk */
 	batch = gen8_emit_flush_coherentl3_wa(engine, batch);
 
-	batch = emit_lri(batch, lri, ARRAY_SIZE(lri));
+	batch = emit_lri(engine, batch, lri, ARRAY_SIZE(lri));
 
 	/* WaMediaPoolStateCmdInWABB:bxt,glk */
 	if (HAS_POOLED_EU(engine->i915)) {
@@ -3282,7 +3283,7 @@ static void init_common_reg_state(u32 * const regs,
 				  struct intel_engine_cs *engine,
 				  struct intel_ring *ring)
 {
-	const u32 base = engine->mmio_base;
+	const u32 base = engine->lri_mmio_base;
 
 	CTX_REG(regs, CTX_CONTEXT_CONTROL, RING_CONTEXT_CONTROL(base),
 		_MASKED_BIT_DISABLE(CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT) |
@@ -3307,7 +3308,7 @@ static void init_wa_bb_reg_state(u32 * const regs,
 				 u32 pos_bb_per_ctx)
 {
 	struct i915_ctx_workarounds * const wa_ctx = &engine->wa_ctx;
-	const u32 base = engine->mmio_base;
+	const u32 base = engine->lri_mmio_base;
 	const u32 pos_indirect_ctx = pos_bb_per_ctx + 2;
 	const u32 pos_indirect_ctx_offset = pos_indirect_ctx + 2;
 
@@ -3335,6 +3336,7 @@ static void init_wa_bb_reg_state(u32 * const regs,
 }
 
 static void init_ppgtt_reg_state(u32 *regs, u32 base,
+				 struct intel_engine_cs *engine,
 				 struct i915_ppgtt *ppgtt)
 {
 	/* PDP values well be assigned later if needed */
@@ -3376,14 +3378,12 @@ static void gen8_init_reg_state(u32 * const regs,
 {
 	struct i915_ppgtt * const ppgtt = vm_alias(ce->vm);
 	const bool rcs = engine->class == RENDER_CLASS;
-	const u32 base = engine->mmio_base;
-	const u32 lri_base =
-		intel_engine_has_relative_mmio(engine) ? MI_LRI_CS_MMIO : 0;
+	const u32 base = engine->lri_mmio_base;
 
 	regs[CTX_LRI_HEADER_0] =
 		MI_LOAD_REGISTER_IMM(rcs ? 14 : 11) |
 		MI_LRI_FORCE_POSTED |
-		lri_base;
+		engine->lri_cmd_mode;
 
 	init_common_reg_state(regs, ppgtt, engine, ring);
 	CTX_REG(regs, CTX_SECOND_BB_HEAD_U, RING_SBBADDR_UDW(base), 0);
@@ -3395,15 +3395,17 @@ static void gen8_init_reg_state(u32 * const regs,
 	regs[CTX_LRI_HEADER_1] =
 		MI_LOAD_REGISTER_IMM(9) |
 		MI_LRI_FORCE_POSTED |
-		lri_base;
+		engine->lri_cmd_mode;
 
 	CTX_REG(regs, CTX_CTX_TIMESTAMP, RING_CTX_TIMESTAMP(base), 0);
 
-	init_ppgtt_reg_state(regs, base, ppgtt);
+	init_ppgtt_reg_state(regs, base, engine, ppgtt);
 
 	if (rcs) {
-		regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1) | lri_base;
-		CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE, 0);
+		regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1) |
+					 engine->lri_cmd_mode;
+		CTX_REG(regs, CTX_R_PWR_CLK_STATE,
+			GEN8_R_PWR_CLK_STATE, 0);
 	}
 
 	regs[CTX_END] = MI_BATCH_BUFFER_END;
@@ -3418,14 +3420,12 @@ static void gen12_init_reg_state(u32 * const regs,
 {
 	struct i915_ppgtt * const ppgtt = i915_vm_to_ppgtt(ce->vm);
 	const bool rcs = engine->class == RENDER_CLASS;
-	const u32 base = engine->mmio_base;
-	const u32 lri_base =
-		intel_engine_has_relative_mmio(engine) ? MI_LRI_CS_MMIO : 0;
+	const u32 base = engine->lri_mmio_base;
 
 	regs[CTX_LRI_HEADER_0] =
 		MI_LOAD_REGISTER_IMM(rcs ? 11 : 9) |
 		MI_LRI_FORCE_POSTED |
-		lri_base;
+		engine->lri_cmd_mode;
 
 	init_common_reg_state(regs, ppgtt, engine, ring);
 
@@ -3435,15 +3435,15 @@ static void gen12_init_reg_state(u32 * const regs,
 	regs[CTX_LRI_HEADER_1] =
 		MI_LOAD_REGISTER_IMM(9) |
 		MI_LRI_FORCE_POSTED |
-		lri_base;
+		engine->lri_cmd_mode;
 
 	CTX_REG(regs, CTX_CTX_TIMESTAMP, RING_CTX_TIMESTAMP(base), 0);
 
-	init_ppgtt_reg_state(regs, base, ppgtt);
+	init_ppgtt_reg_state(regs, base, engine, ppgtt);
 
 	if (rcs) {
 		regs[GEN12_CTX_LRI_HEADER_3] =
-			MI_LOAD_REGISTER_IMM(1) | lri_base;
+			MI_LOAD_REGISTER_IMM(1) | engine->lri_cmd_mode;
 		CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE, 0);
 
 		/* TODO: oa_init_reg_state ? */
diff --git a/drivers/gpu/drm/i915/gt/intel_mocs.c b/drivers/gpu/drm/i915/gt/intel_mocs.c
index 728704bbbe18..8e8fe3deb95c 100644
--- a/drivers/gpu/drm/i915/gt/intel_mocs.c
+++ b/drivers/gpu/drm/i915/gt/intel_mocs.c
@@ -368,9 +368,6 @@ static u32 get_entry_control(const struct drm_i915_mocs_table *table,
 /**
  * intel_mocs_init_engine() - emit the mocs control table
  * @engine:	The engine for whom to emit the registers.
- *
- * This function simply emits a MI_LOAD_REGISTER_IMM command for the
- * given table starting at the given address.
  */
 void intel_mocs_init_engine(struct intel_engine_cs *engine)
 {
@@ -456,7 +453,8 @@ static int emit_mocs_control_table(struct i915_request *rq,
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
 
-	*cs++ = MI_LOAD_REGISTER_IMM(table->n_entries);
+	*cs++ = MI_LOAD_REGISTER_IMM(table->n_entries) |
+		rq->engine->lri_cmd_mode;
 
 	for (index = 0; index < table->size; index++) {
 		u32 value = get_entry_control(table, index);
diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
index 0747b8c9f768..7027367b1f1a 100644
--- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
@@ -1536,12 +1536,13 @@ static int load_pd_dir(struct i915_request *rq, const struct i915_ppgtt *ppgtt)
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
 
-	*cs++ = MI_LOAD_REGISTER_IMM(1);
-	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_DCLV(engine->mmio_base));
+	/* Can these not be merged into a single LRI??? */
+	*cs++ = MI_LOAD_REGISTER_IMM(1) | engine->lri_cmd_mode;
+	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_DCLV(engine->lri_mmio_base));
 	*cs++ = PP_DIR_DCLV_2G;
 
-	*cs++ = MI_LOAD_REGISTER_IMM(1);
-	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_BASE(engine->mmio_base));
+	*cs++ = MI_LOAD_REGISTER_IMM(1) | engine->lri_cmd_mode;
+	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_BASE(engine->lri_mmio_base));
 	*cs++ = px_base(ppgtt->pd)->ggtt_offset << 10;
 
 	intel_ring_advance(rq, cs);
@@ -1709,7 +1710,8 @@ static int remap_l3_slice(struct i915_request *rq, int slice)
 	 * here because no other code should access these registers other than
 	 * at initialization time.
 	 */
-	*cs++ = MI_LOAD_REGISTER_IMM(GEN7_L3LOG_SIZE/4);
+	*cs++ = MI_LOAD_REGISTER_IMM(GEN7_L3LOG_SIZE / 4) |
+		rq->engine->lri_cmd_mode;
 	for (i = 0; i < GEN7_L3LOG_SIZE/4; i++) {
 		*cs++ = i915_mmio_reg_offset(GEN7_L3LOG(slice, i));
 		*cs++ = remap_info[i];
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index c1b764233761..8b85cdfada21 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1693,6 +1693,11 @@ gen8_update_reg_state_unlocked(struct i915_perf_stream *stream,
 	};
 	int i;
 
+	/*
+	 * NB: The LRI instruction is generated by the hardware.
+	 * Should we read it in and assert that the offset flag is set?
+	 */
+
 	CTX_REG(reg_state, ctx_oactxctrl, GEN8_OACTXCONTROL,
 		(stream->period_exponent << GEN8_OA_TIMER_PERIOD_SHIFT) |
 		(stream->periodic ? GEN8_OA_TIMER_ENABLE : 0) |
@@ -1752,8 +1757,9 @@ gen8_load_flex(struct i915_request *rq,
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
 
-	*cs++ = MI_LOAD_REGISTER_IMM(count);
+	*cs++ = MI_LOAD_REGISTER_IMM(count) | rq->engine->lri_cmd_mode;
 	do {
+		/* FIXME: Is this table LRI remap/offset friendly? */
 		*cs++ = i915_mmio_reg_offset(flex->reg);
 		*cs++ = flex->value;
 	} while (flex++, --count);
-- 
2.21.0.5.gaeb582a983

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/2] drm/i915: Engine relative MMIO for Gen12
  2019-09-23 23:51 [PATCH 0/2] drm/i915: Engine relative MMIO John.C.Harrison
  2019-09-23 23:51 ` [PATCH 1/2] " John.C.Harrison
@ 2019-09-23 23:51 ` John.C.Harrison
  2019-09-24  9:12   ` Tvrtko Ursulin
  2019-09-24  0:25 ` ✓ Fi.CI.BAT: success for drm/i915: Engine relative MMIO (rev9) Patchwork
  2019-09-24 14:57 ` ✗ Fi.CI.IGT: failure " Patchwork
  3 siblings, 1 reply; 11+ messages in thread
From: John.C.Harrison @ 2019-09-23 23:51 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Gen12 introduces a completely new and different scheme for
implementing engine relative MMIO accesses - MI_LRI_MMIO_REMAP. This
requires using the base address of instance zero of the relevant
engine class. And then, it is only valid if the register in
question falls within a certain range as specified by a table.

v2: Add support for fused parts where instance zero of a given engine
class may be missing. Make aux_tables presence a device flag rather
than just hard coded. Pre-calculate the correct LRI base address
rather than using a per-instance base and then adding on a delta to
the correct base later. Make all the table range checking a debug only
feature - the theory is that all driver access should be within the
remap ranges. [Daniele]

v3: Re-base on Mika's changes. This removes all the abstraction layer.
Which also means there is no common code path that all LRI accesses go
via. Thus it is not possible to implement the range check. However, as
noted in v2, the range check should not be necessary anyway.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
CC: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c    | 27 +++++++++++++++++---
 drivers/gpu/drm/i915/gt/intel_engine_types.h |  1 +
 drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  9 ++++---
 drivers/gpu/drm/i915/gt/intel_mocs.c         |  6 ++---
 drivers/gpu/drm/i915/i915_perf.c             |  3 ++-
 drivers/gpu/drm/i915/intel_device_info.c     | 14 ++++++++++
 drivers/gpu/drm/i915/intel_device_info.h     |  1 +
 7 files changed, 49 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 5aa58e069806..09fbbffce419 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -247,14 +247,32 @@ static bool i915_engine_has_relative_lri(const struct intel_engine_cs *engine)
 	return true;
 }
 
-static void lri_init(struct intel_engine_cs *engine)
+static void lri_init(struct intel_engine_cs *engine, u32 first_instance)
 {
 	if (i915_engine_has_relative_lri(engine)) {
-		engine->lri_cmd_mode = MI_LRI_ADD_CS_MMIO_START_GEN11;
-		engine->lri_mmio_base = 0;
+		if (INTEL_GEN(engine->i915) < 12) {
+			engine->lri_cmd_mode = MI_LRI_ADD_CS_MMIO_START_GEN11;
+			engine->lri_mmio_base = 0;//engine->mmio_base;
+			engine->lri_engine = engine;
+		} else {
+			engine->lri_cmd_mode = MI_LRI_MMIO_REMAP_GEN12;
+
+			engine->lri_engine = engine->gt->engine_class
+						[engine->class][first_instance];
+			GEM_BUG_ON(!engine->lri_engine);
+			engine->lri_mmio_base = engine->lri_engine->mmio_base;
+
+			/* According to the bspec, accesses should be compared
+			 * against the remap table and remapping only enabled
+			 * if found. In practice, all driver accesses should be
+			 * to remap registers. So, no need for the complication
+			 * of checking against any device specific tables.
+			 */
+		}
 	} else {
 		engine->lri_cmd_mode = 0;
 		engine->lri_mmio_base = engine->mmio_base;
+		engine->lri_engine = engine;
 	}
 }
 
@@ -342,7 +360,8 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
 	/* Nothing to do here, execute in order of dependencies */
 	engine->schedule = NULL;
 
-	lri_init(engine);
+	lri_init(engine,
+		 INTEL_INFO(engine->i915)->first_instance[engine->class]);
 
 	seqlock_init(&engine->stats.lock);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 38f486f7f7e3..62caa74e8558 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -308,6 +308,7 @@ struct intel_engine_cs {
 
 	u32 lri_mmio_base;
 	u32 lri_cmd_mode;
+	struct intel_engine_cs *lri_engine;
 
 	u32 uabi_capabilities;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
index bfb51d8d7ce2..9c5be384026f 100644
--- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
+++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
@@ -133,15 +133,16 @@
  *   simply ignores the register load under certain conditions.
  * - One can actually load arbitrary many arbitrary registers: Simply issue x
  *   address/value pairs. Don't overdue it, though, x <= 2^4 must hold!
- * - Newer hardware supports engine relative addressing but older hardware does
- *   not. This is required for hw engine load balancing. Hence the MI_LRI
- *   instruction itself is prefixed with '__' and should only be used on
- *   legacy hardware code paths. Generic code must always use the MI_LRI
+ * - Newer hardware supports engine relative addressing but using multiple
+ *   incompatible schemes. This is required for hw engine load balancing. Hence
+ *   the MI_LRI instruction itself is prefixed with '__' and should only be
+ *   used on legacy hardware code paths. Generic code must always use the MI_LRI
  *   and i915_get_lri_reg() helper functions instead.
  */
 #define MI_LOAD_REGISTER_IMM(x)	MI_INSTR(0x22, 2*(x)-1)
 /* Gen11+. addr = base + (ctx_restore ? offset & GENMASK(12,2) : offset) */
 #define   MI_LRI_FORCE_POSTED		(1<<12)
+#define   MI_LRI_MMIO_REMAP_GEN12		BIT(17)
 #define   MI_LRI_ADD_CS_MMIO_START_GEN11	BIT(19)
 #define MI_STORE_REGISTER_MEM        MI_INSTR(0x24, 1)
 #define MI_STORE_REGISTER_MEM_GEN8   MI_INSTR(0x24, 2)
diff --git a/drivers/gpu/drm/i915/gt/intel_mocs.c b/drivers/gpu/drm/i915/gt/intel_mocs.c
index 8e8fe3deb95c..e121fea5b33c 100644
--- a/drivers/gpu/drm/i915/gt/intel_mocs.c
+++ b/drivers/gpu/drm/i915/gt/intel_mocs.c
@@ -438,7 +438,7 @@ static void intel_mocs_init_global(struct intel_gt *gt)
 static int emit_mocs_control_table(struct i915_request *rq,
 				   const struct drm_i915_mocs_table *table)
 {
-	enum intel_engine_id engine = rq->engine->id;
+	enum intel_engine_id engine_id = rq->engine->lri_engine->id;
 	unsigned int index;
 	u32 unused_value;
 	u32 *cs;
@@ -459,13 +459,13 @@ static int emit_mocs_control_table(struct i915_request *rq,
 	for (index = 0; index < table->size; index++) {
 		u32 value = get_entry_control(table, index);
 
-		*cs++ = i915_mmio_reg_offset(mocs_register(engine, index));
+		*cs++ = i915_mmio_reg_offset(mocs_register(engine_id, index));
 		*cs++ = value;
 	}
 
 	/* All remaining entries are also unused */
 	for (; index < table->n_entries; index++) {
-		*cs++ = i915_mmio_reg_offset(mocs_register(engine, index));
+		*cs++ = i915_mmio_reg_offset(mocs_register(engine_id, index));
 		*cs++ = unused_value;
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 8b85cdfada21..fd628ae12343 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1695,7 +1695,8 @@ gen8_update_reg_state_unlocked(struct i915_perf_stream *stream,
 
 	/*
 	 * NB: The LRI instruction is generated by the hardware.
-	 * Should we read it in and assert that the offset flag is set?
+	 * Should we read it in and assert that the appropriate
+	 * offset flag is set?
 	 */
 
 	CTX_REG(reg_state, ctx_oactxctrl, GEN8_OACTXCONTROL,
diff --git a/drivers/gpu/drm/i915/intel_device_info.c b/drivers/gpu/drm/i915/intel_device_info.c
index 85e480bdc673..dece012d3ad4 100644
--- a/drivers/gpu/drm/i915/intel_device_info.c
+++ b/drivers/gpu/drm/i915/intel_device_info.c
@@ -1030,6 +1030,7 @@ void intel_device_info_init_mmio(struct drm_i915_private *dev_priv)
 	u32 media_fuse;
 	u16 vdbox_mask;
 	u16 vebox_mask;
+	bool found;
 
 	if (INTEL_GEN(dev_priv) < 11)
 		return;
@@ -1040,6 +1041,7 @@ void intel_device_info_init_mmio(struct drm_i915_private *dev_priv)
 	vebox_mask = (media_fuse & GEN11_GT_VEBOX_DISABLE_MASK) >>
 		      GEN11_GT_VEBOX_DISABLE_SHIFT;
 
+	found = false;
 	for (i = 0; i < I915_MAX_VCS; i++) {
 		if (!HAS_ENGINE(dev_priv, _VCS(i))) {
 			vdbox_mask &= ~BIT(i);
@@ -1059,11 +1061,17 @@ void intel_device_info_init_mmio(struct drm_i915_private *dev_priv)
 		 */
 		if (IS_TIGERLAKE(dev_priv) || logical_vdbox++ % 2 == 0)
 			RUNTIME_INFO(dev_priv)->vdbox_sfc_access |= BIT(i);
+
+		if (!found) {
+			found = true;
+			info->first_instance[VIDEO_DECODE_CLASS] = i;
+		}
 	}
 	DRM_DEBUG_DRIVER("vdbox enable: %04x, instances: %04lx\n",
 			 vdbox_mask, VDBOX_MASK(dev_priv));
 	GEM_BUG_ON(vdbox_mask != VDBOX_MASK(dev_priv));
 
+	found = false;
 	for (i = 0; i < I915_MAX_VECS; i++) {
 		if (!HAS_ENGINE(dev_priv, _VECS(i))) {
 			vebox_mask &= ~BIT(i);
@@ -1073,6 +1081,12 @@ void intel_device_info_init_mmio(struct drm_i915_private *dev_priv)
 		if (!(BIT(i) & vebox_mask)) {
 			info->engine_mask &= ~BIT(_VECS(i));
 			DRM_DEBUG_DRIVER("vecs%u fused off\n", i);
+			continue;
+		}
+
+		if (!found) {
+			found = true;
+			info->first_instance[VIDEO_ENHANCEMENT_CLASS] = i;
 		}
 	}
 	DRM_DEBUG_DRIVER("vebox enable: %04x, instances: %04lx\n",
diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
index 0cdc2465534b..75a9950969fb 100644
--- a/drivers/gpu/drm/i915/intel_device_info.h
+++ b/drivers/gpu/drm/i915/intel_device_info.h
@@ -152,6 +152,7 @@ struct intel_device_info {
 	u8 gen;
 	u8 gt; /* GT number, 0 if undefined */
 	intel_engine_mask_t engine_mask; /* Engines supported by the HW */
+	u32 first_instance[MAX_ENGINE_CLASS]; /* instance 0 might be fused */
 
 	enum intel_platform platform;
 
-- 
2.21.0.5.gaeb582a983

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* ✓ Fi.CI.BAT: success for drm/i915: Engine relative MMIO (rev9)
  2019-09-23 23:51 [PATCH 0/2] drm/i915: Engine relative MMIO John.C.Harrison
  2019-09-23 23:51 ` [PATCH 1/2] " John.C.Harrison
  2019-09-23 23:51 ` [PATCH 2/2] drm/i915: Engine relative MMIO for Gen12 John.C.Harrison
@ 2019-09-24  0:25 ` Patchwork
  2019-09-24 14:57 ` ✗ Fi.CI.IGT: failure " Patchwork
  3 siblings, 0 replies; 11+ messages in thread
From: Patchwork @ 2019-09-24  0:25 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Engine relative MMIO (rev9)
URL   : https://patchwork.freedesktop.org/series/57117/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_6945 -> Patchwork_14509
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14509/

Known issues
------------

  Here are the changes found in Patchwork_14509 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_mmap_gtt@basic-write-cpu-read-gtt:
    - fi-icl-u3:          [PASS][1] -> [DMESG-WARN][2] ([fdo#107724]) +1 similar issue
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6945/fi-icl-u3/igt@gem_mmap_gtt@basic-write-cpu-read-gtt.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14509/fi-icl-u3/igt@gem_mmap_gtt@basic-write-cpu-read-gtt.html

  
#### Possible fixes ####

  * igt@gem_mmap_gtt@basic-read-write-distinct:
    - fi-icl-u3:          [DMESG-WARN][3] ([fdo#107724]) -> [PASS][4] +1 similar issue
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6945/fi-icl-u3/igt@gem_mmap_gtt@basic-read-write-distinct.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14509/fi-icl-u3/igt@gem_mmap_gtt@basic-read-write-distinct.html

  * igt@i915_module_load@reload:
    - fi-icl-u3:          [DMESG-WARN][5] ([fdo#107724] / [fdo#111214]) -> [PASS][6]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6945/fi-icl-u3/igt@i915_module_load@reload.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14509/fi-icl-u3/igt@i915_module_load@reload.html

  * igt@i915_selftest@live_execlists:
    - fi-icl-u2:          [DMESG-FAIL][7] ([fdo#111108]) -> [PASS][8]
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6945/fi-icl-u2/igt@i915_selftest@live_execlists.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14509/fi-icl-u2/igt@i915_selftest@live_execlists.html

  * igt@i915_selftest@live_hangcheck:
    - fi-kbl-7500u:       [INCOMPLETE][9] ([fdo#108744]) -> [PASS][10]
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6945/fi-kbl-7500u/igt@i915_selftest@live_hangcheck.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14509/fi-kbl-7500u/igt@i915_selftest@live_hangcheck.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#107724]: https://bugs.freedesktop.org/show_bug.cgi?id=107724
  [fdo#108744]: https://bugs.freedesktop.org/show_bug.cgi?id=108744
  [fdo#111045]: https://bugs.freedesktop.org/show_bug.cgi?id=111045
  [fdo#111096]: https://bugs.freedesktop.org/show_bug.cgi?id=111096
  [fdo#111108]: https://bugs.freedesktop.org/show_bug.cgi?id=111108
  [fdo#111214]: https://bugs.freedesktop.org/show_bug.cgi?id=111214


Participating hosts (54 -> 45)
------------------------------

  Missing    (9): fi-ilk-m540 fi-tgl-u fi-hsw-4200u fi-tgl-u2 fi-byt-squawks fi-bsw-cyan fi-icl-y fi-byt-clapper fi-bdw-samus 


Build changes
-------------

  * CI: CI-20190529 -> None
  * Linux: CI_DRM_6945 -> Patchwork_14509

  CI-20190529: 20190529
  CI_DRM_6945: f11d819264a3fab210498a4920ef34a891da39e0 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5197: aa534ff47fd2f455c8be9e59eae807695b87fcdd @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_14509: 4f9d4f7b382c34dfd5f4ced2e713a225f9745878 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

4f9d4f7b382c drm/i915: Engine relative MMIO for Gen12
d7d175dc43cd drm/i915: Engine relative MMIO

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14509/index.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] drm/i915: Engine relative MMIO
  2019-09-23 23:51 ` [PATCH 1/2] " John.C.Harrison
@ 2019-09-24  8:45   ` Tvrtko Ursulin
  2019-09-24 22:28     ` Rodrigo Vivi
  2019-09-24  9:16   ` Tvrtko Ursulin
  1 sibling, 1 reply; 11+ messages in thread
From: Tvrtko Ursulin @ 2019-09-24  8:45 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX; +Cc: Chris P Wilson


On 24/09/2019 00:51, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> With virtual engines, it is no longer possible to know which specific
> physical engine a given request will be executed on at the time that
> request is generated. This means that the request itself must be engine
> agnostic - any direct register writes must be relative to the engine
> and not absolute addresses.
> 
> The LRI command has support for engine relative addressing. However,
> the mechanism is not transparent to the driver. The scheme for Gen11
> (MI_LRI_ADD_CS_MMIO_START) requires the LRI address to have no
> absolute engine base component. The hardware then adds on the correct
> engine offset at execution time.
> 
> Due to the non-trivial and differing schemes on different hardware, it
> is not possible to simply update the code that creates the LRI
> commands to set a remap flag and let the hardware get on with it.
> Instead, this patch adds function wrappers for generating the LRI
> command itself and then for constructing the correct address to use
> with the LRI.
> 
> v2: Fix build break in GVT. Remove flags parameter [review feedback
> from Chris W].
> 
> v3: Fix build break in selftest. Rebase to newer base tree and fix
> merge conflict.
> 
> v4: More rebasing. Rmoved relative addressing support from Gen7-9 only
> code paths [review feedback from Chris W].
> 
> v5: More rebasing (new 'gt' directory). Fix white space issue. Use
> COPY class rather than BCS0 id for checking against BCS engine.
> 
> v6: Change to MI_LRI for generic code and __MI_LRI for legacy code.
> MI_LRI is now a macro wrapper around an engine function pointer to
> reduce runtime checks on LRI relative support. [review feedback from
> Tvrtko]
> 
> v7: Replace engine function pointer with a per engine flags field that
> is added by a common function. [Daniele]
> 
> v8: Re-write on top of patch series by Mika.
> 
> v9: Fix workaround IGT failure. Not sure why but my local test runs
> were passing by claiming no workarounds whereas the pre-commit check
> failed. As all the workaround registers are currently engine
> independent, it is safe to simply not add the engine relative flag
> (which is what the selftest was expecting).
> 
> Bspec: 45606
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> CC: Rodrigo Vivi <rodrigo.vivi@intel.com>
> CC: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> CC: Chris P Wilson <chris.p.wilson@intel.com>
> CC: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_context.c  |  7 ++--
>   drivers/gpu/drm/i915/gt/intel_engine_cs.c    | 24 ++++++++++++
>   drivers/gpu/drm/i915/gt/intel_engine_types.h |  3 ++
>   drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  7 +++-
>   drivers/gpu/drm/i915/gt/intel_lrc.c          | 40 ++++++++++----------
>   drivers/gpu/drm/i915/gt/intel_mocs.c         |  6 +--
>   drivers/gpu/drm/i915/gt/intel_ringbuffer.c   | 12 +++---
>   drivers/gpu/drm/i915/i915_perf.c             |  8 +++-
>   8 files changed, 73 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index 4a34c4f62065..c8d7075a481d 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -980,7 +980,7 @@ static int emit_ppgtt_update(struct i915_request *rq, void *data)
>   {
>   	struct i915_address_space *vm = rq->hw_context->vm;
>   	struct intel_engine_cs *engine = rq->engine;
> -	u32 base = engine->mmio_base;
> +	u32 base = engine->lri_mmio_base;
>   	u32 *cs;
>   	int i;
>   
> @@ -992,7 +992,7 @@ static int emit_ppgtt_update(struct i915_request *rq, void *data)
>   		if (IS_ERR(cs))
>   			return PTR_ERR(cs);
>   
> -		*cs++ = MI_LOAD_REGISTER_IMM(2);
> +		*cs++ = MI_LOAD_REGISTER_IMM(2) | engine->lri_cmd_mode;

I missed the part where we ended up with this new flavour and I can't 
say I am a fan.

What's the point in having to remember to or-in the flags at so many 
call sites? Are you now touching all mi_lri, or most, or how many? If 
all or most then why not MI_LOAD_REGISTER_IMM(engine, 2)? If not most 
then the same with _REL suffix as discussed before and leave the legacy 
untouched.

Don't know really on all the details since I lost track, but the current 
option leaves me concerned.

Regards,

Tvrtko

>   
>   		*cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(base, 0));
>   		*cs++ = upper_32_bits(pd_daddr);
> @@ -1014,7 +1014,8 @@ static int emit_ppgtt_update(struct i915_request *rq, void *data)
>   		if (IS_ERR(cs))
>   			return PTR_ERR(cs);
>   
> -		*cs++ = MI_LOAD_REGISTER_IMM(2 * GEN8_3LVL_PDPES) | MI_LRI_FORCE_POSTED;
> +		*cs++ = MI_LOAD_REGISTER_IMM(2 * GEN8_3LVL_PDPES) |
> +			MI_LRI_FORCE_POSTED | engine->lri_cmd_mode;
>   		for (i = GEN8_3LVL_PDPES; i--; ) {
>   			const dma_addr_t pd_daddr = i915_page_dir_dma_addr(ppgtt, i);
>   
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index f451d5076bde..5aa58e069806 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -236,6 +236,28 @@ static u32 __engine_mmio_base(struct drm_i915_private *i915,
>   	return bases[i].base;
>   }
>   
> +static bool i915_engine_has_relative_lri(const struct intel_engine_cs *engine)
> +{
> +	if (INTEL_GEN(engine->i915) < 11)
> +		return false;
> +
> +	if (engine->class == COPY_ENGINE_CLASS)
> +		return false;
> +
> +	return true;
> +}
> +
> +static void lri_init(struct intel_engine_cs *engine)
> +{
> +	if (i915_engine_has_relative_lri(engine)) {
> +		engine->lri_cmd_mode = MI_LRI_ADD_CS_MMIO_START_GEN11;
> +		engine->lri_mmio_base = 0;
> +	} else {
> +		engine->lri_cmd_mode = 0;
> +		engine->lri_mmio_base = engine->mmio_base;
> +	}
> +}
> +
>   static void __sprint_engine_name(struct intel_engine_cs *engine)
>   {
>   	/*
> @@ -320,6 +342,8 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
>   	/* Nothing to do here, execute in order of dependencies */
>   	engine->schedule = NULL;
>   
> +	lri_init(engine);
> +
>   	seqlock_init(&engine->stats.lock);
>   
>   	ATOMIC_INIT_NOTIFIER_HEAD(&engine->context_status_notifier);
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> index 943f0663837e..38f486f7f7e3 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> @@ -306,6 +306,9 @@ struct intel_engine_cs {
>   	u32 context_size;
>   	u32 mmio_base;
>   
> +	u32 lri_mmio_base;
> +	u32 lri_cmd_mode;
> +
>   	u32 uabi_capabilities;
>   
>   	struct rb_node uabi_node;
> diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> index f78b13d74e17..bfb51d8d7ce2 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> @@ -133,11 +133,16 @@
>    *   simply ignores the register load under certain conditions.
>    * - One can actually load arbitrary many arbitrary registers: Simply issue x
>    *   address/value pairs. Don't overdue it, though, x <= 2^4 must hold!
> + * - Newer hardware supports engine relative addressing but older hardware does
> + *   not. This is required for hw engine load balancing. Hence the MI_LRI
> + *   instruction itself is prefixed with '__' and should only be used on
> + *   legacy hardware code paths. Generic code must always use the MI_LRI
> + *   and i915_get_lri_reg() helper functions instead.
>    */
>   #define MI_LOAD_REGISTER_IMM(x)	MI_INSTR(0x22, 2*(x)-1)
>   /* Gen11+. addr = base + (ctx_restore ? offset & GENMASK(12,2) : offset) */
> -#define   MI_LRI_CS_MMIO		(1<<19)
>   #define   MI_LRI_FORCE_POSTED		(1<<12)
> +#define   MI_LRI_ADD_CS_MMIO_START_GEN11	BIT(19)
>   #define MI_STORE_REGISTER_MEM        MI_INSTR(0x24, 1)
>   #define MI_STORE_REGISTER_MEM_GEN8   MI_INSTR(0x24, 2)
>   #define   MI_SRM_LRM_GLOBAL_GTT		(1<<22)
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index 6cfdc0f9f2b9..7cd59d29c4e4 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -2010,11 +2010,12 @@ struct lri {
>   	u32 value;
>   };
>   
> -static u32 *emit_lri(u32 *batch, const struct lri *lri, unsigned int count)
> +static u32 *emit_lri(struct intel_engine_cs *engine, u32 *batch,
> +		     const struct lri *lri, unsigned int count)
>   {
>   	GEM_BUG_ON(!count || count > 63);
>   
> -	*batch++ = MI_LOAD_REGISTER_IMM(count);
> +	*batch++ = MI_LOAD_REGISTER_IMM(count) | engine->lri_cmd_mode;
>   	do {
>   		*batch++ = i915_mmio_reg_offset(lri->reg);
>   		*batch++ = lri->value;
> @@ -2054,7 +2055,7 @@ static u32 *gen9_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch)
>   	/* WaFlushCoherentL3CacheLinesAtContextSwitch:skl,bxt,glk */
>   	batch = gen8_emit_flush_coherentl3_wa(engine, batch);
>   
> -	batch = emit_lri(batch, lri, ARRAY_SIZE(lri));
> +	batch = emit_lri(engine, batch, lri, ARRAY_SIZE(lri));
>   
>   	/* WaMediaPoolStateCmdInWABB:bxt,glk */
>   	if (HAS_POOLED_EU(engine->i915)) {
> @@ -3282,7 +3283,7 @@ static void init_common_reg_state(u32 * const regs,
>   				  struct intel_engine_cs *engine,
>   				  struct intel_ring *ring)
>   {
> -	const u32 base = engine->mmio_base;
> +	const u32 base = engine->lri_mmio_base;
>   
>   	CTX_REG(regs, CTX_CONTEXT_CONTROL, RING_CONTEXT_CONTROL(base),
>   		_MASKED_BIT_DISABLE(CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT) |
> @@ -3307,7 +3308,7 @@ static void init_wa_bb_reg_state(u32 * const regs,
>   				 u32 pos_bb_per_ctx)
>   {
>   	struct i915_ctx_workarounds * const wa_ctx = &engine->wa_ctx;
> -	const u32 base = engine->mmio_base;
> +	const u32 base = engine->lri_mmio_base;
>   	const u32 pos_indirect_ctx = pos_bb_per_ctx + 2;
>   	const u32 pos_indirect_ctx_offset = pos_indirect_ctx + 2;
>   
> @@ -3335,6 +3336,7 @@ static void init_wa_bb_reg_state(u32 * const regs,
>   }
>   
>   static void init_ppgtt_reg_state(u32 *regs, u32 base,
> +				 struct intel_engine_cs *engine,
>   				 struct i915_ppgtt *ppgtt)
>   {
>   	/* PDP values well be assigned later if needed */
> @@ -3376,14 +3378,12 @@ static void gen8_init_reg_state(u32 * const regs,
>   {
>   	struct i915_ppgtt * const ppgtt = vm_alias(ce->vm);
>   	const bool rcs = engine->class == RENDER_CLASS;
> -	const u32 base = engine->mmio_base;
> -	const u32 lri_base =
> -		intel_engine_has_relative_mmio(engine) ? MI_LRI_CS_MMIO : 0;
> +	const u32 base = engine->lri_mmio_base;
>   
>   	regs[CTX_LRI_HEADER_0] =
>   		MI_LOAD_REGISTER_IMM(rcs ? 14 : 11) |
>   		MI_LRI_FORCE_POSTED |
> -		lri_base;
> +		engine->lri_cmd_mode;
>   
>   	init_common_reg_state(regs, ppgtt, engine, ring);
>   	CTX_REG(regs, CTX_SECOND_BB_HEAD_U, RING_SBBADDR_UDW(base), 0);
> @@ -3395,15 +3395,17 @@ static void gen8_init_reg_state(u32 * const regs,
>   	regs[CTX_LRI_HEADER_1] =
>   		MI_LOAD_REGISTER_IMM(9) |
>   		MI_LRI_FORCE_POSTED |
> -		lri_base;
> +		engine->lri_cmd_mode;
>   
>   	CTX_REG(regs, CTX_CTX_TIMESTAMP, RING_CTX_TIMESTAMP(base), 0);
>   
> -	init_ppgtt_reg_state(regs, base, ppgtt);
> +	init_ppgtt_reg_state(regs, base, engine, ppgtt);
>   
>   	if (rcs) {
> -		regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1) | lri_base;
> -		CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE, 0);
> +		regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1) |
> +					 engine->lri_cmd_mode;
> +		CTX_REG(regs, CTX_R_PWR_CLK_STATE,
> +			GEN8_R_PWR_CLK_STATE, 0);
>   	}
>   
>   	regs[CTX_END] = MI_BATCH_BUFFER_END;
> @@ -3418,14 +3420,12 @@ static void gen12_init_reg_state(u32 * const regs,
>   {
>   	struct i915_ppgtt * const ppgtt = i915_vm_to_ppgtt(ce->vm);
>   	const bool rcs = engine->class == RENDER_CLASS;
> -	const u32 base = engine->mmio_base;
> -	const u32 lri_base =
> -		intel_engine_has_relative_mmio(engine) ? MI_LRI_CS_MMIO : 0;
> +	const u32 base = engine->lri_mmio_base;
>   
>   	regs[CTX_LRI_HEADER_0] =
>   		MI_LOAD_REGISTER_IMM(rcs ? 11 : 9) |
>   		MI_LRI_FORCE_POSTED |
> -		lri_base;
> +		engine->lri_cmd_mode;
>   
>   	init_common_reg_state(regs, ppgtt, engine, ring);
>   
> @@ -3435,15 +3435,15 @@ static void gen12_init_reg_state(u32 * const regs,
>   	regs[CTX_LRI_HEADER_1] =
>   		MI_LOAD_REGISTER_IMM(9) |
>   		MI_LRI_FORCE_POSTED |
> -		lri_base;
> +		engine->lri_cmd_mode;
>   
>   	CTX_REG(regs, CTX_CTX_TIMESTAMP, RING_CTX_TIMESTAMP(base), 0);
>   
> -	init_ppgtt_reg_state(regs, base, ppgtt);
> +	init_ppgtt_reg_state(regs, base, engine, ppgtt);
>   
>   	if (rcs) {
>   		regs[GEN12_CTX_LRI_HEADER_3] =
> -			MI_LOAD_REGISTER_IMM(1) | lri_base;
> +			MI_LOAD_REGISTER_IMM(1) | engine->lri_cmd_mode;
>   		CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE, 0);
>   
>   		/* TODO: oa_init_reg_state ? */
> diff --git a/drivers/gpu/drm/i915/gt/intel_mocs.c b/drivers/gpu/drm/i915/gt/intel_mocs.c
> index 728704bbbe18..8e8fe3deb95c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_mocs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_mocs.c
> @@ -368,9 +368,6 @@ static u32 get_entry_control(const struct drm_i915_mocs_table *table,
>   /**
>    * intel_mocs_init_engine() - emit the mocs control table
>    * @engine:	The engine for whom to emit the registers.
> - *
> - * This function simply emits a MI_LOAD_REGISTER_IMM command for the
> - * given table starting at the given address.
>    */
>   void intel_mocs_init_engine(struct intel_engine_cs *engine)
>   {
> @@ -456,7 +453,8 @@ static int emit_mocs_control_table(struct i915_request *rq,
>   	if (IS_ERR(cs))
>   		return PTR_ERR(cs);
>   
> -	*cs++ = MI_LOAD_REGISTER_IMM(table->n_entries);
> +	*cs++ = MI_LOAD_REGISTER_IMM(table->n_entries) |
> +		rq->engine->lri_cmd_mode;
>   
>   	for (index = 0; index < table->size; index++) {
>   		u32 value = get_entry_control(table, index);
> diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
> index 0747b8c9f768..7027367b1f1a 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
> @@ -1536,12 +1536,13 @@ static int load_pd_dir(struct i915_request *rq, const struct i915_ppgtt *ppgtt)
>   	if (IS_ERR(cs))
>   		return PTR_ERR(cs);
>   
> -	*cs++ = MI_LOAD_REGISTER_IMM(1);
> -	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_DCLV(engine->mmio_base));
> +	/* Can these not be merged into a single LRI??? */
> +	*cs++ = MI_LOAD_REGISTER_IMM(1) | engine->lri_cmd_mode;
> +	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_DCLV(engine->lri_mmio_base));
>   	*cs++ = PP_DIR_DCLV_2G;
>   
> -	*cs++ = MI_LOAD_REGISTER_IMM(1);
> -	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_BASE(engine->mmio_base));
> +	*cs++ = MI_LOAD_REGISTER_IMM(1) | engine->lri_cmd_mode;
> +	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_BASE(engine->lri_mmio_base));
>   	*cs++ = px_base(ppgtt->pd)->ggtt_offset << 10;
>   
>   	intel_ring_advance(rq, cs);
> @@ -1709,7 +1710,8 @@ static int remap_l3_slice(struct i915_request *rq, int slice)
>   	 * here because no other code should access these registers other than
>   	 * at initialization time.
>   	 */
> -	*cs++ = MI_LOAD_REGISTER_IMM(GEN7_L3LOG_SIZE/4);
> +	*cs++ = MI_LOAD_REGISTER_IMM(GEN7_L3LOG_SIZE / 4) |
> +		rq->engine->lri_cmd_mode;
>   	for (i = 0; i < GEN7_L3LOG_SIZE/4; i++) {
>   		*cs++ = i915_mmio_reg_offset(GEN7_L3LOG(slice, i));
>   		*cs++ = remap_info[i];
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index c1b764233761..8b85cdfada21 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -1693,6 +1693,11 @@ gen8_update_reg_state_unlocked(struct i915_perf_stream *stream,
>   	};
>   	int i;
>   
> +	/*
> +	 * NB: The LRI instruction is generated by the hardware.
> +	 * Should we read it in and assert that the offset flag is set?
> +	 */
> +
>   	CTX_REG(reg_state, ctx_oactxctrl, GEN8_OACTXCONTROL,
>   		(stream->period_exponent << GEN8_OA_TIMER_PERIOD_SHIFT) |
>   		(stream->periodic ? GEN8_OA_TIMER_ENABLE : 0) |
> @@ -1752,8 +1757,9 @@ gen8_load_flex(struct i915_request *rq,
>   	if (IS_ERR(cs))
>   		return PTR_ERR(cs);
>   
> -	*cs++ = MI_LOAD_REGISTER_IMM(count);
> +	*cs++ = MI_LOAD_REGISTER_IMM(count) | rq->engine->lri_cmd_mode;
>   	do {
> +		/* FIXME: Is this table LRI remap/offset friendly? */
>   		*cs++ = i915_mmio_reg_offset(flex->reg);
>   		*cs++ = flex->value;
>   	} while (flex++, --count);
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] drm/i915: Engine relative MMIO for Gen12
  2019-09-23 23:51 ` [PATCH 2/2] drm/i915: Engine relative MMIO for Gen12 John.C.Harrison
@ 2019-09-24  9:12   ` Tvrtko Ursulin
  0 siblings, 0 replies; 11+ messages in thread
From: Tvrtko Ursulin @ 2019-09-24  9:12 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX


On 24/09/2019 00:51, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> Gen12 introduces a completely new and different scheme for
> implementing engine relative MMIO accesses - MI_LRI_MMIO_REMAP. This
> requires using the base address of instance zero of the relevant
> engine class. And then, it is only valid if the register in
> question falls within a certain range as specified by a table.
> 
> v2: Add support for fused parts where instance zero of a given engine
> class may be missing. Make aux_tables presence a device flag rather
> than just hard coded. Pre-calculate the correct LRI base address
> rather than using a per-instance base and then adding on a delta to
> the correct base later. Make all the table range checking a debug only
> feature - the theory is that all driver access should be within the
> remap ranges. [Daniele]
> 
> v3: Re-base on Mika's changes. This removes all the abstraction layer.
> Which also means there is no common code path that all LRI accesses go
> via. Thus it is not possible to implement the range check. However, as
> noted in v2, the range check should not be necessary anyway.
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> CC: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_engine_cs.c    | 27 +++++++++++++++++---
>   drivers/gpu/drm/i915/gt/intel_engine_types.h |  1 +
>   drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  9 ++++---
>   drivers/gpu/drm/i915/gt/intel_mocs.c         |  6 ++---
>   drivers/gpu/drm/i915/i915_perf.c             |  3 ++-
>   drivers/gpu/drm/i915/intel_device_info.c     | 14 ++++++++++
>   drivers/gpu/drm/i915/intel_device_info.h     |  1 +
>   7 files changed, 49 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index 5aa58e069806..09fbbffce419 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -247,14 +247,32 @@ static bool i915_engine_has_relative_lri(const struct intel_engine_cs *engine)
>   	return true;
>   }
>   
> -static void lri_init(struct intel_engine_cs *engine)
> +static void lri_init(struct intel_engine_cs *engine, u32 first_instance)
>   {
>   	if (i915_engine_has_relative_lri(engine)) {
> -		engine->lri_cmd_mode = MI_LRI_ADD_CS_MMIO_START_GEN11;
> -		engine->lri_mmio_base = 0;
> +		if (INTEL_GEN(engine->i915) < 12) {
> +			engine->lri_cmd_mode = MI_LRI_ADD_CS_MMIO_START_GEN11;
> +			engine->lri_mmio_base = 0;//engine->mmio_base;

No C++ style comments please.

> +			engine->lri_engine = engine;
> +		} else {
> +			engine->lri_cmd_mode = MI_LRI_MMIO_REMAP_GEN12;
> +
> +			engine->lri_engine = engine->gt->engine_class
> +						[engine->class][first_instance];
> +			GEM_BUG_ON(!engine->lri_engine);
> +			engine->lri_mmio_base = engine->lri_engine->mmio_base;
> +
> +			/* According to the bspec, accesses should be compared

Preferred style:

/*
  * Blah
  */


> +			 * against the remap table and remapping only enabled
> +			 * if found. In practice, all driver accesses should be
> +			 * to remap registers. So, no need for the complication
> +			 * of checking against any device specific tables.
> +			 */
> +		}
>   	} else {
>   		engine->lri_cmd_mode = 0;
>   		engine->lri_mmio_base = engine->mmio_base;
> +		engine->lri_engine = engine;
>   	}

If it helps at all you could save one indentation level by doing:

if (!supports_lri) {

} else if (GEN == 11) {

} else {

}

>   }
>   
> @@ -342,7 +360,8 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
>   	/* Nothing to do here, execute in order of dependencies */
>   	engine->schedule = NULL;
>   
> -	lri_init(engine);
> +	lri_init(engine,
> +		 INTEL_INFO(engine->i915)->first_instance[engine->class]);
>   
>   	seqlock_init(&engine->stats.lock);
>   
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> index 38f486f7f7e3..62caa74e8558 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> @@ -308,6 +308,7 @@ struct intel_engine_cs {
>   
>   	u32 lri_mmio_base;
>   	u32 lri_cmd_mode;
> +	struct intel_engine_cs *lri_engine;
>   
>   	u32 uabi_capabilities;
>   
> diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> index bfb51d8d7ce2..9c5be384026f 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> @@ -133,15 +133,16 @@
>    *   simply ignores the register load under certain conditions.
>    * - One can actually load arbitrary many arbitrary registers: Simply issue x
>    *   address/value pairs. Don't overdue it, though, x <= 2^4 must hold!
> - * - Newer hardware supports engine relative addressing but older hardware does
> - *   not. This is required for hw engine load balancing. Hence the MI_LRI
> - *   instruction itself is prefixed with '__' and should only be used on
> - *   legacy hardware code paths. Generic code must always use the MI_LRI
> + * - Newer hardware supports engine relative addressing but using multiple
> + *   incompatible schemes. This is required for hw engine load balancing. Hence
> + *   the MI_LRI instruction itself is prefixed with '__' and should only be

Is the double underscore reference out of date now?

> + *   used on legacy hardware code paths. Generic code must always use the MI_LRI
>    *   and i915_get_lri_reg() helper functions instead.
>    */
>   #define MI_LOAD_REGISTER_IMM(x)	MI_INSTR(0x22, 2*(x)-1)
>   /* Gen11+. addr = base + (ctx_restore ? offset & GENMASK(12,2) : offset) */
>   #define   MI_LRI_FORCE_POSTED		(1<<12)
> +#define   MI_LRI_MMIO_REMAP_GEN12		BIT(17)
>   #define   MI_LRI_ADD_CS_MMIO_START_GEN11	BIT(19)
>   #define MI_STORE_REGISTER_MEM        MI_INSTR(0x24, 1)
>   #define MI_STORE_REGISTER_MEM_GEN8   MI_INSTR(0x24, 2)
> diff --git a/drivers/gpu/drm/i915/gt/intel_mocs.c b/drivers/gpu/drm/i915/gt/intel_mocs.c
> index 8e8fe3deb95c..e121fea5b33c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_mocs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_mocs.c
> @@ -438,7 +438,7 @@ static void intel_mocs_init_global(struct intel_gt *gt)
>   static int emit_mocs_control_table(struct i915_request *rq,
>   				   const struct drm_i915_mocs_table *table)
>   {
> -	enum intel_engine_id engine = rq->engine->id;
> +	enum intel_engine_id engine_id = rq->engine->lri_engine->id;

Observation: If you don't rename the local diff is smaller.

>   	unsigned int index;
>   	u32 unused_value;
>   	u32 *cs;
> @@ -459,13 +459,13 @@ static int emit_mocs_control_table(struct i915_request *rq,
>   	for (index = 0; index < table->size; index++) {
>   		u32 value = get_entry_control(table, index);
>   
> -		*cs++ = i915_mmio_reg_offset(mocs_register(engine, index));
> +		*cs++ = i915_mmio_reg_offset(mocs_register(engine_id, index));
>   		*cs++ = value;
>   	}
>   
>   	/* All remaining entries are also unused */
>   	for (; index < table->n_entries; index++) {
> -		*cs++ = i915_mmio_reg_offset(mocs_register(engine, index));
> +		*cs++ = i915_mmio_reg_offset(mocs_register(engine_id, index));
>   		*cs++ = unused_value;
>   	}
>   
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 8b85cdfada21..fd628ae12343 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -1695,7 +1695,8 @@ gen8_update_reg_state_unlocked(struct i915_perf_stream *stream,
>   
>   	/*
>   	 * NB: The LRI instruction is generated by the hardware.
> -	 * Should we read it in and assert that the offset flag is set?
> +	 * Should we read it in and assert that the appropriate
> +	 * offset flag is set?
>   	 */
>   
>   	CTX_REG(reg_state, ctx_oactxctrl, GEN8_OACTXCONTROL,
> diff --git a/drivers/gpu/drm/i915/intel_device_info.c b/drivers/gpu/drm/i915/intel_device_info.c
> index 85e480bdc673..dece012d3ad4 100644
> --- a/drivers/gpu/drm/i915/intel_device_info.c
> +++ b/drivers/gpu/drm/i915/intel_device_info.c
> @@ -1030,6 +1030,7 @@ void intel_device_info_init_mmio(struct drm_i915_private *dev_priv)
>   	u32 media_fuse;
>   	u16 vdbox_mask;
>   	u16 vebox_mask;
> +	bool found;
>   
>   	if (INTEL_GEN(dev_priv) < 11)
>   		return;
> @@ -1040,6 +1041,7 @@ void intel_device_info_init_mmio(struct drm_i915_private *dev_priv)
>   	vebox_mask = (media_fuse & GEN11_GT_VEBOX_DISABLE_MASK) >>
>   		      GEN11_GT_VEBOX_DISABLE_SHIFT;
>   
> +	found = false;
>   	for (i = 0; i < I915_MAX_VCS; i++) {
>   		if (!HAS_ENGINE(dev_priv, _VCS(i))) {
>   			vdbox_mask &= ~BIT(i);
> @@ -1059,11 +1061,17 @@ void intel_device_info_init_mmio(struct drm_i915_private *dev_priv)
>   		 */
>   		if (IS_TIGERLAKE(dev_priv) || logical_vdbox++ % 2 == 0)
>   			RUNTIME_INFO(dev_priv)->vdbox_sfc_access |= BIT(i);
> +
> +		if (!found) {
> +			found = true;
> +			info->first_instance[VIDEO_DECODE_CLASS] = i;
> +		}
>   	}
>   	DRM_DEBUG_DRIVER("vdbox enable: %04x, instances: %04lx\n",
>   			 vdbox_mask, VDBOX_MASK(dev_priv));
>   	GEM_BUG_ON(vdbox_mask != VDBOX_MASK(dev_priv));
>   
> +	found = false;
>   	for (i = 0; i < I915_MAX_VECS; i++) {
>   		if (!HAS_ENGINE(dev_priv, _VECS(i))) {
>   			vebox_mask &= ~BIT(i);
> @@ -1073,6 +1081,12 @@ void intel_device_info_init_mmio(struct drm_i915_private *dev_priv)
>   		if (!(BIT(i) & vebox_mask)) {
>   			info->engine_mask &= ~BIT(_VECS(i));
>   			DRM_DEBUG_DRIVER("vecs%u fused off\n", i);
> +			continue;
> +		}
> +
> +		if (!found) {
> +			found = true;
> +			info->first_instance[VIDEO_ENHANCEMENT_CLASS] = i;
>   		}
>   	}
>   	DRM_DEBUG_DRIVER("vebox enable: %04x, instances: %04lx\n",
> diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
> index 0cdc2465534b..75a9950969fb 100644
> --- a/drivers/gpu/drm/i915/intel_device_info.h
> +++ b/drivers/gpu/drm/i915/intel_device_info.h
> @@ -152,6 +152,7 @@ struct intel_device_info {
>   	u8 gen;
>   	u8 gt; /* GT number, 0 if undefined */
>   	intel_engine_mask_t engine_mask; /* Engines supported by the HW */
> +	u32 first_instance[MAX_ENGINE_CLASS]; /* instance 0 might be fused */

u32 is overkill however do you even need to store this persistently? 
Looks like all that is needed at runtime is engine->lri_engine? So you 
could lookup the first instance from intel_engine_setup. Or if I missed 
something and you do need it at runtime, then should go to runtime info 
since we want to avoid dependencies on mkwrite_device_info where possible.

>   
>   	enum intel_platform platform;
>   
> 

Regards,

Tvrtko

P.S. Just a superficial scan, not a normal review.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] drm/i915: Engine relative MMIO
  2019-09-23 23:51 ` [PATCH 1/2] " John.C.Harrison
  2019-09-24  8:45   ` Tvrtko Ursulin
@ 2019-09-24  9:16   ` Tvrtko Ursulin
  1 sibling, 0 replies; 11+ messages in thread
From: Tvrtko Ursulin @ 2019-09-24  9:16 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX; +Cc: Chris P Wilson


On 24/09/2019 00:51, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> With virtual engines, it is no longer possible to know which specific
> physical engine a given request will be executed on at the time that
> request is generated. This means that the request itself must be engine
> agnostic - any direct register writes must be relative to the engine
> and not absolute addresses.
> 
> The LRI command has support for engine relative addressing. However,
> the mechanism is not transparent to the driver. The scheme for Gen11
> (MI_LRI_ADD_CS_MMIO_START) requires the LRI address to have no
> absolute engine base component. The hardware then adds on the correct
> engine offset at execution time.
> 
> Due to the non-trivial and differing schemes on different hardware, it
> is not possible to simply update the code that creates the LRI
> commands to set a remap flag and let the hardware get on with it.
> Instead, this patch adds function wrappers for generating the LRI
> command itself and then for constructing the correct address to use
> with the LRI.
> 
> v2: Fix build break in GVT. Remove flags parameter [review feedback
> from Chris W].
> 
> v3: Fix build break in selftest. Rebase to newer base tree and fix
> merge conflict.
> 
> v4: More rebasing. Rmoved relative addressing support from Gen7-9 only
> code paths [review feedback from Chris W].
> 
> v5: More rebasing (new 'gt' directory). Fix white space issue. Use
> COPY class rather than BCS0 id for checking against BCS engine.
> 
> v6: Change to MI_LRI for generic code and __MI_LRI for legacy code.
> MI_LRI is now a macro wrapper around an engine function pointer to
> reduce runtime checks on LRI relative support. [review feedback from
> Tvrtko]
> 
> v7: Replace engine function pointer with a per engine flags field that
> is added by a common function. [Daniele]
> 
> v8: Re-write on top of patch series by Mika.
> 
> v9: Fix workaround IGT failure. Not sure why but my local test runs
> were passing by claiming no workarounds whereas the pre-commit check
> failed. As all the workaround registers are currently engine
> independent, it is safe to simply not add the engine relative flag
> (which is what the selftest was expecting).
> 
> Bspec: 45606
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> CC: Rodrigo Vivi <rodrigo.vivi@intel.com>
> CC: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> CC: Chris P Wilson <chris.p.wilson@intel.com>
> CC: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_context.c  |  7 ++--
>   drivers/gpu/drm/i915/gt/intel_engine_cs.c    | 24 ++++++++++++
>   drivers/gpu/drm/i915/gt/intel_engine_types.h |  3 ++
>   drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  7 +++-
>   drivers/gpu/drm/i915/gt/intel_lrc.c          | 40 ++++++++++----------
>   drivers/gpu/drm/i915/gt/intel_mocs.c         |  6 +--
>   drivers/gpu/drm/i915/gt/intel_ringbuffer.c   | 12 +++---
>   drivers/gpu/drm/i915/i915_perf.c             |  8 +++-
>   8 files changed, 73 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index 4a34c4f62065..c8d7075a481d 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -980,7 +980,7 @@ static int emit_ppgtt_update(struct i915_request *rq, void *data)
>   {
>   	struct i915_address_space *vm = rq->hw_context->vm;
>   	struct intel_engine_cs *engine = rq->engine;
> -	u32 base = engine->mmio_base;
> +	u32 base = engine->lri_mmio_base;
>   	u32 *cs;
>   	int i;
>   
> @@ -992,7 +992,7 @@ static int emit_ppgtt_update(struct i915_request *rq, void *data)
>   		if (IS_ERR(cs))
>   			return PTR_ERR(cs);
>   
> -		*cs++ = MI_LOAD_REGISTER_IMM(2);
> +		*cs++ = MI_LOAD_REGISTER_IMM(2) | engine->lri_cmd_mode;
>   
>   		*cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(base, 0));
>   		*cs++ = upper_32_bits(pd_daddr);
> @@ -1014,7 +1014,8 @@ static int emit_ppgtt_update(struct i915_request *rq, void *data)
>   		if (IS_ERR(cs))
>   			return PTR_ERR(cs);
>   
> -		*cs++ = MI_LOAD_REGISTER_IMM(2 * GEN8_3LVL_PDPES) | MI_LRI_FORCE_POSTED;
> +		*cs++ = MI_LOAD_REGISTER_IMM(2 * GEN8_3LVL_PDPES) |
> +			MI_LRI_FORCE_POSTED | engine->lri_cmd_mode;
>   		for (i = GEN8_3LVL_PDPES; i--; ) {
>   			const dma_addr_t pd_daddr = i915_page_dir_dma_addr(ppgtt, i);
>   
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index f451d5076bde..5aa58e069806 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -236,6 +236,28 @@ static u32 __engine_mmio_base(struct drm_i915_private *i915,
>   	return bases[i].base;
>   }
>   
> +static bool i915_engine_has_relative_lri(const struct intel_engine_cs *engine)
> +{
> +	if (INTEL_GEN(engine->i915) < 11)
> +		return false;
> +
> +	if (engine->class == COPY_ENGINE_CLASS)
> +		return false;
> +
> +	return true;
> +}
> +
> +static void lri_init(struct intel_engine_cs *engine)
> +{
> +	if (i915_engine_has_relative_lri(engine)) {
> +		engine->lri_cmd_mode = MI_LRI_ADD_CS_MMIO_START_GEN11;
> +		engine->lri_mmio_base = 0;
> +	} else {
> +		engine->lri_cmd_mode = 0;
> +		engine->lri_mmio_base = engine->mmio_base;
> +	}
> +}
> +
>   static void __sprint_engine_name(struct intel_engine_cs *engine)
>   {
>   	/*
> @@ -320,6 +342,8 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
>   	/* Nothing to do here, execute in order of dependencies */
>   	engine->schedule = NULL;
>   
> +	lri_init(engine);
> +
>   	seqlock_init(&engine->stats.lock);
>   
>   	ATOMIC_INIT_NOTIFIER_HEAD(&engine->context_status_notifier);
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> index 943f0663837e..38f486f7f7e3 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> @@ -306,6 +306,9 @@ struct intel_engine_cs {
>   	u32 context_size;
>   	u32 mmio_base;
>   
> +	u32 lri_mmio_base;
> +	u32 lri_cmd_mode;
> +
>   	u32 uabi_capabilities;
>   
>   	struct rb_node uabi_node;
> diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> index f78b13d74e17..bfb51d8d7ce2 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> @@ -133,11 +133,16 @@
>    *   simply ignores the register load under certain conditions.
>    * - One can actually load arbitrary many arbitrary registers: Simply issue x
>    *   address/value pairs. Don't overdue it, though, x <= 2^4 must hold!
> + * - Newer hardware supports engine relative addressing but older hardware does
> + *   not. This is required for hw engine load balancing. Hence the MI_LRI
> + *   instruction itself is prefixed with '__' and should only be used on
> + *   legacy hardware code paths. Generic code must always use the MI_LRI
> + *   and i915_get_lri_reg() helper functions instead.
>    */
>   #define MI_LOAD_REGISTER_IMM(x)	MI_INSTR(0x22, 2*(x)-1)
>   /* Gen11+. addr = base + (ctx_restore ? offset & GENMASK(12,2) : offset) */
> -#define   MI_LRI_CS_MMIO		(1<<19)
>   #define   MI_LRI_FORCE_POSTED		(1<<12)
> +#define   MI_LRI_ADD_CS_MMIO_START_GEN11	BIT(19)
>   #define MI_STORE_REGISTER_MEM        MI_INSTR(0x24, 1)
>   #define MI_STORE_REGISTER_MEM_GEN8   MI_INSTR(0x24, 2)
>   #define   MI_SRM_LRM_GLOBAL_GTT		(1<<22)
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index 6cfdc0f9f2b9..7cd59d29c4e4 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -2010,11 +2010,12 @@ struct lri {
>   	u32 value;
>   };
>   
> -static u32 *emit_lri(u32 *batch, const struct lri *lri, unsigned int count)
> +static u32 *emit_lri(struct intel_engine_cs *engine, u32 *batch,
> +		     const struct lri *lri, unsigned int count)
>   {
>   	GEM_BUG_ON(!count || count > 63);
>   
> -	*batch++ = MI_LOAD_REGISTER_IMM(count);
> +	*batch++ = MI_LOAD_REGISTER_IMM(count) | engine->lri_cmd_mode;
>   	do {
>   		*batch++ = i915_mmio_reg_offset(lri->reg);
>   		*batch++ = lri->value;
> @@ -2054,7 +2055,7 @@ static u32 *gen9_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch)
>   	/* WaFlushCoherentL3CacheLinesAtContextSwitch:skl,bxt,glk */
>   	batch = gen8_emit_flush_coherentl3_wa(engine, batch);
>   
> -	batch = emit_lri(batch, lri, ARRAY_SIZE(lri));
> +	batch = emit_lri(engine, batch, lri, ARRAY_SIZE(lri));
>   
>   	/* WaMediaPoolStateCmdInWABB:bxt,glk */
>   	if (HAS_POOLED_EU(engine->i915)) {
> @@ -3282,7 +3283,7 @@ static void init_common_reg_state(u32 * const regs,
>   				  struct intel_engine_cs *engine,
>   				  struct intel_ring *ring)
>   {
> -	const u32 base = engine->mmio_base;
> +	const u32 base = engine->lri_mmio_base;
>   
>   	CTX_REG(regs, CTX_CONTEXT_CONTROL, RING_CONTEXT_CONTROL(base),
>   		_MASKED_BIT_DISABLE(CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT) |
> @@ -3307,7 +3308,7 @@ static void init_wa_bb_reg_state(u32 * const regs,
>   				 u32 pos_bb_per_ctx)
>   {
>   	struct i915_ctx_workarounds * const wa_ctx = &engine->wa_ctx;
> -	const u32 base = engine->mmio_base;
> +	const u32 base = engine->lri_mmio_base;
>   	const u32 pos_indirect_ctx = pos_bb_per_ctx + 2;
>   	const u32 pos_indirect_ctx_offset = pos_indirect_ctx + 2;
>   
> @@ -3335,6 +3336,7 @@ static void init_wa_bb_reg_state(u32 * const regs,
>   }
>   
>   static void init_ppgtt_reg_state(u32 *regs, u32 base,
> +				 struct intel_engine_cs *engine,
>   				 struct i915_ppgtt *ppgtt)
>   {
>   	/* PDP values well be assigned later if needed */
> @@ -3376,14 +3378,12 @@ static void gen8_init_reg_state(u32 * const regs,
>   {
>   	struct i915_ppgtt * const ppgtt = vm_alias(ce->vm);
>   	const bool rcs = engine->class == RENDER_CLASS;
> -	const u32 base = engine->mmio_base;
> -	const u32 lri_base =
> -		intel_engine_has_relative_mmio(engine) ? MI_LRI_CS_MMIO : 0;
> +	const u32 base = engine->lri_mmio_base;
>   
>   	regs[CTX_LRI_HEADER_0] =
>   		MI_LOAD_REGISTER_IMM(rcs ? 14 : 11) |
>   		MI_LRI_FORCE_POSTED |
> -		lri_base;
> +		engine->lri_cmd_mode;
>   
>   	init_common_reg_state(regs, ppgtt, engine, ring);
>   	CTX_REG(regs, CTX_SECOND_BB_HEAD_U, RING_SBBADDR_UDW(base), 0);
> @@ -3395,15 +3395,17 @@ static void gen8_init_reg_state(u32 * const regs,
>   	regs[CTX_LRI_HEADER_1] =
>   		MI_LOAD_REGISTER_IMM(9) |
>   		MI_LRI_FORCE_POSTED |
> -		lri_base;
> +		engine->lri_cmd_mode;
>   
>   	CTX_REG(regs, CTX_CTX_TIMESTAMP, RING_CTX_TIMESTAMP(base), 0);
>   
> -	init_ppgtt_reg_state(regs, base, ppgtt);
> +	init_ppgtt_reg_state(regs, base, engine, ppgtt);

Is passed in base always equal to engine->mmio_base? If so then there is 
redundancy and you could drop this change.

Regards,

Tvrtko

>   
>   	if (rcs) {
> -		regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1) | lri_base;
> -		CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE, 0);
> +		regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1) |
> +					 engine->lri_cmd_mode;
> +		CTX_REG(regs, CTX_R_PWR_CLK_STATE,
> +			GEN8_R_PWR_CLK_STATE, 0);
>   	}
>   
>   	regs[CTX_END] = MI_BATCH_BUFFER_END;
> @@ -3418,14 +3420,12 @@ static void gen12_init_reg_state(u32 * const regs,
>   {
>   	struct i915_ppgtt * const ppgtt = i915_vm_to_ppgtt(ce->vm);
>   	const bool rcs = engine->class == RENDER_CLASS;
> -	const u32 base = engine->mmio_base;
> -	const u32 lri_base =
> -		intel_engine_has_relative_mmio(engine) ? MI_LRI_CS_MMIO : 0;
> +	const u32 base = engine->lri_mmio_base;
>   
>   	regs[CTX_LRI_HEADER_0] =
>   		MI_LOAD_REGISTER_IMM(rcs ? 11 : 9) |
>   		MI_LRI_FORCE_POSTED |
> -		lri_base;
> +		engine->lri_cmd_mode;
>   
>   	init_common_reg_state(regs, ppgtt, engine, ring);
>   
> @@ -3435,15 +3435,15 @@ static void gen12_init_reg_state(u32 * const regs,
>   	regs[CTX_LRI_HEADER_1] =
>   		MI_LOAD_REGISTER_IMM(9) |
>   		MI_LRI_FORCE_POSTED |
> -		lri_base;
> +		engine->lri_cmd_mode;
>   
>   	CTX_REG(regs, CTX_CTX_TIMESTAMP, RING_CTX_TIMESTAMP(base), 0);
>   
> -	init_ppgtt_reg_state(regs, base, ppgtt);
> +	init_ppgtt_reg_state(regs, base, engine, ppgtt);
>   
>   	if (rcs) {
>   		regs[GEN12_CTX_LRI_HEADER_3] =
> -			MI_LOAD_REGISTER_IMM(1) | lri_base;
> +			MI_LOAD_REGISTER_IMM(1) | engine->lri_cmd_mode;
>   		CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE, 0);
>   
>   		/* TODO: oa_init_reg_state ? */
> diff --git a/drivers/gpu/drm/i915/gt/intel_mocs.c b/drivers/gpu/drm/i915/gt/intel_mocs.c
> index 728704bbbe18..8e8fe3deb95c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_mocs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_mocs.c
> @@ -368,9 +368,6 @@ static u32 get_entry_control(const struct drm_i915_mocs_table *table,
>   /**
>    * intel_mocs_init_engine() - emit the mocs control table
>    * @engine:	The engine for whom to emit the registers.
> - *
> - * This function simply emits a MI_LOAD_REGISTER_IMM command for the
> - * given table starting at the given address.
>    */
>   void intel_mocs_init_engine(struct intel_engine_cs *engine)
>   {
> @@ -456,7 +453,8 @@ static int emit_mocs_control_table(struct i915_request *rq,
>   	if (IS_ERR(cs))
>   		return PTR_ERR(cs);
>   
> -	*cs++ = MI_LOAD_REGISTER_IMM(table->n_entries);
> +	*cs++ = MI_LOAD_REGISTER_IMM(table->n_entries) |
> +		rq->engine->lri_cmd_mode;
>   
>   	for (index = 0; index < table->size; index++) {
>   		u32 value = get_entry_control(table, index);
> diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
> index 0747b8c9f768..7027367b1f1a 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
> @@ -1536,12 +1536,13 @@ static int load_pd_dir(struct i915_request *rq, const struct i915_ppgtt *ppgtt)
>   	if (IS_ERR(cs))
>   		return PTR_ERR(cs);
>   
> -	*cs++ = MI_LOAD_REGISTER_IMM(1);
> -	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_DCLV(engine->mmio_base));
> +	/* Can these not be merged into a single LRI??? */
> +	*cs++ = MI_LOAD_REGISTER_IMM(1) | engine->lri_cmd_mode;
> +	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_DCLV(engine->lri_mmio_base));
>   	*cs++ = PP_DIR_DCLV_2G;
>   
> -	*cs++ = MI_LOAD_REGISTER_IMM(1);
> -	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_BASE(engine->mmio_base));
> +	*cs++ = MI_LOAD_REGISTER_IMM(1) | engine->lri_cmd_mode;
> +	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_BASE(engine->lri_mmio_base));
>   	*cs++ = px_base(ppgtt->pd)->ggtt_offset << 10;
>   
>   	intel_ring_advance(rq, cs);
> @@ -1709,7 +1710,8 @@ static int remap_l3_slice(struct i915_request *rq, int slice)
>   	 * here because no other code should access these registers other than
>   	 * at initialization time.
>   	 */
> -	*cs++ = MI_LOAD_REGISTER_IMM(GEN7_L3LOG_SIZE/4);
> +	*cs++ = MI_LOAD_REGISTER_IMM(GEN7_L3LOG_SIZE / 4) |
> +		rq->engine->lri_cmd_mode;
>   	for (i = 0; i < GEN7_L3LOG_SIZE/4; i++) {
>   		*cs++ = i915_mmio_reg_offset(GEN7_L3LOG(slice, i));
>   		*cs++ = remap_info[i];
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index c1b764233761..8b85cdfada21 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -1693,6 +1693,11 @@ gen8_update_reg_state_unlocked(struct i915_perf_stream *stream,
>   	};
>   	int i;
>   
> +	/*
> +	 * NB: The LRI instruction is generated by the hardware.
> +	 * Should we read it in and assert that the offset flag is set?
> +	 */
> +
>   	CTX_REG(reg_state, ctx_oactxctrl, GEN8_OACTXCONTROL,
>   		(stream->period_exponent << GEN8_OA_TIMER_PERIOD_SHIFT) |
>   		(stream->periodic ? GEN8_OA_TIMER_ENABLE : 0) |
> @@ -1752,8 +1757,9 @@ gen8_load_flex(struct i915_request *rq,
>   	if (IS_ERR(cs))
>   		return PTR_ERR(cs);
>   
> -	*cs++ = MI_LOAD_REGISTER_IMM(count);
> +	*cs++ = MI_LOAD_REGISTER_IMM(count) | rq->engine->lri_cmd_mode;
>   	do {
> +		/* FIXME: Is this table LRI remap/offset friendly? */
>   		*cs++ = i915_mmio_reg_offset(flex->reg);
>   		*cs++ = flex->value;
>   	} while (flex++, --count);
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* ✗ Fi.CI.IGT: failure for drm/i915: Engine relative MMIO (rev9)
  2019-09-23 23:51 [PATCH 0/2] drm/i915: Engine relative MMIO John.C.Harrison
                   ` (2 preceding siblings ...)
  2019-09-24  0:25 ` ✓ Fi.CI.BAT: success for drm/i915: Engine relative MMIO (rev9) Patchwork
@ 2019-09-24 14:57 ` Patchwork
  3 siblings, 0 replies; 11+ messages in thread
From: Patchwork @ 2019-09-24 14:57 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Engine relative MMIO (rev9)
URL   : https://patchwork.freedesktop.org/series/57117/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_6945_full -> Patchwork_14509_full
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_14509_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_14509_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_14509_full:

### IGT changes ###

#### Possible regressions ####

  * igt@perf@blocking:
    - shard-iclb:         [PASS][1] -> [TIMEOUT][2] +2 similar issues
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6945/shard-iclb3/igt@perf@blocking.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14509/shard-iclb6/igt@perf@blocking.html

  * igt@perf@oa-formats:
    - shard-iclb:         [PASS][3] -> [FAIL][4] +1 similar issue
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6945/shard-iclb8/igt@perf@oa-formats.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14509/shard-iclb1/igt@perf@oa-formats.html

  * igt@prime_busy@basic-wait-after-default:
    - shard-apl:          [PASS][5] -> [WARN][6]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6945/shard-apl6/igt@prime_busy@basic-wait-after-default.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14509/shard-apl8/igt@prime_busy@basic-wait-after-default.html

  
Known issues
------------

  Here are the changes found in Patchwork_14509_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_schedule@in-order-bsd:
    - shard-iclb:         [PASS][7] -> [SKIP][8] ([fdo#111325]) +3 similar issues
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6945/shard-iclb5/igt@gem_exec_schedule@in-order-bsd.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14509/shard-iclb4/igt@gem_exec_schedule@in-order-bsd.html

  * igt@gem_exec_schedule@independent-bsd2:
    - shard-iclb:         [PASS][9] -> [SKIP][10] ([fdo#109276]) +12 similar issues
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6945/shard-iclb4/igt@gem_exec_schedule@independent-bsd2.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14509/shard-iclb5/igt@gem_exec_schedule@independent-bsd2.html

  * igt@i915_suspend@sysfs-reader:
    - shard-apl:          [PASS][11] -> [DMESG-WARN][12] ([fdo#108566]) +4 similar issues
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6945/shard-apl4/igt@i915_suspend@sysfs-reader.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14509/shard-apl1/igt@i915_suspend@sysfs-reader.html

  * igt@kms_cursor_crc@pipe-a-cursor-256x85-sliding:
    - shard-apl:          [PASS][13] -> [INCOMPLETE][14] ([fdo#103927]) +1 similar issue
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6945/shard-apl1/igt@kms_cursor_crc@pipe-a-cursor-256x85-sliding.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14509/shard-apl3/igt@kms_cursor_crc@pipe-a-cursor-256x85-sliding.html

  * igt@kms_cursor_crc@pipe-a-cursor-suspend:
    - shard-kbl:          [PASS][15] -> [INCOMPLETE][16] ([fdo#103665])
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6945/shard-kbl2/igt@kms_cursor_crc@pipe-a-cursor-suspend.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14509/shard-kbl4/igt@kms_cursor_crc@pipe-a-cursor-suspend.html

  * igt@kms_cursor_crc@pipe-b-cursor-128x42-random:
    - shard-iclb:         [PASS][17] -> [INCOMPLETE][18] ([fdo#107713])
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6945/shard-iclb3/igt@kms_cursor_crc@pipe-b-cursor-128x42-random.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14509/shard-iclb1/igt@kms_cursor_crc@pipe-b-cursor-128x42-random.html

  * igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-pri-indfb-draw-pwrite:
    - shard-iclb:         [PASS][19] -> [FAIL][20] ([fdo#103167]) +4 similar issues
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6945/shard-iclb3/igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-pri-indfb-draw-pwrite.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14509/shard-iclb6/igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-pri-indfb-draw-pwrite.html

  * igt@kms_plane_alpha_blend@pipe-b-coverage-7efc:
    - shard-skl:          [PASS][21] -> [FAIL][22] ([fdo#108145] / [fdo#110403])
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6945/shard-skl2/igt@kms_plane_alpha_blend@pipe-b-coverage-7efc.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14509/shard-skl2/igt@kms_plane_alpha_blend@pipe-b-coverage-7efc.html

  * igt@kms_psr@psr2_primary_page_flip:
    - shard-iclb:         [PASS][23] -> [SKIP][24] ([fdo#109441]) +1 similar issue
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6945/shard-iclb2/igt@kms_psr@psr2_primary_page_flip.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14509/shard-iclb7/igt@kms_psr@psr2_primary_page_flip.html

  * igt@tools_test@tools_test:
    - shard-hsw:          [PASS][25] -> [SKIP][26] ([fdo#109271])
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6945/shard-hsw7/igt@tools_test@tools_test.html
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14509/shard-hsw5/igt@tools_test@tools_test.html

  
#### Possible fixes ####

  * igt@gem_ctx_isolation@bcs0-s3:
    - shard-apl:          [DMESG-WARN][27] ([fdo#108566]) -> [PASS][28] +4 similar issues
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6945/shard-apl7/igt@gem_ctx_isolation@bcs0-s3.html
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14509/shard-apl2/igt@gem_ctx_isolation@bcs0-s3.html

  * igt@gem_exec_capture@capture-bsd2:
    - shard-iclb:         [SKIP][29] ([fdo#109276]) -> [PASS][30] +12 similar issues
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6945/shard-iclb6/igt@gem_exec_capture@capture-bsd2.html
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14509/shard-iclb1/igt@gem_exec_capture@capture-bsd2.html

  * igt@gem_exec_schedule@preempt-other-chain-bsd:
    - shard-iclb:         [SKIP][31] ([fdo#111325]) -> [PASS][32]
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6945/shard-iclb1/igt@gem_exec_schedule@preempt-other-chain-bsd.html
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14509/shard-iclb8/igt@gem_exec_schedule@preempt-other-chain-bsd.html

  * igt@kms_flip@modeset-vs-vblank-race:
    - shard-skl:          [FAIL][33] ([fdo#111609]) -> [PASS][34]
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6945/shard-skl5/igt@kms_flip@modeset-vs-vblank-race.html
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14509/shard-skl8/igt@kms_flip@modeset-vs-vblank-race.html

  * igt@kms_frontbuffer_tracking@fbc-1p-pri-indfb-multidraw:
    - shard-iclb:         [FAIL][35] ([fdo#103167]) -> [PASS][36] +3 similar issues
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6945/shard-iclb7/igt@kms_frontbuffer_tracking@fbc-1p-pri-indfb-multidraw.html
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14509/shard-iclb8/igt@kms_frontbuffer_tracking@fbc-1p-pri-indfb-multidraw.html

  * igt@kms_plane_alpha_blend@pipe-c-coverage-7efc:
    - shard-skl:          [FAIL][37] ([fdo#108145] / [fdo#110403]) -> [PASS][38]
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6945/shard-skl10/igt@kms_plane_alpha_blend@pipe-c-coverage-7efc.html
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14509/shard-skl10/igt@kms_plane_alpha_blend@pipe-c-coverage-7efc.html

  * igt@kms_psr@psr2_primary_blt:
    - shard-iclb:         [SKIP][39] ([fdo#109441]) -> [PASS][40] +1 similar issue
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6945/shard-iclb7/igt@kms_psr@psr2_primary_blt.html
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14509/shard-iclb2/igt@kms_psr@psr2_primary_blt.html

  
#### Warnings ####

  * igt@gem_ctx_isolation@vcs1-nonpriv:
    - shard-iclb:         [FAIL][41] ([fdo#111329]) -> [SKIP][42] ([fdo#109276])
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6945/shard-iclb1/igt@gem_ctx_isolation@vcs1-nonpriv.html
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14509/shard-iclb8/igt@gem_ctx_isolation@vcs1-nonpriv.html

  * igt@gem_mocs_settings@mocs-isolation-bsd2:
    - shard-iclb:         [SKIP][43] ([fdo#109276]) -> [FAIL][44] ([fdo#111330])
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6945/shard-iclb6/igt@gem_mocs_settings@mocs-isolation-bsd2.html
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14509/shard-iclb1/igt@gem_mocs_settings@mocs-isolation-bsd2.html

  * igt@i915_suspend@forcewake:
    - shard-apl:          [DMESG-WARN][45] ([fdo#108566]) -> [INCOMPLETE][46] ([fdo#103927])
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6945/shard-apl7/igt@i915_suspend@forcewake.html
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14509/shard-apl1/igt@i915_suspend@forcewake.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#103167]: https://bugs.freedesktop.org/show_bug.cgi?id=103167
  [fdo#103665]: https://bugs.freedesktop.org/show_bug.cgi?id=103665
  [fdo#103927]: https://bugs.freedesktop.org/show_bug.cgi?id=103927
  [fdo#107713]: https://bugs.freedesktop.org/show_bug.cgi?id=107713
  [fdo#108145]: https://bugs.freedesktop.org/show_bug.cgi?id=108145
  [fdo#108566]: https://bugs.freedesktop.org/show_bug.cgi?id=108566
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109276]: https://bugs.freedesktop.org/show_bug.cgi?id=109276
  [fdo#109441]: https://bugs.freedesktop.org/show_bug.cgi?id=109441
  [fdo#110403]: https://bugs.freedesktop.org/show_bug.cgi?id=110403
  [fdo#110548]: https://bugs.freedesktop.org/show_bug.cgi?id=110548
  [fdo#111325]: https://bugs.freedesktop.org/show_bug.cgi?id=111325
  [fdo#111329]: https://bugs.freedesktop.org/show_bug.cgi?id=111329
  [fdo#111330]: https://bugs.freedesktop.org/show_bug.cgi?id=111330
  [fdo#111609]: https://bugs.freedesktop.org/show_bug.cgi?id=111609


Participating hosts (9 -> 9)
------------------------------

  No changes in participating hosts


Build changes
-------------

  * CI: CI-20190529 -> None
  * Linux: CI_DRM_6945 -> Patchwork_14509

  CI-20190529: 20190529
  CI_DRM_6945: f11d819264a3fab210498a4920ef34a891da39e0 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5197: aa534ff47fd2f455c8be9e59eae807695b87fcdd @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_14509: 4f9d4f7b382c34dfd5f4ced2e713a225f9745878 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14509/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] drm/i915: Engine relative MMIO
  2019-09-24  8:45   ` Tvrtko Ursulin
@ 2019-09-24 22:28     ` Rodrigo Vivi
  0 siblings, 0 replies; 11+ messages in thread
From: Rodrigo Vivi @ 2019-09-24 22:28 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-GFX, Chris P Wilson

On Tue, Sep 24, 2019 at 09:45:02AM +0100, Tvrtko Ursulin wrote:
> 
> On 24/09/2019 00:51, John.C.Harrison@Intel.com wrote:
> > From: John Harrison <John.C.Harrison@Intel.com>
> > 
> > With virtual engines, it is no longer possible to know which specific
> > physical engine a given request will be executed on at the time that
> > request is generated. This means that the request itself must be engine
> > agnostic - any direct register writes must be relative to the engine
> > and not absolute addresses.
> > 
> > The LRI command has support for engine relative addressing. However,
> > the mechanism is not transparent to the driver. The scheme for Gen11
> > (MI_LRI_ADD_CS_MMIO_START) requires the LRI address to have no
> > absolute engine base component. The hardware then adds on the correct
> > engine offset at execution time.
> > 
> > Due to the non-trivial and differing schemes on different hardware, it
> > is not possible to simply update the code that creates the LRI
> > commands to set a remap flag and let the hardware get on with it.
> > Instead, this patch adds function wrappers for generating the LRI
> > command itself and then for constructing the correct address to use
> > with the LRI.
> > 
> > v2: Fix build break in GVT. Remove flags parameter [review feedback
> > from Chris W].
> > 
> > v3: Fix build break in selftest. Rebase to newer base tree and fix
> > merge conflict.
> > 
> > v4: More rebasing. Rmoved relative addressing support from Gen7-9 only
> > code paths [review feedback from Chris W].
> > 
> > v5: More rebasing (new 'gt' directory). Fix white space issue. Use
> > COPY class rather than BCS0 id for checking against BCS engine.
> > 
> > v6: Change to MI_LRI for generic code and __MI_LRI for legacy code.
> > MI_LRI is now a macro wrapper around an engine function pointer to
> > reduce runtime checks on LRI relative support. [review feedback from
> > Tvrtko]
> > 
> > v7: Replace engine function pointer with a per engine flags field that
> > is added by a common function. [Daniele]
> > 
> > v8: Re-write on top of patch series by Mika.
> > 
> > v9: Fix workaround IGT failure. Not sure why but my local test runs
> > were passing by claiming no workarounds whereas the pre-commit check
> > failed. As all the workaround registers are currently engine
> > independent, it is safe to simply not add the engine relative flag
> > (which is what the selftest was expecting).
> > 
> > Bspec: 45606
> > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > CC: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > CC: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > CC: Chris P Wilson <chris.p.wilson@intel.com>
> > CC: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gem/i915_gem_context.c  |  7 ++--
> >   drivers/gpu/drm/i915/gt/intel_engine_cs.c    | 24 ++++++++++++
> >   drivers/gpu/drm/i915/gt/intel_engine_types.h |  3 ++
> >   drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  7 +++-
> >   drivers/gpu/drm/i915/gt/intel_lrc.c          | 40 ++++++++++----------
> >   drivers/gpu/drm/i915/gt/intel_mocs.c         |  6 +--
> >   drivers/gpu/drm/i915/gt/intel_ringbuffer.c   | 12 +++---
> >   drivers/gpu/drm/i915/i915_perf.c             |  8 +++-
> >   8 files changed, 73 insertions(+), 34 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > index 4a34c4f62065..c8d7075a481d 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > @@ -980,7 +980,7 @@ static int emit_ppgtt_update(struct i915_request *rq, void *data)
> >   {
> >   	struct i915_address_space *vm = rq->hw_context->vm;
> >   	struct intel_engine_cs *engine = rq->engine;
> > -	u32 base = engine->mmio_base;
> > +	u32 base = engine->lri_mmio_base;
> >   	u32 *cs;
> >   	int i;
> > @@ -992,7 +992,7 @@ static int emit_ppgtt_update(struct i915_request *rq, void *data)
> >   		if (IS_ERR(cs))
> >   			return PTR_ERR(cs);
> > -		*cs++ = MI_LOAD_REGISTER_IMM(2);
> > +		*cs++ = MI_LOAD_REGISTER_IMM(2) | engine->lri_cmd_mode;
> 
> I missed the part where we ended up with this new flavour and I can't say I
> am a fan.
> 
> What's the point in having to remember to or-in the flags at so many call
> sites? Are you now touching all mi_lri, or most, or how many? If all or most
> then why not MI_LOAD_REGISTER_IMM(engine, 2)? If not most then the same with
> _REL suffix as discussed before and leave the legacy untouched.

Yeap, I also believe that it is better to not touch legacy and create the
new one only with new cases. The real change gets much clear.

> 
> Don't know really on all the details since I lost track, but the current
> option leaves me concerned.
> 
> Regards,
> 
> Tvrtko
> 
> >   		*cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(base, 0));
> >   		*cs++ = upper_32_bits(pd_daddr);
> > @@ -1014,7 +1014,8 @@ static int emit_ppgtt_update(struct i915_request *rq, void *data)
> >   		if (IS_ERR(cs))
> >   			return PTR_ERR(cs);
> > -		*cs++ = MI_LOAD_REGISTER_IMM(2 * GEN8_3LVL_PDPES) | MI_LRI_FORCE_POSTED;
> > +		*cs++ = MI_LOAD_REGISTER_IMM(2 * GEN8_3LVL_PDPES) |
> > +			MI_LRI_FORCE_POSTED | engine->lri_cmd_mode;
> >   		for (i = GEN8_3LVL_PDPES; i--; ) {
> >   			const dma_addr_t pd_daddr = i915_page_dir_dma_addr(ppgtt, i);
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > index f451d5076bde..5aa58e069806 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > @@ -236,6 +236,28 @@ static u32 __engine_mmio_base(struct drm_i915_private *i915,
> >   	return bases[i].base;
> >   }
> > +static bool i915_engine_has_relative_lri(const struct intel_engine_cs *engine)
> > +{
> > +	if (INTEL_GEN(engine->i915) < 11)
> > +		return false;
> > +
> > +	if (engine->class == COPY_ENGINE_CLASS)
> > +		return false;
> > +
> > +	return true;
> > +}
> > +
> > +static void lri_init(struct intel_engine_cs *engine)
> > +{
> > +	if (i915_engine_has_relative_lri(engine)) {
> > +		engine->lri_cmd_mode = MI_LRI_ADD_CS_MMIO_START_GEN11;
> > +		engine->lri_mmio_base = 0;
> > +	} else {
> > +		engine->lri_cmd_mode = 0;
> > +		engine->lri_mmio_base = engine->mmio_base;
> > +	}
> > +}
> > +
> >   static void __sprint_engine_name(struct intel_engine_cs *engine)
> >   {
> >   	/*
> > @@ -320,6 +342,8 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
> >   	/* Nothing to do here, execute in order of dependencies */
> >   	engine->schedule = NULL;
> > +	lri_init(engine);
> > +
> >   	seqlock_init(&engine->stats.lock);
> >   	ATOMIC_INIT_NOTIFIER_HEAD(&engine->context_status_notifier);
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > index 943f0663837e..38f486f7f7e3 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > @@ -306,6 +306,9 @@ struct intel_engine_cs {
> >   	u32 context_size;
> >   	u32 mmio_base;
> > +	u32 lri_mmio_base;
> > +	u32 lri_cmd_mode;
> > +
> >   	u32 uabi_capabilities;
> >   	struct rb_node uabi_node;
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> > index f78b13d74e17..bfb51d8d7ce2 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
> > @@ -133,11 +133,16 @@
> >    *   simply ignores the register load under certain conditions.
> >    * - One can actually load arbitrary many arbitrary registers: Simply issue x
> >    *   address/value pairs. Don't overdue it, though, x <= 2^4 must hold!
> > + * - Newer hardware supports engine relative addressing but older hardware does
> > + *   not. This is required for hw engine load balancing. Hence the MI_LRI
> > + *   instruction itself is prefixed with '__' and should only be used on
> > + *   legacy hardware code paths. Generic code must always use the MI_LRI
> > + *   and i915_get_lri_reg() helper functions instead.
> >    */
> >   #define MI_LOAD_REGISTER_IMM(x)	MI_INSTR(0x22, 2*(x)-1)
> >   /* Gen11+. addr = base + (ctx_restore ? offset & GENMASK(12,2) : offset) */
> > -#define   MI_LRI_CS_MMIO		(1<<19)
> >   #define   MI_LRI_FORCE_POSTED		(1<<12)
> > +#define   MI_LRI_ADD_CS_MMIO_START_GEN11	BIT(19)
> >   #define MI_STORE_REGISTER_MEM        MI_INSTR(0x24, 1)
> >   #define MI_STORE_REGISTER_MEM_GEN8   MI_INSTR(0x24, 2)
> >   #define   MI_SRM_LRM_GLOBAL_GTT		(1<<22)
> > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > index 6cfdc0f9f2b9..7cd59d29c4e4 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > @@ -2010,11 +2010,12 @@ struct lri {
> >   	u32 value;
> >   };
> > -static u32 *emit_lri(u32 *batch, const struct lri *lri, unsigned int count)
> > +static u32 *emit_lri(struct intel_engine_cs *engine, u32 *batch,
> > +		     const struct lri *lri, unsigned int count)
> >   {
> >   	GEM_BUG_ON(!count || count > 63);
> > -	*batch++ = MI_LOAD_REGISTER_IMM(count);
> > +	*batch++ = MI_LOAD_REGISTER_IMM(count) | engine->lri_cmd_mode;
> >   	do {
> >   		*batch++ = i915_mmio_reg_offset(lri->reg);
> >   		*batch++ = lri->value;
> > @@ -2054,7 +2055,7 @@ static u32 *gen9_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch)
> >   	/* WaFlushCoherentL3CacheLinesAtContextSwitch:skl,bxt,glk */
> >   	batch = gen8_emit_flush_coherentl3_wa(engine, batch);
> > -	batch = emit_lri(batch, lri, ARRAY_SIZE(lri));
> > +	batch = emit_lri(engine, batch, lri, ARRAY_SIZE(lri));
> >   	/* WaMediaPoolStateCmdInWABB:bxt,glk */
> >   	if (HAS_POOLED_EU(engine->i915)) {
> > @@ -3282,7 +3283,7 @@ static void init_common_reg_state(u32 * const regs,
> >   				  struct intel_engine_cs *engine,
> >   				  struct intel_ring *ring)
> >   {
> > -	const u32 base = engine->mmio_base;
> > +	const u32 base = engine->lri_mmio_base;
> >   	CTX_REG(regs, CTX_CONTEXT_CONTROL, RING_CONTEXT_CONTROL(base),
> >   		_MASKED_BIT_DISABLE(CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT) |
> > @@ -3307,7 +3308,7 @@ static void init_wa_bb_reg_state(u32 * const regs,
> >   				 u32 pos_bb_per_ctx)
> >   {
> >   	struct i915_ctx_workarounds * const wa_ctx = &engine->wa_ctx;
> > -	const u32 base = engine->mmio_base;
> > +	const u32 base = engine->lri_mmio_base;
> >   	const u32 pos_indirect_ctx = pos_bb_per_ctx + 2;
> >   	const u32 pos_indirect_ctx_offset = pos_indirect_ctx + 2;
> > @@ -3335,6 +3336,7 @@ static void init_wa_bb_reg_state(u32 * const regs,
> >   }
> >   static void init_ppgtt_reg_state(u32 *regs, u32 base,
> > +				 struct intel_engine_cs *engine,
> >   				 struct i915_ppgtt *ppgtt)
> >   {
> >   	/* PDP values well be assigned later if needed */
> > @@ -3376,14 +3378,12 @@ static void gen8_init_reg_state(u32 * const regs,
> >   {
> >   	struct i915_ppgtt * const ppgtt = vm_alias(ce->vm);
> >   	const bool rcs = engine->class == RENDER_CLASS;
> > -	const u32 base = engine->mmio_base;
> > -	const u32 lri_base =
> > -		intel_engine_has_relative_mmio(engine) ? MI_LRI_CS_MMIO : 0;
> > +	const u32 base = engine->lri_mmio_base;
> >   	regs[CTX_LRI_HEADER_0] =
> >   		MI_LOAD_REGISTER_IMM(rcs ? 14 : 11) |
> >   		MI_LRI_FORCE_POSTED |
> > -		lri_base;
> > +		engine->lri_cmd_mode;
> >   	init_common_reg_state(regs, ppgtt, engine, ring);
> >   	CTX_REG(regs, CTX_SECOND_BB_HEAD_U, RING_SBBADDR_UDW(base), 0);
> > @@ -3395,15 +3395,17 @@ static void gen8_init_reg_state(u32 * const regs,
> >   	regs[CTX_LRI_HEADER_1] =
> >   		MI_LOAD_REGISTER_IMM(9) |
> >   		MI_LRI_FORCE_POSTED |
> > -		lri_base;
> > +		engine->lri_cmd_mode;
> >   	CTX_REG(regs, CTX_CTX_TIMESTAMP, RING_CTX_TIMESTAMP(base), 0);
> > -	init_ppgtt_reg_state(regs, base, ppgtt);
> > +	init_ppgtt_reg_state(regs, base, engine, ppgtt);
> >   	if (rcs) {
> > -		regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1) | lri_base;
> > -		CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE, 0);
> > +		regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1) |
> > +					 engine->lri_cmd_mode;
> > +		CTX_REG(regs, CTX_R_PWR_CLK_STATE,
> > +			GEN8_R_PWR_CLK_STATE, 0);
> >   	}
> >   	regs[CTX_END] = MI_BATCH_BUFFER_END;
> > @@ -3418,14 +3420,12 @@ static void gen12_init_reg_state(u32 * const regs,
> >   {
> >   	struct i915_ppgtt * const ppgtt = i915_vm_to_ppgtt(ce->vm);
> >   	const bool rcs = engine->class == RENDER_CLASS;
> > -	const u32 base = engine->mmio_base;
> > -	const u32 lri_base =
> > -		intel_engine_has_relative_mmio(engine) ? MI_LRI_CS_MMIO : 0;
> > +	const u32 base = engine->lri_mmio_base;
> >   	regs[CTX_LRI_HEADER_0] =
> >   		MI_LOAD_REGISTER_IMM(rcs ? 11 : 9) |
> >   		MI_LRI_FORCE_POSTED |
> > -		lri_base;
> > +		engine->lri_cmd_mode;
> >   	init_common_reg_state(regs, ppgtt, engine, ring);
> > @@ -3435,15 +3435,15 @@ static void gen12_init_reg_state(u32 * const regs,
> >   	regs[CTX_LRI_HEADER_1] =
> >   		MI_LOAD_REGISTER_IMM(9) |
> >   		MI_LRI_FORCE_POSTED |
> > -		lri_base;
> > +		engine->lri_cmd_mode;
> >   	CTX_REG(regs, CTX_CTX_TIMESTAMP, RING_CTX_TIMESTAMP(base), 0);
> > -	init_ppgtt_reg_state(regs, base, ppgtt);
> > +	init_ppgtt_reg_state(regs, base, engine, ppgtt);
> >   	if (rcs) {
> >   		regs[GEN12_CTX_LRI_HEADER_3] =
> > -			MI_LOAD_REGISTER_IMM(1) | lri_base;
> > +			MI_LOAD_REGISTER_IMM(1) | engine->lri_cmd_mode;
> >   		CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE, 0);
> >   		/* TODO: oa_init_reg_state ? */
> > diff --git a/drivers/gpu/drm/i915/gt/intel_mocs.c b/drivers/gpu/drm/i915/gt/intel_mocs.c
> > index 728704bbbe18..8e8fe3deb95c 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_mocs.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_mocs.c
> > @@ -368,9 +368,6 @@ static u32 get_entry_control(const struct drm_i915_mocs_table *table,
> >   /**
> >    * intel_mocs_init_engine() - emit the mocs control table
> >    * @engine:	The engine for whom to emit the registers.
> > - *
> > - * This function simply emits a MI_LOAD_REGISTER_IMM command for the
> > - * given table starting at the given address.
> >    */
> >   void intel_mocs_init_engine(struct intel_engine_cs *engine)
> >   {
> > @@ -456,7 +453,8 @@ static int emit_mocs_control_table(struct i915_request *rq,
> >   	if (IS_ERR(cs))
> >   		return PTR_ERR(cs);
> > -	*cs++ = MI_LOAD_REGISTER_IMM(table->n_entries);
> > +	*cs++ = MI_LOAD_REGISTER_IMM(table->n_entries) |
> > +		rq->engine->lri_cmd_mode;
> >   	for (index = 0; index < table->size; index++) {
> >   		u32 value = get_entry_control(table, index);
> > diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
> > index 0747b8c9f768..7027367b1f1a 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
> > @@ -1536,12 +1536,13 @@ static int load_pd_dir(struct i915_request *rq, const struct i915_ppgtt *ppgtt)
> >   	if (IS_ERR(cs))
> >   		return PTR_ERR(cs);
> > -	*cs++ = MI_LOAD_REGISTER_IMM(1);
> > -	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_DCLV(engine->mmio_base));
> > +	/* Can these not be merged into a single LRI??? */
> > +	*cs++ = MI_LOAD_REGISTER_IMM(1) | engine->lri_cmd_mode;
> > +	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_DCLV(engine->lri_mmio_base));
> >   	*cs++ = PP_DIR_DCLV_2G;
> > -	*cs++ = MI_LOAD_REGISTER_IMM(1);
> > -	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_BASE(engine->mmio_base));
> > +	*cs++ = MI_LOAD_REGISTER_IMM(1) | engine->lri_cmd_mode;
> > +	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_BASE(engine->lri_mmio_base));
> >   	*cs++ = px_base(ppgtt->pd)->ggtt_offset << 10;
> >   	intel_ring_advance(rq, cs);
> > @@ -1709,7 +1710,8 @@ static int remap_l3_slice(struct i915_request *rq, int slice)
> >   	 * here because no other code should access these registers other than
> >   	 * at initialization time.
> >   	 */
> > -	*cs++ = MI_LOAD_REGISTER_IMM(GEN7_L3LOG_SIZE/4);
> > +	*cs++ = MI_LOAD_REGISTER_IMM(GEN7_L3LOG_SIZE / 4) |
> > +		rq->engine->lri_cmd_mode;
> >   	for (i = 0; i < GEN7_L3LOG_SIZE/4; i++) {
> >   		*cs++ = i915_mmio_reg_offset(GEN7_L3LOG(slice, i));
> >   		*cs++ = remap_info[i];
> > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> > index c1b764233761..8b85cdfada21 100644
> > --- a/drivers/gpu/drm/i915/i915_perf.c
> > +++ b/drivers/gpu/drm/i915/i915_perf.c
> > @@ -1693,6 +1693,11 @@ gen8_update_reg_state_unlocked(struct i915_perf_stream *stream,
> >   	};
> >   	int i;
> > +	/*
> > +	 * NB: The LRI instruction is generated by the hardware.
> > +	 * Should we read it in and assert that the offset flag is set?
> > +	 */
> > +
> >   	CTX_REG(reg_state, ctx_oactxctrl, GEN8_OACTXCONTROL,
> >   		(stream->period_exponent << GEN8_OA_TIMER_PERIOD_SHIFT) |
> >   		(stream->periodic ? GEN8_OA_TIMER_ENABLE : 0) |
> > @@ -1752,8 +1757,9 @@ gen8_load_flex(struct i915_request *rq,
> >   	if (IS_ERR(cs))
> >   		return PTR_ERR(cs);
> > -	*cs++ = MI_LOAD_REGISTER_IMM(count);
> > +	*cs++ = MI_LOAD_REGISTER_IMM(count) | rq->engine->lri_cmd_mode;
> >   	do {
> > +		/* FIXME: Is this table LRI remap/offset friendly? */
> >   		*cs++ = i915_mmio_reg_offset(flex->reg);
> >   		*cs++ = flex->value;
> >   	} while (flex++, --count);
> > 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/2] drm/i915: Engine relative MMIO
  2019-09-19  6:47 [PATCH 0/2] drm/i915: Engine relative MMIO John.C.Harrison
@ 2019-09-19  6:47 ` John.C.Harrison
  0 siblings, 0 replies; 11+ messages in thread
From: John.C.Harrison @ 2019-09-19  6:47 UTC (permalink / raw)
  To: Intel-GFX; +Cc: Chris P Wilson

From: John Harrison <John.C.Harrison@Intel.com>

With virtual engines, it is no longer possible to know which specific
physical engine a given request will be executed on at the time that
request is generated. This means that the request itself must be engine
agnostic - any direct register writes must be relative to the engine
and not absolute addresses.

The LRI command has support for engine relative addressing. However,
the mechanism is not transparent to the driver. The scheme for Gen11
(MI_LRI_ADD_CS_MMIO_START) requires the LRI address to have no
absolute engine base component. The hardware then adds on the correct
engine offset at execution time.

Due to the non-trivial and differing schemes on different hardware, it
is not possible to simply set a remap flag and let the hardware get on
with it. Instead, this patch adds 'lri_cmd_mode' and 'lri_mmio_base'
fields to the engine structure. The former should be ORd in when
creating an LRI command. The latter should be used as the register
base address (as opposed to just engine->mmio_base). They are set per
engine according to the currently available hardware scheme.

v2: Fix build break in GVT. Remove flags parameter [review feedback
from Chris W].

v3: Fix build break in selftest. Rebase to newer base tree and fix
merge conflict.

v4: More rebasing. Rmoved relative addressing support from Gen7-9 only
code paths [review feedback from Chris W].

v5: More rebasing (new 'gt' directory). Fix white space issue. Use
COPY class rather than BCS0 id for checking against BCS engine.

v6: Change to MI_LRI for generic code and __MI_LRI for legacy code.
MI_LRI is now a macro wrapper around an engine function pointer to
reduce runtime checks on LRI relative support. [review feedback from
Tvrtko]

v7: Replace engine function pointer with a per engine flags field that
is added by a common function. [Daniele]

v8: Re-write on top of patch series by Mika - greatly simplified to
not bother with any of the abstraction of the previous version.

Bspec: 45606
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
CC: Rodrigo Vivi <rodrigo.vivi@intel.com>
CC: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
CC: Chris P Wilson <chris.p.wilson@intel.com>
CC: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c  |  7 ++--
 drivers/gpu/drm/i915/gt/intel_engine_cs.c    | 24 ++++++++++++
 drivers/gpu/drm/i915/gt/intel_engine_types.h |  3 ++
 drivers/gpu/drm/i915/gt/intel_gpu_commands.h |  7 +++-
 drivers/gpu/drm/i915/gt/intel_lrc.c          | 40 ++++++++++----------
 drivers/gpu/drm/i915/gt/intel_mocs.c         |  6 +--
 drivers/gpu/drm/i915/gt/intel_ringbuffer.c   | 12 +++---
 drivers/gpu/drm/i915/gt/intel_workarounds.c  |  2 +-
 drivers/gpu/drm/i915/i915_perf.c             |  8 +++-
 9 files changed, 74 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index f1c0e5d958f3..a17aa78017cb 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -980,7 +980,7 @@ static int emit_ppgtt_update(struct i915_request *rq, void *data)
 {
 	struct i915_address_space *vm = rq->hw_context->vm;
 	struct intel_engine_cs *engine = rq->engine;
-	u32 base = engine->mmio_base;
+	u32 base = engine->lri_mmio_base;
 	u32 *cs;
 	int i;
 
@@ -992,7 +992,7 @@ static int emit_ppgtt_update(struct i915_request *rq, void *data)
 		if (IS_ERR(cs))
 			return PTR_ERR(cs);
 
-		*cs++ = MI_LOAD_REGISTER_IMM(2);
+		*cs++ = MI_LOAD_REGISTER_IMM(2) | engine->lri_cmd_mode;
 
 		*cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(base, 0));
 		*cs++ = upper_32_bits(pd_daddr);
@@ -1014,7 +1014,8 @@ static int emit_ppgtt_update(struct i915_request *rq, void *data)
 		if (IS_ERR(cs))
 			return PTR_ERR(cs);
 
-		*cs++ = MI_LOAD_REGISTER_IMM(2 * GEN8_3LVL_PDPES) | MI_LRI_FORCE_POSTED;
+		*cs++ = MI_LOAD_REGISTER_IMM(2 * GEN8_3LVL_PDPES) |
+			MI_LRI_FORCE_POSTED | engine->lri_cmd_mode;
 		for (i = GEN8_3LVL_PDPES; i--; ) {
 			const dma_addr_t pd_daddr = i915_page_dir_dma_addr(ppgtt, i);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 3c176b0f4b45..6e51269ab2ec 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -236,6 +236,28 @@ static u32 __engine_mmio_base(struct drm_i915_private *i915,
 	return bases[i].base;
 }
 
+static bool i915_engine_has_relative_lri(const struct intel_engine_cs *engine)
+{
+	if (INTEL_GEN(engine->i915) < 11)
+		return false;
+
+	if (engine->class == COPY_ENGINE_CLASS)
+		return false;
+
+	return true;
+}
+
+static void lri_init(struct intel_engine_cs *engine)
+{
+	if (i915_engine_has_relative_lri(engine)) {
+		engine->lri_cmd_mode = MI_LRI_ADD_CS_MMIO_START_GEN11;
+		engine->lri_mmio_base = 0;
+	} else {
+		engine->lri_cmd_mode = 0;
+		engine->lri_mmio_base = engine->mmio_base;
+	}
+}
+
 static void __sprint_engine_name(struct intel_engine_cs *engine)
 {
 	/*
@@ -320,6 +342,8 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
 	/* Nothing to do here, execute in order of dependencies */
 	engine->schedule = NULL;
 
+	lri_init(engine);
+
 	seqlock_init(&engine->stats.lock);
 
 	ATOMIC_INIT_NOTIFIER_HEAD(&engine->context_status_notifier);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 943f0663837e..38f486f7f7e3 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -306,6 +306,9 @@ struct intel_engine_cs {
 	u32 context_size;
 	u32 mmio_base;
 
+	u32 lri_mmio_base;
+	u32 lri_cmd_mode;
+
 	u32 uabi_capabilities;
 
 	struct rb_node uabi_node;
diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
index f78b13d74e17..bfb51d8d7ce2 100644
--- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
+++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
@@ -133,11 +133,16 @@
  *   simply ignores the register load under certain conditions.
  * - One can actually load arbitrary many arbitrary registers: Simply issue x
  *   address/value pairs. Don't overdue it, though, x <= 2^4 must hold!
+ * - Newer hardware supports engine relative addressing but older hardware does
+ *   not. This is required for hw engine load balancing. Hence the MI_LRI
+ *   instruction itself is prefixed with '__' and should only be used on
+ *   legacy hardware code paths. Generic code must always use the MI_LRI
+ *   and i915_get_lri_reg() helper functions instead.
  */
 #define MI_LOAD_REGISTER_IMM(x)	MI_INSTR(0x22, 2*(x)-1)
 /* Gen11+. addr = base + (ctx_restore ? offset & GENMASK(12,2) : offset) */
-#define   MI_LRI_CS_MMIO		(1<<19)
 #define   MI_LRI_FORCE_POSTED		(1<<12)
+#define   MI_LRI_ADD_CS_MMIO_START_GEN11	BIT(19)
 #define MI_STORE_REGISTER_MEM        MI_INSTR(0x24, 1)
 #define MI_STORE_REGISTER_MEM_GEN8   MI_INSTR(0x24, 2)
 #define   MI_SRM_LRM_GLOBAL_GTT		(1<<22)
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index a99166a2d2eb..d5ea6f7d3b5c 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1986,11 +1986,12 @@ struct lri {
 	u32 value;
 };
 
-static u32 *emit_lri(u32 *batch, const struct lri *lri, unsigned int count)
+static u32 *emit_lri(struct intel_engine_cs *engine, u32 *batch,
+		     const struct lri *lri, unsigned int count)
 {
 	GEM_BUG_ON(!count || count > 63);
 
-	*batch++ = MI_LOAD_REGISTER_IMM(count);
+	*batch++ = MI_LOAD_REGISTER_IMM(count) | engine->lri_cmd_mode;
 	do {
 		*batch++ = i915_mmio_reg_offset(lri->reg);
 		*batch++ = lri->value;
@@ -2030,7 +2031,7 @@ static u32 *gen9_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch)
 	/* WaFlushCoherentL3CacheLinesAtContextSwitch:skl,bxt,glk */
 	batch = gen8_emit_flush_coherentl3_wa(engine, batch);
 
-	batch = emit_lri(batch, lri, ARRAY_SIZE(lri));
+	batch = emit_lri(engine, batch, lri, ARRAY_SIZE(lri));
 
 	/* WaMediaPoolStateCmdInWABB:bxt,glk */
 	if (HAS_POOLED_EU(engine->i915)) {
@@ -3186,7 +3187,7 @@ static void init_common_reg_state(u32 * const regs,
 				  struct intel_engine_cs *engine,
 				  struct intel_ring *ring)
 {
-	const u32 base = engine->mmio_base;
+	const u32 base = engine->lri_mmio_base;
 
 	CTX_REG(regs, CTX_CONTEXT_CONTROL, RING_CONTEXT_CONTROL(base),
 		_MASKED_BIT_DISABLE(CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT) |
@@ -3211,7 +3212,7 @@ static void init_wa_bb_reg_state(u32 * const regs,
 				 u32 pos_bb_per_ctx)
 {
 	struct i915_ctx_workarounds * const wa_ctx = &engine->wa_ctx;
-	const u32 base = engine->mmio_base;
+	const u32 base = engine->lri_mmio_base;
 	const u32 pos_indirect_ctx = pos_bb_per_ctx + 2;
 	const u32 pos_indirect_ctx_offset = pos_indirect_ctx + 2;
 
@@ -3239,6 +3240,7 @@ static void init_wa_bb_reg_state(u32 * const regs,
 }
 
 static void init_ppgtt_reg_state(u32 *regs, u32 base,
+				 struct intel_engine_cs *engine,
 				 struct i915_ppgtt *ppgtt)
 {
 	/* PDP values well be assigned later if needed */
@@ -3280,14 +3282,12 @@ static void gen8_init_reg_state(u32 * const regs,
 {
 	struct i915_ppgtt * const ppgtt = vm_alias(ce->vm);
 	const bool rcs = engine->class == RENDER_CLASS;
-	const u32 base = engine->mmio_base;
-	const u32 lri_base =
-		intel_engine_has_relative_mmio(engine) ? MI_LRI_CS_MMIO : 0;
+	const u32 base = engine->lri_mmio_base;
 
 	regs[CTX_LRI_HEADER_0] =
 		MI_LOAD_REGISTER_IMM(rcs ? 14 : 11) |
 		MI_LRI_FORCE_POSTED |
-		lri_base;
+		engine->lri_cmd_mode;
 
 	init_common_reg_state(regs, ppgtt, engine, ring);
 	CTX_REG(regs, CTX_SECOND_BB_HEAD_U, RING_SBBADDR_UDW(base), 0);
@@ -3299,15 +3299,17 @@ static void gen8_init_reg_state(u32 * const regs,
 	regs[CTX_LRI_HEADER_1] =
 		MI_LOAD_REGISTER_IMM(9) |
 		MI_LRI_FORCE_POSTED |
-		lri_base;
+		engine->lri_cmd_mode;
 
 	CTX_REG(regs, CTX_CTX_TIMESTAMP, RING_CTX_TIMESTAMP(base), 0);
 
-	init_ppgtt_reg_state(regs, base, ppgtt);
+	init_ppgtt_reg_state(regs, base, engine, ppgtt);
 
 	if (rcs) {
-		regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1) | lri_base;
-		CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE, 0);
+		regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1) |
+					 engine->lri_cmd_mode;
+		CTX_REG(regs, CTX_R_PWR_CLK_STATE,
+			GEN8_R_PWR_CLK_STATE, 0);
 	}
 
 	regs[CTX_END] = MI_BATCH_BUFFER_END;
@@ -3322,14 +3324,12 @@ static void gen12_init_reg_state(u32 * const regs,
 {
 	struct i915_ppgtt * const ppgtt = i915_vm_to_ppgtt(ce->vm);
 	const bool rcs = engine->class == RENDER_CLASS;
-	const u32 base = engine->mmio_base;
-	const u32 lri_base =
-		intel_engine_has_relative_mmio(engine) ? MI_LRI_CS_MMIO : 0;
+	const u32 base = engine->lri_mmio_base;
 
 	regs[CTX_LRI_HEADER_0] =
 		MI_LOAD_REGISTER_IMM(rcs ? 11 : 9) |
 		MI_LRI_FORCE_POSTED |
-		lri_base;
+		engine->lri_cmd_mode;
 
 	init_common_reg_state(regs, ppgtt, engine, ring);
 
@@ -3339,15 +3339,15 @@ static void gen12_init_reg_state(u32 * const regs,
 	regs[CTX_LRI_HEADER_1] =
 		MI_LOAD_REGISTER_IMM(9) |
 		MI_LRI_FORCE_POSTED |
-		lri_base;
+		engine->lri_cmd_mode;
 
 	CTX_REG(regs, CTX_CTX_TIMESTAMP, RING_CTX_TIMESTAMP(base), 0);
 
-	init_ppgtt_reg_state(regs, base, ppgtt);
+	init_ppgtt_reg_state(regs, base, engine, ppgtt);
 
 	if (rcs) {
 		regs[GEN12_CTX_LRI_HEADER_3] =
-			MI_LOAD_REGISTER_IMM(1) | lri_base;
+			MI_LOAD_REGISTER_IMM(1) | engine->lri_cmd_mode;
 		CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE, 0);
 
 		/* TODO: oa_init_reg_state ? */
diff --git a/drivers/gpu/drm/i915/gt/intel_mocs.c b/drivers/gpu/drm/i915/gt/intel_mocs.c
index 728704bbbe18..8e8fe3deb95c 100644
--- a/drivers/gpu/drm/i915/gt/intel_mocs.c
+++ b/drivers/gpu/drm/i915/gt/intel_mocs.c
@@ -368,9 +368,6 @@ static u32 get_entry_control(const struct drm_i915_mocs_table *table,
 /**
  * intel_mocs_init_engine() - emit the mocs control table
  * @engine:	The engine for whom to emit the registers.
- *
- * This function simply emits a MI_LOAD_REGISTER_IMM command for the
- * given table starting at the given address.
  */
 void intel_mocs_init_engine(struct intel_engine_cs *engine)
 {
@@ -456,7 +453,8 @@ static int emit_mocs_control_table(struct i915_request *rq,
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
 
-	*cs++ = MI_LOAD_REGISTER_IMM(table->n_entries);
+	*cs++ = MI_LOAD_REGISTER_IMM(table->n_entries) |
+		rq->engine->lri_cmd_mode;
 
 	for (index = 0; index < table->size; index++) {
 		u32 value = get_entry_control(table, index);
diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
index a25b84b12ef1..947f74161d91 100644
--- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
@@ -1535,12 +1535,13 @@ static int load_pd_dir(struct i915_request *rq, const struct i915_ppgtt *ppgtt)
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
 
-	*cs++ = MI_LOAD_REGISTER_IMM(1);
-	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_DCLV(engine->mmio_base));
+	/* Can these not be merged into a single LRI??? */
+	*cs++ = MI_LOAD_REGISTER_IMM(1) | engine->lri_cmd_mode;
+	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_DCLV(engine->lri_mmio_base));
 	*cs++ = PP_DIR_DCLV_2G;
 
-	*cs++ = MI_LOAD_REGISTER_IMM(1);
-	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_BASE(engine->mmio_base));
+	*cs++ = MI_LOAD_REGISTER_IMM(1) | engine->lri_cmd_mode;
+	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_BASE(engine->lri_mmio_base));
 	*cs++ = px_base(ppgtt->pd)->ggtt_offset << 10;
 
 	intel_ring_advance(rq, cs);
@@ -1708,7 +1709,8 @@ static int remap_l3_slice(struct i915_request *rq, int slice)
 	 * here because no other code should access these registers other than
 	 * at initialization time.
 	 */
-	*cs++ = MI_LOAD_REGISTER_IMM(GEN7_L3LOG_SIZE/4);
+	*cs++ = MI_LOAD_REGISTER_IMM(GEN7_L3LOG_SIZE / 4) |
+		rq->engine->lri_cmd_mode;
 	for (i = 0; i < GEN7_L3LOG_SIZE/4; i++) {
 		*cs++ = i915_mmio_reg_offset(GEN7_L3LOG(slice, i));
 		*cs++ = remap_info[i];
diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index 41d0f786e06d..895f12009208 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -633,7 +633,7 @@ int intel_engine_emit_ctx_wa(struct i915_request *rq)
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
 
-	*cs++ = MI_LOAD_REGISTER_IMM(wal->count);
+	*cs++ = MI_LOAD_REGISTER_IMM(wal->count) | rq->engine->lri_cmd_mode;
 	for (i = 0, wa = wal->list; i < wal->count; i++, wa++) {
 		*cs++ = i915_mmio_reg_offset(wa->reg);
 		*cs++ = wa->val;
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index c1b764233761..8b85cdfada21 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1693,6 +1693,11 @@ gen8_update_reg_state_unlocked(struct i915_perf_stream *stream,
 	};
 	int i;
 
+	/*
+	 * NB: The LRI instruction is generated by the hardware.
+	 * Should we read it in and assert that the offset flag is set?
+	 */
+
 	CTX_REG(reg_state, ctx_oactxctrl, GEN8_OACTXCONTROL,
 		(stream->period_exponent << GEN8_OA_TIMER_PERIOD_SHIFT) |
 		(stream->periodic ? GEN8_OA_TIMER_ENABLE : 0) |
@@ -1752,8 +1757,9 @@ gen8_load_flex(struct i915_request *rq,
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
 
-	*cs++ = MI_LOAD_REGISTER_IMM(count);
+	*cs++ = MI_LOAD_REGISTER_IMM(count) | rq->engine->lri_cmd_mode;
 	do {
+		/* FIXME: Is this table LRI remap/offset friendly? */
 		*cs++ = i915_mmio_reg_offset(flex->reg);
 		*cs++ = flex->value;
 	} while (flex++, --count);
-- 
2.21.0.5.gaeb582a983

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 1/2] drm/i915: Engine relative MMIO
  2019-08-22 18:02 [PATCH 0/2] " John.C.Harrison
@ 2019-08-22 18:02 ` John.C.Harrison
  0 siblings, 0 replies; 11+ messages in thread
From: John.C.Harrison @ 2019-08-22 18:02 UTC (permalink / raw)
  To: Intel-GFX; +Cc: Chris P Wilson

From: John Harrison <John.C.Harrison@Intel.com>

With virtual engines, it is no longer possible to know which specific
physical engine a given request will be executed on at the time that
request is generated. This means that the request itself must be engine
agnostic - any direct register writes must be relative to the engine
and not absolute addresses.

The LRI command has support for engine relative addressing. However,
the mechanism is not transparent to the driver. The scheme for Gen11
(MI_LRI_ADD_CS_MMIO_START) requires the LRI address to have no
absolute engine base component. The hardware then adds on the correct
engine offset at execution time.

Due to the non-trivial and differing schemes on different hardware, it
is not possible to simply update the code that creates the LRI
commands to set a remap flag and let the hardware get on with it.
Instead, this patch adds function wrappers for generating the LRI
command itself and then for constructing the correct address to use
with the LRI.

v2: Fix build break in GVT. Remove flags parameter [review feedback
from Chris W].

v3: Fix build break in selftest. Rebase to newer base tree and fix
merge conflict.

v4: More rebasing. Rmoved relative addressing support from Gen7-9 only
code paths [review feedback from Chris W].

v5: More rebasing (new 'gt' directory). Fix white space issue. Use
COPY class rather than BCS0 id for checking against BCS engine.

v6: Change to MI_LRI for generic code and __MI_LRI for legacy code.
MI_LRI is now a macro wrapper around an engine function pointer to
reduce runtime checks on LRI relative support. [review feedback from
Tvrtko]

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
CC: Rodrigo Vivi <rodrigo.vivi@intel.com>
CC: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
CC: Chris P Wilson <chris.p.wilson@intel.com>
CC: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   | 14 ++--
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  3 +-
 drivers/gpu/drm/i915/gt/intel_engine.h        |  5 ++
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     | 42 ++++++++++
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |  3 +
 drivers/gpu/drm/i915/gt/intel_gpu_commands.h  |  8 +-
 drivers/gpu/drm/i915/gt/intel_lrc.c           | 77 ++++++++++---------
 drivers/gpu/drm/i915/gt/intel_lrc_reg.h       |  4 +-
 drivers/gpu/drm/i915/gt/intel_mocs.c          | 17 ++--
 drivers/gpu/drm/i915/gt/intel_ringbuffer.c    | 25 ++++--
 drivers/gpu/drm/i915/gt/intel_workarounds.c   |  4 +-
 .../gpu/drm/i915/gt/selftest_workarounds.c    | 15 ++--
 drivers/gpu/drm/i915/gvt/mmio_context.c       | 16 +++-
 drivers/gpu/drm/i915/i915_cmd_parser.c        |  6 +-
 drivers/gpu/drm/i915/i915_perf.c              | 16 ++--
 15 files changed, 175 insertions(+), 80 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 1cdfe05514c3..b8f23148ac80 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -992,11 +992,11 @@ static int emit_ppgtt_update(struct i915_request *rq, void *data)
 		if (IS_ERR(cs))
 			return PTR_ERR(cs);
 
-		*cs++ = MI_LOAD_REGISTER_IMM(2);
+		*cs++ = MI_LOAD_REGISTER_IMM(engine, 2);
 
-		*cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(base, 0));
+		*cs++ = i915_get_lri_reg(engine, GEN8_RING_PDP_UDW(base, 0));
 		*cs++ = upper_32_bits(pd_daddr);
-		*cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(base, 0));
+		*cs++ = i915_get_lri_reg(engine, GEN8_RING_PDP_LDW(base, 0));
 		*cs++ = lower_32_bits(pd_daddr);
 
 		*cs++ = MI_NOOP;
@@ -1008,13 +1008,15 @@ static int emit_ppgtt_update(struct i915_request *rq, void *data)
 		if (IS_ERR(cs))
 			return PTR_ERR(cs);
 
-		*cs++ = MI_LOAD_REGISTER_IMM(2 * GEN8_3LVL_PDPES);
+		*cs++ = MI_LOAD_REGISTER_IMM(engine, 2 * GEN8_3LVL_PDPES);
 		for (i = GEN8_3LVL_PDPES; i--; ) {
 			const dma_addr_t pd_daddr = i915_page_dir_dma_addr(ppgtt, i);
 
-			*cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(base, i));
+			*cs++ = i915_get_lri_reg(engine,
+						 GEN8_RING_PDP_UDW(base, i));
 			*cs++ = upper_32_bits(pd_daddr);
-			*cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(base, i));
+			*cs++ = i915_get_lri_reg(engine,
+						 GEN8_RING_PDP_LDW(base, i));
 			*cs++ = lower_32_bits(pd_daddr);
 		}
 		*cs++ = MI_NOOP;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 2dca2962c73a..13bad7806f52 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1947,7 +1947,8 @@ static int i915_reset_gen7_sol_offsets(struct i915_request *rq)
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
 
-	*cs++ = MI_LOAD_REGISTER_IMM(4);
+	/* Gen7 only so no need to support relative offsets */
+	*cs++ = __MI_LOAD_REGISTER_IMM(4);
 	for (i = 0; i < 4; i++) {
 		*cs++ = i915_mmio_reg_offset(GEN7_SO_WRITE_OFFSET(i));
 		*cs++ = 0;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
index d3c6993f4f46..09d7208b0127 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -511,4 +511,9 @@ void intel_engine_init_active(struct intel_engine_cs *engine,
 #define ENGINE_MOCK	1
 #define ENGINE_VIRTUAL	2
 
+#define MI_LOAD_REGISTER_IMM(engine, count)	\
+					(engine)->get_lri_cmd((engine), (count))
+
+u32 i915_get_lri_reg(const struct intel_engine_cs *engine, i915_reg_t reg);
+
 #endif /* _INTEL_RINGBUFFER_H_ */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 82630db0394b..efe1c377d797 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -236,6 +236,46 @@ static u32 __engine_mmio_base(struct drm_i915_private *i915,
 	return bases[i].base;
 }
 
+static u32 i915_get_lri_cmd_legacy(const struct intel_engine_cs *engine,
+				   u32 word_count)
+{
+	return __MI_LOAD_REGISTER_IMM(word_count);
+}
+
+static u32 i915_get_lri_cmd_add_offset(const struct intel_engine_cs *engine,
+				       u32 word_count)
+{
+	return __MI_LOAD_REGISTER_IMM(word_count) |
+	       MI_LRI_ADD_CS_MMIO_START_GEN11;
+}
+
+static bool i915_engine_has_relative_lri(const struct intel_engine_cs *engine)
+{
+	if (INTEL_GEN(engine->i915) < 11)
+		return false;
+
+	if (engine->class == COPY_ENGINE_CLASS)
+		return false;
+
+	return true;
+}
+
+static void lri_init(struct intel_engine_cs *engine)
+{
+	if (i915_engine_has_relative_lri(engine))
+		engine->get_lri_cmd = i915_get_lri_cmd_add_offset;
+	else
+		engine->get_lri_cmd = i915_get_lri_cmd_legacy;
+}
+
+u32 i915_get_lri_reg(const struct intel_engine_cs *engine, i915_reg_t reg)
+{
+	if (!i915_engine_has_relative_lri(engine))
+		return i915_mmio_reg_offset(reg);
+
+	return i915_mmio_reg_offset(reg) - engine->mmio_base;
+}
+
 static void __sprint_engine_name(struct intel_engine_cs *engine)
 {
 	/*
@@ -320,6 +360,8 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
 	/* Nothing to do here, execute in order of dependencies */
 	engine->schedule = NULL;
 
+	lri_init(engine);
+
 	seqlock_init(&engine->stats.lock);
 
 	ATOMIC_INIT_NOTIFIER_HEAD(&engine->context_status_notifier);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index a82cea95c2f2..7ca6c86a33f6 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -306,6 +306,9 @@ struct intel_engine_cs {
 	u32 context_size;
 	u32 mmio_base;
 
+	u32 (*get_lri_cmd)(const struct intel_engine_cs *engine,
+			   u32 word_count);
+
 	u32 uabi_capabilities;
 
 	struct rb_node uabi_node;
diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
index 86e00a2db8a4..eaa019df0ce7 100644
--- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
+++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
@@ -130,9 +130,15 @@
  *   simply ignores the register load under certain conditions.
  * - One can actually load arbitrary many arbitrary registers: Simply issue x
  *   address/value pairs. Don't overdue it, though, x <= 2^4 must hold!
+ * - Newer hardware supports engine relative addressing but older hardware does
+ *   not. This is required for hw engine load balancing. Hence the MI_LRI
+ *   instruction itself is prefixed with '__' and should only be used on
+ *   legacy hardware code paths. Generic code must always use the MI_LRI
+ *   and i915_get_lri_reg() helper functions instead.
  */
-#define MI_LOAD_REGISTER_IMM(x)	MI_INSTR(0x22, 2*(x)-1)
+#define __MI_LOAD_REGISTER_IMM(x)	MI_INSTR(0x22, 2*(x)-1)
 #define   MI_LRI_FORCE_POSTED		(1<<12)
+#define   MI_LRI_ADD_CS_MMIO_START_GEN11	(1<<19)
 #define MI_STORE_REGISTER_MEM        MI_INSTR(0x24, 1)
 #define MI_STORE_REGISTER_MEM_GEN8   MI_INSTR(0x24, 2)
 #define   MI_SRM_LRM_GLOBAL_GTT		(1<<22)
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index d42584439f51..e154786814bc 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1902,14 +1902,15 @@ static int emit_pdps(struct i915_request *rq)
 		return PTR_ERR(cs);
 
 	/* Ensure the LRI have landed before we invalidate & continue */
-	*cs++ = MI_LOAD_REGISTER_IMM(2 * GEN8_3LVL_PDPES) | MI_LRI_FORCE_POSTED;
+	*cs++ = MI_LOAD_REGISTER_IMM(engine, 2 * GEN8_3LVL_PDPES) |
+				 MI_LRI_FORCE_POSTED;
 	for (i = GEN8_3LVL_PDPES; i--; ) {
 		const dma_addr_t pd_daddr = i915_page_dir_dma_addr(ppgtt, i);
 		u32 base = engine->mmio_base;
 
-		*cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(base, i));
+		*cs++ = i915_get_lri_reg(engine, GEN8_RING_PDP_UDW(base, i));
 		*cs++ = upper_32_bits(pd_daddr);
-		*cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(base, i));
+		*cs++ = i915_get_lri_reg(engine, GEN8_RING_PDP_LDW(base, i));
 		*cs++ = lower_32_bits(pd_daddr);
 	}
 	*cs++ = MI_NOOP;
@@ -1984,7 +1985,8 @@ gen8_emit_flush_coherentl3_wa(struct intel_engine_cs *engine, u32 *batch)
 					   INTEL_GT_SCRATCH_FIELD_COHERENTL3_WA);
 	*batch++ = 0;
 
-	*batch++ = MI_LOAD_REGISTER_IMM(1);
+	/* Gen8/9 only so no need to support relative offsets */
+	*batch++ = __MI_LOAD_REGISTER_IMM(1);
 	*batch++ = i915_mmio_reg_offset(GEN8_L3SQCREG4);
 	*batch++ = 0x40400000 | GEN8_LQSC_FLUSH_COHERENT_LINES;
 
@@ -2061,13 +2063,14 @@ struct lri {
 	u32 value;
 };
 
-static u32 *emit_lri(u32 *batch, const struct lri *lri, unsigned int count)
+static u32 *emit_lri(struct intel_engine_cs *engine, u32 *batch,
+		     const struct lri *lri, unsigned int count)
 {
 	GEM_BUG_ON(!count || count > 63);
 
-	*batch++ = MI_LOAD_REGISTER_IMM(count);
+	*batch++ = MI_LOAD_REGISTER_IMM(engine, count);
 	do {
-		*batch++ = i915_mmio_reg_offset(lri->reg);
+		*batch++ = i915_get_lri_reg(engine, lri->reg);
 		*batch++ = lri->value;
 	} while (lri++, --count);
 	*batch++ = MI_NOOP;
@@ -2105,7 +2108,7 @@ static u32 *gen9_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch)
 	/* WaFlushCoherentL3CacheLinesAtContextSwitch:skl,bxt,glk */
 	batch = gen8_emit_flush_coherentl3_wa(engine, batch);
 
-	batch = emit_lri(batch, lri, ARRAY_SIZE(lri));
+	batch = emit_lri(engine, batch, lri, ARRAY_SIZE(lri));
 
 	/* WaMediaPoolStateCmdInWABB:bxt,glk */
 	if (HAS_POOLED_EU(engine->i915)) {
@@ -3168,10 +3171,10 @@ static void execlists_init_reg_state(u32 *regs,
 	 *
 	 * Must keep consistent with virtual_update_register_offsets().
 	 */
-	regs[CTX_LRI_HEADER_0] = MI_LOAD_REGISTER_IMM(rcs ? 14 : 11) |
+	regs[CTX_LRI_HEADER_0] = MI_LOAD_REGISTER_IMM(engine, rcs ? 14 : 11) |
 				 MI_LRI_FORCE_POSTED;
 
-	CTX_REG(regs, CTX_CONTEXT_CONTROL, RING_CONTEXT_CONTROL(base),
+	CTX_REG(engine, regs, CTX_CONTEXT_CONTROL, RING_CONTEXT_CONTROL(base),
 		_MASKED_BIT_DISABLE(CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT) |
 		_MASKED_BIT_ENABLE(CTX_CTRL_INHIBIT_SYN_CTX_SWITCH));
 	if (INTEL_GEN(engine->i915) < 11) {
@@ -3179,22 +3182,23 @@ static void execlists_init_reg_state(u32 *regs,
 			_MASKED_BIT_DISABLE(CTX_CTRL_ENGINE_CTX_SAVE_INHIBIT |
 					    CTX_CTRL_RS_CTX_ENABLE);
 	}
-	CTX_REG(regs, CTX_RING_HEAD, RING_HEAD(base), 0);
-	CTX_REG(regs, CTX_RING_TAIL, RING_TAIL(base), 0);
-	CTX_REG(regs, CTX_RING_BUFFER_START, RING_START(base), 0);
-	CTX_REG(regs, CTX_RING_BUFFER_CONTROL, RING_CTL(base),
+	CTX_REG(engine, regs, CTX_RING_HEAD, RING_HEAD(base), 0);
+	CTX_REG(engine, regs, CTX_RING_TAIL, RING_TAIL(base), 0);
+	CTX_REG(engine, regs, CTX_RING_BUFFER_START, RING_START(base), 0);
+	CTX_REG(engine, regs, CTX_RING_BUFFER_CONTROL, RING_CTL(base),
 		RING_CTL_SIZE(ring->size) | RING_VALID);
-	CTX_REG(regs, CTX_BB_HEAD_U, RING_BBADDR_UDW(base), 0);
-	CTX_REG(regs, CTX_BB_HEAD_L, RING_BBADDR(base), 0);
-	CTX_REG(regs, CTX_BB_STATE, RING_BBSTATE(base), RING_BB_PPGTT);
-	CTX_REG(regs, CTX_SECOND_BB_HEAD_U, RING_SBBADDR_UDW(base), 0);
-	CTX_REG(regs, CTX_SECOND_BB_HEAD_L, RING_SBBADDR(base), 0);
-	CTX_REG(regs, CTX_SECOND_BB_STATE, RING_SBBSTATE(base), 0);
+	CTX_REG(engine, regs, CTX_BB_HEAD_U, RING_BBADDR_UDW(base), 0);
+	CTX_REG(engine, regs, CTX_BB_HEAD_L, RING_BBADDR(base), 0);
+	CTX_REG(engine, regs, CTX_BB_STATE, RING_BBSTATE(base), RING_BB_PPGTT);
+	CTX_REG(engine, regs, CTX_SECOND_BB_HEAD_U, RING_SBBADDR_UDW(base), 0);
+	CTX_REG(engine, regs, CTX_SECOND_BB_HEAD_L, RING_SBBADDR(base), 0);
+	CTX_REG(engine, regs, CTX_SECOND_BB_STATE, RING_SBBSTATE(base), 0);
 	if (rcs) {
 		struct i915_ctx_workarounds *wa_ctx = &engine->wa_ctx;
 
-		CTX_REG(regs, CTX_RCS_INDIRECT_CTX, RING_INDIRECT_CTX(base), 0);
-		CTX_REG(regs, CTX_RCS_INDIRECT_CTX_OFFSET,
+		CTX_REG(engine, regs, CTX_RCS_INDIRECT_CTX,
+			RING_INDIRECT_CTX(base), 0);
+		CTX_REG(engine, regs, CTX_RCS_INDIRECT_CTX_OFFSET,
 			RING_INDIRECT_CTX_OFFSET(base), 0);
 		if (wa_ctx->indirect_ctx.size) {
 			u32 ggtt_offset = i915_ggtt_offset(wa_ctx->vma);
@@ -3207,7 +3211,8 @@ static void execlists_init_reg_state(u32 *regs,
 				intel_lr_indirect_ctx_offset(engine) << 6;
 		}
 
-		CTX_REG(regs, CTX_BB_PER_CTX_PTR, RING_BB_PER_CTX_PTR(base), 0);
+		CTX_REG(engine, regs, CTX_BB_PER_CTX_PTR,
+			RING_BB_PER_CTX_PTR(base), 0);
 		if (wa_ctx->per_ctx.size) {
 			u32 ggtt_offset = i915_ggtt_offset(wa_ctx->vma);
 
@@ -3216,18 +3221,19 @@ static void execlists_init_reg_state(u32 *regs,
 		}
 	}
 
-	regs[CTX_LRI_HEADER_1] = MI_LOAD_REGISTER_IMM(9) | MI_LRI_FORCE_POSTED;
+	regs[CTX_LRI_HEADER_1] = MI_LOAD_REGISTER_IMM(engine, 9) |
+				 MI_LRI_FORCE_POSTED;
 
-	CTX_REG(regs, CTX_CTX_TIMESTAMP, RING_CTX_TIMESTAMP(base), 0);
+	CTX_REG(engine, regs, CTX_CTX_TIMESTAMP, RING_CTX_TIMESTAMP(base), 0);
 	/* PDP values well be assigned later if needed */
-	CTX_REG(regs, CTX_PDP3_UDW, GEN8_RING_PDP_UDW(base, 3), 0);
-	CTX_REG(regs, CTX_PDP3_LDW, GEN8_RING_PDP_LDW(base, 3), 0);
-	CTX_REG(regs, CTX_PDP2_UDW, GEN8_RING_PDP_UDW(base, 2), 0);
-	CTX_REG(regs, CTX_PDP2_LDW, GEN8_RING_PDP_LDW(base, 2), 0);
-	CTX_REG(regs, CTX_PDP1_UDW, GEN8_RING_PDP_UDW(base, 1), 0);
-	CTX_REG(regs, CTX_PDP1_LDW, GEN8_RING_PDP_LDW(base, 1), 0);
-	CTX_REG(regs, CTX_PDP0_UDW, GEN8_RING_PDP_UDW(base, 0), 0);
-	CTX_REG(regs, CTX_PDP0_LDW, GEN8_RING_PDP_LDW(base, 0), 0);
+	CTX_REG(engine, regs, CTX_PDP3_UDW, GEN8_RING_PDP_UDW(base, 3), 0);
+	CTX_REG(engine, regs, CTX_PDP3_LDW, GEN8_RING_PDP_LDW(base, 3), 0);
+	CTX_REG(engine, regs, CTX_PDP2_UDW, GEN8_RING_PDP_UDW(base, 2), 0);
+	CTX_REG(engine, regs, CTX_PDP2_LDW, GEN8_RING_PDP_LDW(base, 2), 0);
+	CTX_REG(engine, regs, CTX_PDP1_UDW, GEN8_RING_PDP_UDW(base, 1), 0);
+	CTX_REG(engine, regs, CTX_PDP1_LDW, GEN8_RING_PDP_LDW(base, 1), 0);
+	CTX_REG(engine, regs, CTX_PDP0_UDW, GEN8_RING_PDP_UDW(base, 0), 0);
+	CTX_REG(engine, regs, CTX_PDP0_LDW, GEN8_RING_PDP_LDW(base, 0), 0);
 
 	if (i915_vm_is_4lvl(&ppgtt->vm)) {
 		/* 64b PPGTT (48bit canonical)
@@ -3243,8 +3249,9 @@ static void execlists_init_reg_state(u32 *regs,
 	}
 
 	if (rcs) {
-		regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
-		CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE, 0);
+		regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(engine, 1);
+		CTX_REG(engine, regs, CTX_R_PWR_CLK_STATE,
+			GEN8_R_PWR_CLK_STATE, 0);
 	}
 
 	regs[CTX_END] = MI_BATCH_BUFFER_END;
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
index b8f20ad71169..e56a27697b6f 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
+++ b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
@@ -39,10 +39,10 @@
 #define CTX_R_PWR_CLK_STATE		0x42
 #define CTX_END				0x44
 
-#define CTX_REG(reg_state, pos, reg, val) do { \
+#define CTX_REG(engine, reg_state, pos, reg, val) do { \
 	u32 *reg_state__ = (reg_state); \
 	const u32 pos__ = (pos); \
-	(reg_state__)[(pos__) + 0] = i915_mmio_reg_offset(reg); \
+	(reg_state__)[(pos__) + 0] = i915_get_lri_reg((engine), (reg)); \
 	(reg_state__)[(pos__) + 1] = (val); \
 } while (0)
 
diff --git a/drivers/gpu/drm/i915/gt/intel_mocs.c b/drivers/gpu/drm/i915/gt/intel_mocs.c
index 728704bbbe18..ba29a0bdae28 100644
--- a/drivers/gpu/drm/i915/gt/intel_mocs.c
+++ b/drivers/gpu/drm/i915/gt/intel_mocs.c
@@ -368,9 +368,6 @@ static u32 get_entry_control(const struct drm_i915_mocs_table *table,
 /**
  * intel_mocs_init_engine() - emit the mocs control table
  * @engine:	The engine for whom to emit the registers.
- *
- * This function simply emits a MI_LOAD_REGISTER_IMM command for the
- * given table starting at the given address.
  */
 void intel_mocs_init_engine(struct intel_engine_cs *engine)
 {
@@ -456,18 +453,20 @@ static int emit_mocs_control_table(struct i915_request *rq,
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
 
-	*cs++ = MI_LOAD_REGISTER_IMM(table->n_entries);
+	*cs++ = MI_LOAD_REGISTER_IMM(rq->engine, table->n_entries);
 
 	for (index = 0; index < table->size; index++) {
 		u32 value = get_entry_control(table, index);
 
-		*cs++ = i915_mmio_reg_offset(mocs_register(engine, index));
+		*cs++ = i915_get_lri_reg(rq->engine,
+					 mocs_register(engine, index));
 		*cs++ = value;
 	}
 
 	/* All remaining entries are also unused */
 	for (; index < table->n_entries; index++) {
-		*cs++ = i915_mmio_reg_offset(mocs_register(engine, index));
+		*cs++ = i915_get_lri_reg(rq->engine,
+					 mocs_register(engine, index));
 		*cs++ = unused_value;
 	}
 
@@ -514,7 +513,11 @@ static int emit_mocs_l3cc_table(struct i915_request *rq,
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
 
-	*cs++ = MI_LOAD_REGISTER_IMM(table->n_entries / 2);
+	/*
+	 * GEN9_LNCFCMOCS is not engine relative, therefore there is no
+	 * need for relative addressing?
+	 */
+	*cs++ = __MI_LOAD_REGISTER_IMM(table->n_entries / 2);
 
 	for (i = 0; i < table->size / 2; i++) {
 		u16 low = get_entry_l3cc(table, 2 * i);
diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
index 601c16239fdf..58997953ea9a 100644
--- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
@@ -1534,12 +1534,13 @@ static int load_pd_dir(struct i915_request *rq, const struct i915_ppgtt *ppgtt)
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
 
-	*cs++ = MI_LOAD_REGISTER_IMM(1);
-	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_DCLV(engine->mmio_base));
+	/* Can these not be merged into a single LRI??? */
+	*cs++ = MI_LOAD_REGISTER_IMM(engine, 1);
+	*cs++ = i915_get_lri_reg(engine, RING_PP_DIR_DCLV(engine->mmio_base));
 	*cs++ = PP_DIR_DCLV_2G;
 
-	*cs++ = MI_LOAD_REGISTER_IMM(1);
-	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_BASE(engine->mmio_base));
+	*cs++ = MI_LOAD_REGISTER_IMM(engine, 1);
+	*cs++ = i915_get_lri_reg(engine, RING_PP_DIR_BASE(engine->mmio_base));
 	*cs++ = px_base(ppgtt->pd)->ggtt_offset << 10;
 
 	intel_ring_advance(rq, cs);
@@ -1608,7 +1609,11 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
 		if (num_engines) {
 			struct intel_engine_cs *signaller;
 
-			*cs++ = MI_LOAD_REGISTER_IMM(num_engines);
+			/*
+			 * Must use absolute engine address as the register
+			 * write is targeting a different engine.
+			 */
+			*cs++ = __MI_LOAD_REGISTER_IMM(num_engines);
 			for_each_engine(signaller, i915, id) {
 				if (signaller == engine)
 					continue;
@@ -1662,7 +1667,11 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
 			struct intel_engine_cs *signaller;
 			i915_reg_t last_reg = {}; /* keep gcc quiet */
 
-			*cs++ = MI_LOAD_REGISTER_IMM(num_engines);
+			/*
+			 * Must use absolute engine address as the register
+			 * write is targeting a different engine.
+			 */
+			*cs++ = __MI_LOAD_REGISTER_IMM(num_engines);
 			for_each_engine(signaller, i915, id) {
 				if (signaller == engine)
 					continue;
@@ -1707,9 +1716,9 @@ static int remap_l3_slice(struct i915_request *rq, int slice)
 	 * here because no other code should access these registers other than
 	 * at initialization time.
 	 */
-	*cs++ = MI_LOAD_REGISTER_IMM(GEN7_L3LOG_SIZE/4);
+	*cs++ = MI_LOAD_REGISTER_IMM(rq->engine, GEN7_L3LOG_SIZE/4);
 	for (i = 0; i < GEN7_L3LOG_SIZE/4; i++) {
-		*cs++ = i915_mmio_reg_offset(GEN7_L3LOG(slice, i));
+		*cs++ = i915_get_lri_reg(rq->engine, GEN7_L3LOG(slice, i));
 		*cs++ = remap_info[i];
 	}
 	*cs++ = MI_NOOP;
diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index 126ab3667919..44369798ecc5 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -638,9 +638,9 @@ int intel_engine_emit_ctx_wa(struct i915_request *rq)
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
 
-	*cs++ = MI_LOAD_REGISTER_IMM(wal->count);
+	*cs++ = MI_LOAD_REGISTER_IMM(rq->engine, wal->count);
 	for (i = 0, wa = wal->list; i < wal->count; i++, wa++) {
-		*cs++ = i915_mmio_reg_offset(wa->reg);
+		*cs++ = i915_get_lri_reg(rq->engine, wa->reg);
 		*cs++ = wa->val;
 	}
 	*cs++ = MI_NOOP;
diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
index d06d68ac2a3b..c31ea02ab91c 100644
--- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
@@ -480,6 +480,8 @@ static int check_dirty_whitelist(struct i915_gem_context *ctx,
 
 	for (i = 0; i < engine->whitelist.count; i++) {
 		u32 reg = i915_mmio_reg_offset(engine->whitelist.list[i].reg);
+		u32 regLRI = i915_get_lri_reg(engine,
+					      engine->whitelist.list[i].reg);
 		u64 addr = scratch->node.start;
 		struct i915_request *rq;
 		u32 srm, lrm, rsvd;
@@ -515,8 +517,8 @@ static int check_dirty_whitelist(struct i915_gem_context *ctx,
 		idx = 1;
 		for (v = 0; v < ARRAY_SIZE(values); v++) {
 			/* LRI garbage */
-			*cs++ = MI_LOAD_REGISTER_IMM(1);
-			*cs++ = reg;
+			*cs++ = MI_LOAD_REGISTER_IMM(engine, 1);
+			*cs++ = regLRI;
 			*cs++ = values[v];
 
 			/* SRM result */
@@ -528,8 +530,8 @@ static int check_dirty_whitelist(struct i915_gem_context *ctx,
 		}
 		for (v = 0; v < ARRAY_SIZE(values); v++) {
 			/* LRI garbage */
-			*cs++ = MI_LOAD_REGISTER_IMM(1);
-			*cs++ = reg;
+			*cs++ = MI_LOAD_REGISTER_IMM(engine, 1);
+			*cs++ = regLRI;
 			*cs++ = ~values[v];
 
 			/* SRM result */
@@ -823,9 +825,10 @@ static int scrub_whitelisted_registers(struct i915_gem_context *ctx,
 		goto err_batch;
 	}
 
-	*cs++ = MI_LOAD_REGISTER_IMM(whitelist_writable_count(engine));
+	*cs++ = MI_LOAD_REGISTER_IMM(engine, whitelist_writable_count(engine));
 	for (i = 0; i < engine->whitelist.count; i++) {
-		u32 reg = i915_mmio_reg_offset(engine->whitelist.list[i].reg);
+		u32 reg = i915_get_lri_reg(engine,
+					   engine->whitelist.list[i].reg);
 
 		if (ro_register(reg))
 			continue;
diff --git a/drivers/gpu/drm/i915/gvt/mmio_context.c b/drivers/gpu/drm/i915/gvt/mmio_context.c
index 4208e40445b1..e44e1d14285c 100644
--- a/drivers/gpu/drm/i915/gvt/mmio_context.c
+++ b/drivers/gpu/drm/i915/gvt/mmio_context.c
@@ -210,14 +210,14 @@ restore_context_mmio_for_inhibit(struct intel_vgpu *vgpu,
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
 
-	*cs++ = MI_LOAD_REGISTER_IMM(count);
+	*cs++ = MI_LOAD_REGISTER_IMM(req->engine, count);
 	for (mmio = gvt->engine_mmio_list.mmio;
 	     i915_mmio_reg_valid(mmio->reg); mmio++) {
 		if (mmio->ring_id != ring_id ||
 		    !mmio->in_context)
 			continue;
 
-		*cs++ = i915_mmio_reg_offset(mmio->reg);
+		*cs++ = i915_get_lri_reg(req->engine, mmio->reg);
 		*cs++ = vgpu_vreg_t(vgpu, mmio->reg) |
 				(mmio->mask << 16);
 		gvt_dbg_core("add lri reg pair 0x%x:0x%x in inhibit ctx, vgpu:%d, rind_id:%d\n",
@@ -245,7 +245,11 @@ restore_render_mocs_control_for_inhibit(struct intel_vgpu *vgpu,
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
 
-	*cs++ = MI_LOAD_REGISTER_IMM(GEN9_MOCS_SIZE);
+	/*
+	 * GEN9_GFX_MOCS is not engine relative, therefore there is no
+	 * need for relative addressing.
+	 */
+	*cs++ = __MI_LOAD_REGISTER_IMM(GEN9_MOCS_SIZE);
 
 	for (index = 0; index < GEN9_MOCS_SIZE; index++) {
 		*cs++ = i915_mmio_reg_offset(GEN9_GFX_MOCS(index));
@@ -272,7 +276,11 @@ restore_render_mocs_l3cc_for_inhibit(struct intel_vgpu *vgpu,
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
 
-	*cs++ = MI_LOAD_REGISTER_IMM(GEN9_MOCS_SIZE / 2);
+	/*
+	 * GEN9_LNCFCMOCS is not engine relative, therefore there is no
+	 * need for relative addressing.
+	 */
+	*cs++ = __MI_LOAD_REGISTER_IMM(GEN9_MOCS_SIZE / 2);
 
 	for (index = 0; index < GEN9_MOCS_SIZE / 2; index++) {
 		*cs++ = i915_mmio_reg_offset(GEN9_LNCFCMOCS(index));
diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 24555102e198..15ba08ab8180 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -222,7 +222,7 @@ static const struct drm_i915_cmd_descriptor common_cmds[] = {
 	CMD(  MI_SUSPEND_FLUSH,                 SMI,    F,  1,      S  ),
 	CMD(  MI_SEMAPHORE_MBOX,                SMI,   !F,  0xFF,   R  ),
 	CMD(  MI_STORE_DWORD_INDEX,             SMI,   !F,  0xFF,   R  ),
-	CMD(  MI_LOAD_REGISTER_IMM(1),          SMI,   !F,  0xFF,   W,
+	CMD(  __MI_LOAD_REGISTER_IMM(1),        SMI,   !F,  0xFF,   W,
 	      .reg = { .offset = 1, .mask = 0x007FFFFC, .step = 2 }    ),
 	CMD(  MI_STORE_REGISTER_MEM,            SMI,    F,  3,     W | B,
 	      .reg = { .offset = 1, .mask = 0x007FFFFC },
@@ -1183,8 +1183,8 @@ static bool check_cmd(const struct intel_engine_cs *engine,
 					return false;
 				}
 
-				if (desc->cmd.value == MI_LOAD_REGISTER_IMM(1) &&
-				    (offset + 2 > length ||
+				if (desc->cmd.value == __MI_LOAD_REGISTER_IMM(1)
+				    && (offset + 2 > length ||
 				     (cmd[offset + 1] & reg->mask) != reg->value)) {
 					DRM_DEBUG_DRIVER("CMD: Rejected LRI to masked register 0x%08X\n",
 							 reg_addr);
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index e42b86827d6b..83abdda05ba2 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1692,17 +1692,23 @@ gen8_update_reg_state_unlocked(struct i915_perf_stream *stream,
 	};
 	int i;
 
-	CTX_REG(reg_state, ctx_oactxctrl, GEN8_OACTXCONTROL,
+	/*
+	 * NB: The LRI instruction is generated by the hardware.
+	 * Should we read it in and assert that the offset flag is set?
+	 */
+
+	CTX_REG(ce->engine, reg_state, ctx_oactxctrl, GEN8_OACTXCONTROL,
 		(stream->period_exponent << GEN8_OA_TIMER_PERIOD_SHIFT) |
 		(stream->periodic ? GEN8_OA_TIMER_ENABLE : 0) |
 		GEN8_OA_COUNTER_RESUME);
 
 	for (i = 0; i < ARRAY_SIZE(flex_regs); i++) {
-		CTX_REG(reg_state, ctx_flexeu0 + i * 2, flex_regs[i],
+		CTX_REG(ce->engine, reg_state, ctx_flexeu0 + i * 2,
+			flex_regs[i],
 			oa_config_flex_reg(oa_config, flex_regs[i]));
 	}
 
-	CTX_REG(reg_state,
+	CTX_REG(ce->engine, reg_state,
 		CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
 		intel_sseu_make_rpcs(i915, &ce->sseu));
 }
@@ -1751,9 +1757,9 @@ gen8_load_flex(struct i915_request *rq,
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
 
-	*cs++ = MI_LOAD_REGISTER_IMM(count);
+	*cs++ = MI_LOAD_REGISTER_IMM(rq->engine, count);
 	do {
-		*cs++ = i915_mmio_reg_offset(flex->reg);
+		*cs++ = i915_get_lri_reg(rq->engine, flex->reg);
 		*cs++ = flex->value;
 	} while (flex++, --count);
 	*cs++ = MI_NOOP;
-- 
2.21.0.5.gaeb582a983

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2019-09-24 22:28 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-23 23:51 [PATCH 0/2] drm/i915: Engine relative MMIO John.C.Harrison
2019-09-23 23:51 ` [PATCH 1/2] " John.C.Harrison
2019-09-24  8:45   ` Tvrtko Ursulin
2019-09-24 22:28     ` Rodrigo Vivi
2019-09-24  9:16   ` Tvrtko Ursulin
2019-09-23 23:51 ` [PATCH 2/2] drm/i915: Engine relative MMIO for Gen12 John.C.Harrison
2019-09-24  9:12   ` Tvrtko Ursulin
2019-09-24  0:25 ` ✓ Fi.CI.BAT: success for drm/i915: Engine relative MMIO (rev9) Patchwork
2019-09-24 14:57 ` ✗ Fi.CI.IGT: failure " Patchwork
  -- strict thread matches above, loose matches on Subject: below --
2019-09-19  6:47 [PATCH 0/2] drm/i915: Engine relative MMIO John.C.Harrison
2019-09-19  6:47 ` [PATCH 1/2] " John.C.Harrison
2019-08-22 18:02 [PATCH 0/2] " John.C.Harrison
2019-08-22 18:02 ` [PATCH 1/2] " John.C.Harrison

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.