All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Matthew Brost <matthew.brost@intel.com>,
	John.C.Harrison@Intel.com, Intel-GFX@Lists.FreeDesktop.Org,
	Chris Wilson <chris@chris-wilson.co.uk>,
	Chris P <chris.p.wilson@intel.com>
Subject: Re: [PATCH] drm/i915: Engine relative MMIO
Date: Thu, 20 Jun 2019 17:33:12 +0100	[thread overview]
Message-ID: <43ff893f-a53a-a808-676b-ea5b2eed2ff3@linux.intel.com> (raw)
In-Reply-To: <20190620072444.GA6272@sdutt>


[fixed Chris' email]

On 20/06/2019 08:24, Matthew Brost wrote:
> On Mon, May 13, 2019 at 12:45:52PM -0700, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> With virtual engines, it is no longer possible to know which specific
>> physical engine a given request will be executed on at the time that
>> request is generated. This means that the request itself must be engine
>> agnostic - any direct register writes must be relative to the engine
>> and not absolute addresses.
>>
>> The LRI command has support for engine relative addressing. However,
>> the mechanism is not transparent to the driver. The scheme for Gen11
>> (MI_LRI_ADD_CS_MMIO_START) requires the LRI address to have no
>> absolute engine base component. The hardware then adds on the correct
>> engine offset at execution time.
>>
>> Due to the non-trivial and differing schemes on different hardware, it
>> is not possible to simply update the code that creates the LRI
>> commands to set a remap flag and let the hardware get on with it.
>> Instead, this patch adds function wrappers for generating the LRI
>> command itself and then for constructing the correct address to use
>> with the LRI.
>>
>> v2: Fix build break in GVT. Remove flags parameter [review feedback
>> from Chris W].
>>
>> v3: Fix build break in selftest. Rebase to newer base tree and fix
>> merge conflict.
>>
>> v4: More rebasing. Rmoved relative addressing support from Gen7-9 only
>> code paths [review feedback from Chris W].
>>
>> v5: More rebasing (new 'gt' directory). Fix white space issue. Use
>> COPY class rather than BCS0 id for checking against BCS engine.
>>
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> CC: Rodrigo Vivi <rodrigo.vivi@intel.com>
>> CC: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>> CC: Wilson, Chris P <chris.p.wilson@intel.com>
> 
> I've reviewed this series and to me it looks like all the concerns have 
> been
> addressed + CI is passing. Do not feel comfortable give a R-B but this 
> patch is
> needed for a series I'm working on so I'd like to see it merged. Gentle 
> ping to
> get this merged.

AFAIR the conclusion between Rodrigo and me was to leave the existing 
macro as is and leave a new one (in uppercase) for relative addressing.

So leave MI_LOAD_REGISTER_IMM as is, add MI_LOAD_REGISTER_IMM_REL for 
usage on relevant (new) paths.

Chris did not say anything but I expect this would also satisfy his wish 
to keep the patch as small as possible eg not touch legacy code paths.

Regards,

Tvrtko

> 
> Thanks,
> Matt
> 
>> ---
>> drivers/gpu/drm/i915/gt/intel_engine.h        |  4 +
>> drivers/gpu/drm/i915/gt/intel_engine_cs.c     | 11 +++
>> drivers/gpu/drm/i915/gt/intel_gpu_commands.h  |  6 +-
>> drivers/gpu/drm/i915/gt/intel_lrc.c           | 79 ++++++++++---------
>> drivers/gpu/drm/i915/gt/intel_lrc_reg.h       |  4 +-
>> drivers/gpu/drm/i915/gt/intel_mocs.c          | 17 ++--
>> drivers/gpu/drm/i915/gt/intel_ringbuffer.c    | 45 +++++++++--
>> drivers/gpu/drm/i915/gt/intel_workarounds.c   |  4 +-
>> .../gpu/drm/i915/gt/selftest_workarounds.c    |  9 ++-
>> drivers/gpu/drm/i915/gvt/mmio_context.c       | 16 +++-
>> drivers/gpu/drm/i915/i915_cmd_parser.c        |  4 +-
>> drivers/gpu/drm/i915/i915_gem_context.c       | 12 +--
>> drivers/gpu/drm/i915/i915_gem_execbuffer.c    |  3 +-
>> drivers/gpu/drm/i915/i915_perf.c              | 19 +++--
>> 14 files changed, 154 insertions(+), 79 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h 
>> b/drivers/gpu/drm/i915/gt/intel_engine.h
>> index 9359b3a7ad9c..3506c992182c 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_engine.h
>> +++ b/drivers/gpu/drm/i915/gt/intel_engine.h
>> @@ -546,4 +546,8 @@ static inline bool inject_preempt_hang(struct 
>> intel_engine_execlists *execlists)
>>
>> #endif
>>
>> +bool i915_engine_has_relative_lri(const struct intel_engine_cs *engine);
>> +u32 i915_get_lri_cmd(const struct intel_engine_cs *engine, u32 
>> word_count);
>> +u32 i915_get_lri_reg(const struct intel_engine_cs *engine, i915_reg_t 
>> reg);
>> +
>> #endif /* _INTEL_RINGBUFFER_H_ */
>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
>> b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>> index 4c3753c1b573..233295d689d2 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>> @@ -253,6 +253,17 @@ static u32 __engine_mmio_base(struct 
>> drm_i915_private *i915,
>>     return bases[i].base;
>> }
>>
>> +bool i915_engine_has_relative_lri(const struct intel_engine_cs *engine)
>> +{
>> +    if (INTEL_GEN(engine->i915) < 11)
>> +        return false;
>> +
>> +    if (engine->class == COPY_ENGINE_CLASS)
>> +        return false;
>> +
>> +    return true;
>> +}
>> +
>> static void __sprint_engine_name(char *name, const struct engine_info 
>> *info)
>> {
>>     WARN_ON(snprintf(name, INTEL_ENGINE_CS_MAX_NAME, "%s%u",
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h 
>> b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
>> index a34ece53a771..e7784b3fb759 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
>> +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h
>> @@ -123,9 +123,13 @@
>>  *   simply ignores the register load under certain conditions.
>>  * - One can actually load arbitrary many arbitrary registers: Simply 
>> issue x
>>  *   address/value pairs. Don't overdue it, though, x <= 2^4 must hold!
>> + * - Newer hardware supports engine relative addresses but older 
>> hardware does
>> + *   not. So never call MI_LRI directly, always use the 
>> i915_get_lri_cmd()
>> + *   and i915_get_lri_reg() helper functions.
>>  */
>> -#define MI_LOAD_REGISTER_IMM(x)    MI_INSTR(0x22, 2*(x)-1)
>> +#define __MI_LOAD_REGISTER_IMM(x)    MI_INSTR(0x22, 2*(x)-1)
>> #define   MI_LRI_FORCE_POSTED        (1<<12)
>> +#define   MI_LRI_ADD_CS_MMIO_START_GEN11    (1<<19)
>> #define MI_STORE_REGISTER_MEM        MI_INSTR(0x24, 1)
>> #define MI_STORE_REGISTER_MEM_GEN8   MI_INSTR(0x24, 2)
>> #define   MI_SRM_LRM_GLOBAL_GTT        (1<<22)
>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
>> b/drivers/gpu/drm/i915/gt/intel_lrc.c
>> index e18623def282..49a9a6648b9c 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
>> @@ -1388,14 +1388,15 @@ static int emit_pdps(struct i915_request *rq)
>>         return PTR_ERR(cs);
>>
>>     /* Ensure the LRI have landed before we invalidate & continue */
>> -    *cs++ = MI_LOAD_REGISTER_IMM(2 * GEN8_3LVL_PDPES) | 
>> MI_LRI_FORCE_POSTED;
>> +    *cs++ = i915_get_lri_cmd(engine, 2 * GEN8_3LVL_PDPES) |
>> +                 MI_LRI_FORCE_POSTED;
>>     for (i = GEN8_3LVL_PDPES; i--; ) {
>>         const dma_addr_t pd_daddr = i915_page_dir_dma_addr(ppgtt, i);
>>         u32 base = engine->mmio_base;
>>
>> -        *cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(base, i));
>> +        *cs++ = i915_get_lri_reg(engine, GEN8_RING_PDP_UDW(base, i));
>>         *cs++ = upper_32_bits(pd_daddr);
>> -        *cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(base, i));
>> +        *cs++ = i915_get_lri_reg(engine, GEN8_RING_PDP_LDW(base, i));
>>         *cs++ = lower_32_bits(pd_daddr);
>>     }
>>     *cs++ = MI_NOOP;
>> @@ -1469,7 +1470,8 @@ gen8_emit_flush_coherentl3_wa(struct 
>> intel_engine_cs *engine, u32 *batch)
>>     *batch++ = i915_scratch_offset(engine->i915) + 256;
>>     *batch++ = 0;
>>
>> -    *batch++ = MI_LOAD_REGISTER_IMM(1);
>> +    /* Gen8/9 only so no need to support relative offsets */
>> +    *batch++ = __MI_LOAD_REGISTER_IMM(1);
>>     *batch++ = i915_mmio_reg_offset(GEN8_L3SQCREG4);
>>     *batch++ = 0x40400000 | GEN8_LQSC_FLUSH_COHERENT_LINES;
>>
>> @@ -1540,13 +1542,14 @@ struct lri {
>>     u32 value;
>> };
>>
>> -static u32 *emit_lri(u32 *batch, const struct lri *lri, unsigned int 
>> count)
>> +static u32 *emit_lri(struct intel_engine_cs *engine, u32 *batch,
>> +             const struct lri *lri, unsigned int count)
>> {
>>     GEM_BUG_ON(!count || count > 63);
>>
>> -    *batch++ = MI_LOAD_REGISTER_IMM(count);
>> +    *batch++ = i915_get_lri_cmd(engine, count);
>>     do {
>> -        *batch++ = i915_mmio_reg_offset(lri->reg);
>> +        *batch++ = i915_get_lri_reg(engine, lri->reg);
>>         *batch++ = lri->value;
>>     } while (lri++, --count);
>>     *batch++ = MI_NOOP;
>> @@ -1584,7 +1587,7 @@ static u32 *gen9_init_indirectctx_bb(struct 
>> intel_engine_cs *engine, u32 *batch)
>>     /* WaFlushCoherentL3CacheLinesAtContextSwitch:skl,bxt,glk */
>>     batch = gen8_emit_flush_coherentl3_wa(engine, batch);
>>
>> -    batch = emit_lri(batch, lri, ARRAY_SIZE(lri));
>> +    batch = emit_lri(engine, batch, lri, ARRAY_SIZE(lri));
>>
>>     /* WaMediaPoolStateCmdInWABB:bxt,glk */
>>     if (HAS_POOLED_EU(engine->i915)) {
>> @@ -2537,10 +2540,10 @@ static void execlists_init_reg_state(u32 *regs,
>>      * values (including all the missing MI_LOAD_REGISTER_IMM commands 
>> that
>>      * we are not initializing here).
>>      */
>> -    regs[CTX_LRI_HEADER_0] = MI_LOAD_REGISTER_IMM(rcs ? 14 : 11) |
>> -                 MI_LRI_FORCE_POSTED;
>> +    regs[CTX_LRI_HEADER_0] = i915_get_lri_cmd(engine, rcs ? 14 : 11) |
>> +                          MI_LRI_FORCE_POSTED;
>>
>> -    CTX_REG(regs, CTX_CONTEXT_CONTROL, RING_CONTEXT_CONTROL(base),
>> +    CTX_REG(engine, regs, CTX_CONTEXT_CONTROL, 
>> RING_CONTEXT_CONTROL(base),
>>         _MASKED_BIT_DISABLE(CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT) |
>>         _MASKED_BIT_ENABLE(CTX_CTRL_INHIBIT_SYN_CTX_SWITCH));
>>     if (INTEL_GEN(engine->i915) < 11) {
>> @@ -2548,22 +2551,23 @@ static void execlists_init_reg_state(u32 *regs,
>>             _MASKED_BIT_DISABLE(CTX_CTRL_ENGINE_CTX_SAVE_INHIBIT |
>>                         CTX_CTRL_RS_CTX_ENABLE);
>>     }
>> -    CTX_REG(regs, CTX_RING_HEAD, RING_HEAD(base), 0);
>> -    CTX_REG(regs, CTX_RING_TAIL, RING_TAIL(base), 0);
>> -    CTX_REG(regs, CTX_RING_BUFFER_START, RING_START(base), 0);
>> -    CTX_REG(regs, CTX_RING_BUFFER_CONTROL, RING_CTL(base),
>> +    CTX_REG(engine, regs, CTX_RING_HEAD, RING_HEAD(base), 0);
>> +    CTX_REG(engine, regs, CTX_RING_TAIL, RING_TAIL(base), 0);
>> +    CTX_REG(engine, regs, CTX_RING_BUFFER_START, RING_START(base), 0);
>> +    CTX_REG(engine, regs, CTX_RING_BUFFER_CONTROL, RING_CTL(base),
>>         RING_CTL_SIZE(ring->size) | RING_VALID);
>> -    CTX_REG(regs, CTX_BB_HEAD_U, RING_BBADDR_UDW(base), 0);
>> -    CTX_REG(regs, CTX_BB_HEAD_L, RING_BBADDR(base), 0);
>> -    CTX_REG(regs, CTX_BB_STATE, RING_BBSTATE(base), RING_BB_PPGTT);
>> -    CTX_REG(regs, CTX_SECOND_BB_HEAD_U, RING_SBBADDR_UDW(base), 0);
>> -    CTX_REG(regs, CTX_SECOND_BB_HEAD_L, RING_SBBADDR(base), 0);
>> -    CTX_REG(regs, CTX_SECOND_BB_STATE, RING_SBBSTATE(base), 0);
>> +    CTX_REG(engine, regs, CTX_BB_HEAD_U, RING_BBADDR_UDW(base), 0);
>> +    CTX_REG(engine, regs, CTX_BB_HEAD_L, RING_BBADDR(base), 0);
>> +    CTX_REG(engine, regs, CTX_BB_STATE, RING_BBSTATE(base), 
>> RING_BB_PPGTT);
>> +    CTX_REG(engine, regs, CTX_SECOND_BB_HEAD_U, 
>> RING_SBBADDR_UDW(base), 0);
>> +    CTX_REG(engine, regs, CTX_SECOND_BB_HEAD_L, RING_SBBADDR(base), 0);
>> +    CTX_REG(engine, regs, CTX_SECOND_BB_STATE, RING_SBBSTATE(base), 0);
>>     if (rcs) {
>>         struct i915_ctx_workarounds *wa_ctx = &engine->wa_ctx;
>>
>> -        CTX_REG(regs, CTX_RCS_INDIRECT_CTX, RING_INDIRECT_CTX(base), 0);
>> -        CTX_REG(regs, CTX_RCS_INDIRECT_CTX_OFFSET,
>> +        CTX_REG(engine, regs, CTX_RCS_INDIRECT_CTX,
>> +            RING_INDIRECT_CTX(base), 0);
>> +        CTX_REG(engine, regs, CTX_RCS_INDIRECT_CTX_OFFSET,
>>             RING_INDIRECT_CTX_OFFSET(base), 0);
>>         if (wa_ctx->indirect_ctx.size) {
>>             u32 ggtt_offset = i915_ggtt_offset(wa_ctx->vma);
>> @@ -2576,7 +2580,8 @@ static void execlists_init_reg_state(u32 *regs,
>>                 intel_lr_indirect_ctx_offset(engine) << 6;
>>         }
>>
>> -        CTX_REG(regs, CTX_BB_PER_CTX_PTR, RING_BB_PER_CTX_PTR(base), 0);
>> +        CTX_REG(engine, regs, CTX_BB_PER_CTX_PTR,
>> +            RING_BB_PER_CTX_PTR(base), 0);
>>         if (wa_ctx->per_ctx.size) {
>>             u32 ggtt_offset = i915_ggtt_offset(wa_ctx->vma);
>>
>> @@ -2585,18 +2590,19 @@ static void execlists_init_reg_state(u32 *regs,
>>         }
>>     }
>>
>> -    regs[CTX_LRI_HEADER_1] = MI_LOAD_REGISTER_IMM(9) | 
>> MI_LRI_FORCE_POSTED;
>> +    regs[CTX_LRI_HEADER_1] = i915_get_lri_cmd(engine, 9) |
>> +                          MI_LRI_FORCE_POSTED;
>>
>> -    CTX_REG(regs, CTX_CTX_TIMESTAMP, RING_CTX_TIMESTAMP(base), 0);
>> +    CTX_REG(engine, regs, CTX_CTX_TIMESTAMP, 
>> RING_CTX_TIMESTAMP(base), 0);
>>     /* PDP values well be assigned later if needed */
>> -    CTX_REG(regs, CTX_PDP3_UDW, GEN8_RING_PDP_UDW(base, 3), 0);
>> -    CTX_REG(regs, CTX_PDP3_LDW, GEN8_RING_PDP_LDW(base, 3), 0);
>> -    CTX_REG(regs, CTX_PDP2_UDW, GEN8_RING_PDP_UDW(base, 2), 0);
>> -    CTX_REG(regs, CTX_PDP2_LDW, GEN8_RING_PDP_LDW(base, 2), 0);
>> -    CTX_REG(regs, CTX_PDP1_UDW, GEN8_RING_PDP_UDW(base, 1), 0);
>> -    CTX_REG(regs, CTX_PDP1_LDW, GEN8_RING_PDP_LDW(base, 1), 0);
>> -    CTX_REG(regs, CTX_PDP0_UDW, GEN8_RING_PDP_UDW(base, 0), 0);
>> -    CTX_REG(regs, CTX_PDP0_LDW, GEN8_RING_PDP_LDW(base, 0), 0);
>> +    CTX_REG(engine, regs, CTX_PDP3_UDW, GEN8_RING_PDP_UDW(base, 3), 0);
>> +    CTX_REG(engine, regs, CTX_PDP3_LDW, GEN8_RING_PDP_LDW(base, 3), 0);
>> +    CTX_REG(engine, regs, CTX_PDP2_UDW, GEN8_RING_PDP_UDW(base, 2), 0);
>> +    CTX_REG(engine, regs, CTX_PDP2_LDW, GEN8_RING_PDP_LDW(base, 2), 0);
>> +    CTX_REG(engine, regs, CTX_PDP1_UDW, GEN8_RING_PDP_UDW(base, 1), 0);
>> +    CTX_REG(engine, regs, CTX_PDP1_LDW, GEN8_RING_PDP_LDW(base, 1), 0);
>> +    CTX_REG(engine, regs, CTX_PDP0_UDW, GEN8_RING_PDP_UDW(base, 0), 0);
>> +    CTX_REG(engine, regs, CTX_PDP0_LDW, GEN8_RING_PDP_LDW(base, 0), 0);
>>
>>     if (i915_vm_is_4lvl(&ppgtt->vm)) {
>>         /* 64b PPGTT (48bit canonical)
>> @@ -2612,8 +2618,9 @@ static void execlists_init_reg_state(u32 *regs,
>>     }
>>
>>     if (rcs) {
>> -        regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
>> -        CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE, 0);
>> +        regs[CTX_LRI_HEADER_2] = i915_get_lri_cmd(engine, 1);
>> +        CTX_REG(engine, regs, CTX_R_PWR_CLK_STATE,
>> +            GEN8_R_PWR_CLK_STATE, 0);
>>
>>         i915_oa_init_reg_state(engine, ce, regs);
>>     }
>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h 
>> b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
>> index 5ef932d810a7..40b1142d0d74 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
>> @@ -39,10 +39,10 @@
>> #define CTX_R_PWR_CLK_STATE        0x42
>> #define CTX_END                0x44
>>
>> -#define CTX_REG(reg_state, pos, reg, val) do { \
>> +#define CTX_REG(engine, reg_state, pos, reg, val) do { \
>>     u32 *reg_state__ = (reg_state); \
>>     const u32 pos__ = (pos); \
>> -    (reg_state__)[(pos__) + 0] = i915_mmio_reg_offset(reg); \
>> +    (reg_state__)[(pos__) + 0] = i915_get_lri_reg((engine), (reg)); \
>>     (reg_state__)[(pos__) + 1] = (val); \
>> } while (0)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_mocs.c 
>> b/drivers/gpu/drm/i915/gt/intel_mocs.c
>> index 79df66022d3a..5dae6333481d 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_mocs.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_mocs.c
>> @@ -324,9 +324,6 @@ static u32 get_entry_control(const struct 
>> drm_i915_mocs_table *table,
>> /**
>>  * intel_mocs_init_engine() - emit the mocs control table
>>  * @engine:    The engine for whom to emit the registers.
>> - *
>> - * This function simply emits a MI_LOAD_REGISTER_IMM command for the
>> - * given table starting at the given address.
>>  */
>> void intel_mocs_init_engine(struct intel_engine_cs *engine)
>> {
>> @@ -380,18 +377,20 @@ static int emit_mocs_control_table(struct 
>> i915_request *rq,
>>     if (IS_ERR(cs))
>>         return PTR_ERR(cs);
>>
>> -    *cs++ = MI_LOAD_REGISTER_IMM(table->n_entries);
>> +    *cs++ = i915_get_lri_cmd(rq->engine, table->n_entries);
>>
>>     for (index = 0; index < table->size; index++) {
>>         u32 value = get_entry_control(table, index);
>>
>> -        *cs++ = i915_mmio_reg_offset(mocs_register(engine, index));
>> +        *cs++ = i915_get_lri_reg(rq->engine,
>> +                     mocs_register(engine, index));
>>         *cs++ = value;
>>     }
>>
>>     /* All remaining entries are also unused */
>>     for (; index < table->n_entries; index++) {
>> -        *cs++ = i915_mmio_reg_offset(mocs_register(engine, index));
>> +        *cs++ = i915_get_lri_reg(rq->engine,
>> +                     mocs_register(engine, index));
>>         *cs++ = unused_value;
>>     }
>>
>> @@ -449,7 +448,11 @@ static int emit_mocs_l3cc_table(struct 
>> i915_request *rq,
>>     if (IS_ERR(cs))
>>         return PTR_ERR(cs);
>>
>> -    *cs++ = MI_LOAD_REGISTER_IMM(table->n_entries / 2);
>> +    /*
>> +     * GEN9_LNCFCMOCS is not engine relative, therefore there is no
>> +     * need for relative addressing?
>> +     */
>> +    *cs++ = __MI_LOAD_REGISTER_IMM(table->n_entries / 2);
>>
>>     for (i = 0; i < table->size / 2; i++) {
>>         u16 low = get_entry_l3cc(table, 2 * i);
>> diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c 
>> b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
>> index f0d60affdba3..e98c2fe727a5 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
>> @@ -1516,12 +1516,13 @@ static int load_pd_dir(struct i915_request *rq,
>>     if (IS_ERR(cs))
>>         return PTR_ERR(cs);
>>
>> -    *cs++ = MI_LOAD_REGISTER_IMM(1);
>> -    *cs++ = i915_mmio_reg_offset(RING_PP_DIR_DCLV(engine->mmio_base));
>> +    /* Can these not be merged into a single LRI??? */
>> +    *cs++ = i915_get_lri_cmd(engine, 1);
>> +    *cs++ = i915_get_lri_reg(engine, 
>> RING_PP_DIR_DCLV(engine->mmio_base));
>>     *cs++ = PP_DIR_DCLV_2G;
>>
>> -    *cs++ = MI_LOAD_REGISTER_IMM(1);
>> -    *cs++ = i915_mmio_reg_offset(RING_PP_DIR_BASE(engine->mmio_base));
>> +    *cs++ = i915_get_lri_cmd(engine, 1);
>> +    *cs++ = i915_get_lri_reg(engine, 
>> RING_PP_DIR_BASE(engine->mmio_base));
>>     *cs++ = ppgtt->pd.base.ggtt_offset << 10;
>>
>>     intel_ring_advance(rq, cs);
>> @@ -1589,7 +1590,11 @@ static inline int mi_set_context(struct 
>> i915_request *rq, u32 flags)
>>         if (num_engines) {
>>             struct intel_engine_cs *signaller;
>>
>> -            *cs++ = MI_LOAD_REGISTER_IMM(num_engines);
>> +            /*
>> +             * Must use absolute engine address as the register
>> +             * write is targeting a different engine.
>> +             */
>> +            *cs++ = __MI_LOAD_REGISTER_IMM(num_engines);
>>             for_each_engine(signaller, i915, id) {
>>                 if (signaller == engine)
>>                     continue;
>> @@ -1643,7 +1648,11 @@ static inline int mi_set_context(struct 
>> i915_request *rq, u32 flags)
>>             struct intel_engine_cs *signaller;
>>             i915_reg_t last_reg = {}; /* keep gcc quiet */
>>
>> -            *cs++ = MI_LOAD_REGISTER_IMM(num_engines);
>> +            /*
>> +             * Must use absolute engine address as the register
>> +             * write is targeting a different engine.
>> +             */
>> +            *cs++ = __MI_LOAD_REGISTER_IMM(num_engines);
>>             for_each_engine(signaller, i915, id) {
>>                 if (signaller == engine)
>>                     continue;
>> @@ -1687,9 +1696,9 @@ static int remap_l3(struct i915_request *rq, int 
>> slice)
>>      * here because no other code should access these registers other 
>> than
>>      * at initialization time.
>>      */
>> -    *cs++ = MI_LOAD_REGISTER_IMM(GEN7_L3LOG_SIZE/4);
>> +    *cs++ = i915_get_lri_cmd(rq->engine, GEN7_L3LOG_SIZE/4);
>>     for (i = 0; i < GEN7_L3LOG_SIZE/4; i++) {
>> -        *cs++ = i915_mmio_reg_offset(GEN7_L3LOG(slice, i));
>> +        *cs++ = i915_get_lri_reg(rq->engine, GEN7_L3LOG(slice, i));
>>         *cs++ = remap_info[i];
>>     }
>>     *cs++ = MI_NOOP;
>> @@ -2335,3 +2344,23 @@ int intel_ring_submission_init(struct 
>> intel_engine_cs *engine)
>>     intel_engine_cleanup_common(engine);
>>     return err;
>> }
>> +
>> +u32 i915_get_lri_cmd(const struct intel_engine_cs *engine, u32 
>> word_count)
>> +{
>> +    u32 word;
>> +
>> +    word = __MI_LOAD_REGISTER_IMM(word_count);
>> +
>> +    if (i915_engine_has_relative_lri(engine))
>> +        word |= MI_LRI_ADD_CS_MMIO_START_GEN11;
>> +
>> +    return word;
>> +}
>> +
>> +u32 i915_get_lri_reg(const struct intel_engine_cs *engine, i915_reg_t 
>> reg)
>> +{
>> +    if (!i915_engine_has_relative_lri(engine))
>> +        return i915_mmio_reg_offset(reg);
>> +
>> +    return i915_mmio_reg_offset(reg) - engine->mmio_base;
>> +}
>> diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c 
>> b/drivers/gpu/drm/i915/gt/intel_workarounds.c
>> index 43e290306551..d5edc10c860c 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
>> @@ -625,9 +625,9 @@ int intel_engine_emit_ctx_wa(struct i915_request *rq)
>>     if (IS_ERR(cs))
>>         return PTR_ERR(cs);
>>
>> -    *cs++ = MI_LOAD_REGISTER_IMM(wal->count);
>> +    *cs++ = i915_get_lri_cmd(rq->engine, wal->count);
>>     for (i = 0, wa = wal->list; i < wal->count; i++, wa++) {
>> -        *cs++ = i915_mmio_reg_offset(wa->reg);
>> +        *cs++ = i915_get_lri_reg(rq->engine, wa->reg);
>>         *cs++ = wa->val;
>>     }
>>     *cs++ = MI_NOOP;
>> diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c 
>> b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
>> index 9f7680b9984b..b0513c1de53c 100644
>> --- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c
>> +++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
>> @@ -442,6 +442,7 @@ static int check_dirty_whitelist(struct 
>> i915_gem_context *ctx,
>>
>>     for (i = 0; i < engine->whitelist.count; i++) {
>>         u32 reg = i915_mmio_reg_offset(engine->whitelist.list[i].reg);
>> +        u32 regLRI = i915_get_lri_reg(engine, 
>> engine->whitelist.list[i].reg);
>>         u64 addr = scratch->node.start;
>>         struct i915_request *rq;
>>         u32 srm, lrm, rsvd;
>> @@ -474,8 +475,8 @@ static int check_dirty_whitelist(struct 
>> i915_gem_context *ctx,
>>         idx = 1;
>>         for (v = 0; v < ARRAY_SIZE(values); v++) {
>>             /* LRI garbage */
>> -            *cs++ = MI_LOAD_REGISTER_IMM(1);
>> -            *cs++ = reg;
>> +            *cs++ = i915_get_lri_cmd(engine, 1);
>> +            *cs++ = regLRI;
>>             *cs++ = values[v];
>>
>>             /* SRM result */
>> @@ -487,8 +488,8 @@ static int check_dirty_whitelist(struct 
>> i915_gem_context *ctx,
>>         }
>>         for (v = 0; v < ARRAY_SIZE(values); v++) {
>>             /* LRI garbage */
>> -            *cs++ = MI_LOAD_REGISTER_IMM(1);
>> -            *cs++ = reg;
>> +            *cs++ = i915_get_lri_cmd(engine, 1);
>> +            *cs++ = regLRI;
>>             *cs++ = ~values[v];
>>
>>             /* SRM result */
>> diff --git a/drivers/gpu/drm/i915/gvt/mmio_context.c 
>> b/drivers/gpu/drm/i915/gvt/mmio_context.c
>> index a27bdd3951f6..3807ce5fe564 100644
>> --- a/drivers/gpu/drm/i915/gvt/mmio_context.c
>> +++ b/drivers/gpu/drm/i915/gvt/mmio_context.c
>> @@ -200,14 +200,14 @@ restore_context_mmio_for_inhibit(struct 
>> intel_vgpu *vgpu,
>>     if (IS_ERR(cs))
>>         return PTR_ERR(cs);
>>
>> -    *cs++ = MI_LOAD_REGISTER_IMM(count);
>> +    *cs++ = i915_get_lri_cmd(req->engine, count);
>>     for (mmio = gvt->engine_mmio_list.mmio;
>>          i915_mmio_reg_valid(mmio->reg); mmio++) {
>>         if (mmio->ring_id != ring_id ||
>>             !mmio->in_context)
>>             continue;
>>
>> -        *cs++ = i915_mmio_reg_offset(mmio->reg);
>> +        *cs++ = i915_get_lri_reg(req->engine, mmio->reg);
>>         *cs++ = vgpu_vreg_t(vgpu, mmio->reg) |
>>                 (mmio->mask << 16);
>>         gvt_dbg_core("add lri reg pair 0x%x:0x%x in inhibit ctx, 
>> vgpu:%d, rind_id:%d\n",
>> @@ -235,7 +235,11 @@ restore_render_mocs_control_for_inhibit(struct 
>> intel_vgpu *vgpu,
>>     if (IS_ERR(cs))
>>         return PTR_ERR(cs);
>>
>> -    *cs++ = MI_LOAD_REGISTER_IMM(GEN9_MOCS_SIZE);
>> +    /*
>> +     * GEN9_GFX_MOCS is not engine relative, therefore there is no
>> +     * need for relative addressing.
>> +     */
>> +    *cs++ = __MI_LOAD_REGISTER_IMM(GEN9_MOCS_SIZE);
>>
>>     for (index = 0; index < GEN9_MOCS_SIZE; index++) {
>>         *cs++ = i915_mmio_reg_offset(GEN9_GFX_MOCS(index));
>> @@ -262,7 +266,11 @@ restore_render_mocs_l3cc_for_inhibit(struct 
>> intel_vgpu *vgpu,
>>     if (IS_ERR(cs))
>>         return PTR_ERR(cs);
>>
>> -    *cs++ = MI_LOAD_REGISTER_IMM(GEN9_MOCS_SIZE / 2);
>> +    /*
>> +     * GEN9_LNCFCMOCS is not engine relative, therefore there is no
>> +     * need for relative addressing.
>> +     */
>> +    *cs++ = __MI_LOAD_REGISTER_IMM(GEN9_MOCS_SIZE / 2);
>>
>>     for (index = 0; index < GEN9_MOCS_SIZE / 2; index++) {
>>         *cs++ = i915_mmio_reg_offset(GEN9_LNCFCMOCS(index));
>> diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c 
>> b/drivers/gpu/drm/i915/i915_cmd_parser.c
>> index e9fadcb4d592..fd183e72dace 100644
>> --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
>> +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
>> @@ -221,7 +221,7 @@ static const struct drm_i915_cmd_descriptor 
>> common_cmds[] = {
>>     CMD(  MI_SUSPEND_FLUSH,                 SMI,    F,  1,      S  ),
>>     CMD(  MI_SEMAPHORE_MBOX,                SMI,   !F,  0xFF,   R  ),
>>     CMD(  MI_STORE_DWORD_INDEX,             SMI,   !F,  0xFF,   R  ),
>> -    CMD(  MI_LOAD_REGISTER_IMM(1),          SMI,   !F,  0xFF,   W,
>> +    CMD(  __MI_LOAD_REGISTER_IMM(1),        SMI,   !F,  0xFF,   W,
>>           .reg = { .offset = 1, .mask = 0x007FFFFC, .step = 2 }    ),
>>     CMD(  MI_STORE_REGISTER_MEM,            SMI,    F,  3,     W | B,
>>           .reg = { .offset = 1, .mask = 0x007FFFFC },
>> @@ -1183,7 +1183,7 @@ static bool check_cmd(const struct 
>> intel_engine_cs *engine,
>>                     return false;
>>                 }
>>
>> -                if (desc->cmd.value == MI_LOAD_REGISTER_IMM(1) &&
>> +                if (desc->cmd.value == __MI_LOAD_REGISTER_IMM(1) &&
>>                     (offset + 2 > length ||
>>                      (cmd[offset + 1] & reg->mask) != reg->value)) {
>>                     DRM_DEBUG_DRIVER("CMD: Rejected LRI to masked 
>> register 0x%08X\n",
>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
>> b/drivers/gpu/drm/i915/i915_gem_context.c
>> index 65cefc520e79..98260d8a45be 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_context.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
>> @@ -1026,11 +1026,11 @@ static int emit_ppgtt_update(struct 
>> i915_request *rq, void *data)
>>         if (IS_ERR(cs))
>>             return PTR_ERR(cs);
>>
>> -        *cs++ = MI_LOAD_REGISTER_IMM(2);
>> +        *cs++ = i915_get_lri_cmd(engine, 2);
>>
>> -        *cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(base, 0));
>> +        *cs++ = i915_get_lri_reg(engine, GEN8_RING_PDP_UDW(base, 0));
>>         *cs++ = upper_32_bits(pd_daddr);
>> -        *cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(base, 0));
>> +        *cs++ = i915_get_lri_reg(engine, GEN8_RING_PDP_LDW(base, 0));
>>         *cs++ = lower_32_bits(pd_daddr);
>>
>>         *cs++ = MI_NOOP;
>> @@ -1040,13 +1040,13 @@ static int emit_ppgtt_update(struct 
>> i915_request *rq, void *data)
>>         if (IS_ERR(cs))
>>             return PTR_ERR(cs);
>>
>> -        *cs++ = MI_LOAD_REGISTER_IMM(2 * GEN8_3LVL_PDPES);
>> +        *cs++ = i915_get_lri_cmd(engine, 2 * GEN8_3LVL_PDPES);
>>         for (i = GEN8_3LVL_PDPES; i--; ) {
>>             const dma_addr_t pd_daddr = i915_page_dir_dma_addr(ppgtt, i);
>>
>> -            *cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(base, i));
>> +            *cs++ = i915_get_lri_reg(engine, GEN8_RING_PDP_UDW(base, 
>> i));
>>             *cs++ = upper_32_bits(pd_daddr);
>> -            *cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(base, i));
>> +            *cs++ = i915_get_lri_reg(engine, GEN8_RING_PDP_LDW(base, 
>> i));
>>             *cs++ = lower_32_bits(pd_daddr);
>>         }
>>         *cs++ = MI_NOOP;
>> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
>> b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>> index 679f7c1561ba..ac5b06d2ffdc 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>> @@ -1963,7 +1963,8 @@ static int i915_reset_gen7_sol_offsets(struct 
>> i915_request *rq)
>>     if (IS_ERR(cs))
>>         return PTR_ERR(cs);
>>
>> -    *cs++ = MI_LOAD_REGISTER_IMM(4);
>> +    /* Gen7 only so no need to support relative offsets */
>> +    *cs++ = __MI_LOAD_REGISTER_IMM(4);
>>     for (i = 0; i < 4; i++) {
>>         *cs++ = i915_mmio_reg_offset(GEN7_SO_WRITE_OFFSET(i));
>>         *cs++ = 0;
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c 
>> b/drivers/gpu/drm/i915/i915_perf.c
>> index c4995d5a16d2..86facbccdb02 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -1636,7 +1636,8 @@ static void hsw_disable_metric_set(struct 
>> drm_i915_private *dev_priv)
>>  * in the case that the OA unit has been disabled.
>>  */
>> static void
>> -gen8_update_reg_state_unlocked(struct intel_context *ce,
>> +gen8_update_reg_state_unlocked(struct intel_engine_cs *engine,
>> +                   struct intel_context *ce,
>>                    u32 *reg_state,
>>                    const struct i915_oa_config *oa_config)
>> {
>> @@ -1655,7 +1656,12 @@ gen8_update_reg_state_unlocked(struct 
>> intel_context *ce,
>>     };
>>     int i;
>>
>> -    CTX_REG(reg_state, ctx_oactxctrl, GEN8_OACTXCONTROL,
>> +    /*
>> +     * NB: The LRI instruction is generated by the hardware.
>> +     * Should we read it in and assert that the offset flag is set?
>> +     */
>> +
>> +    CTX_REG(engine, reg_state, ctx_oactxctrl, GEN8_OACTXCONTROL,
>>         (i915->perf.oa.period_exponent << GEN8_OA_TIMER_PERIOD_SHIFT) |
>>         (i915->perf.oa.periodic ? GEN8_OA_TIMER_ENABLE : 0) |
>>         GEN8_OA_COUNTER_RESUME);
>> @@ -1682,10 +1688,10 @@ gen8_update_reg_state_unlocked(struct 
>> intel_context *ce,
>>             }
>>         }
>>
>> -        CTX_REG(reg_state, state_offset, flex_regs[i], value);
>> +        CTX_REG(engine, reg_state, state_offset, flex_regs[i], value);
>>     }
>>
>> -    CTX_REG(reg_state,
>> +    CTX_REG(engine, reg_state,
>>         CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
>>         intel_sseu_make_rpcs(i915, &ce->sseu));
>> }
>> @@ -1770,7 +1776,7 @@ static int gen8_configure_all_contexts(struct 
>> drm_i915_private *dev_priv,
>>             ce->state->obj->mm.dirty = true;
>>             regs += LRC_STATE_PN * PAGE_SIZE / sizeof(*regs);
>>
>> -            gen8_update_reg_state_unlocked(ce, regs, oa_config);
>> +            gen8_update_reg_state_unlocked(dev_priv->engine[RCS0], 
>> ce, regs, oa_config);
>>
>>             i915_gem_object_unpin_map(ce->state->obj);
>>         }
>> @@ -2166,7 +2172,8 @@ void i915_oa_init_reg_state(struct 
>> intel_engine_cs *engine,
>>
>>     stream = engine->i915->perf.oa.exclusive_stream;
>>     if (stream)
>> -        gen8_update_reg_state_unlocked(ce, regs, stream->oa_config);
>> +        gen8_update_reg_state_unlocked(engine, ce, regs,
>> +                           stream->oa_config);
>> }
>>
>> /**
>> -- 
>> 2.21.0.5.gaeb582a983
>>
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2019-06-20 16:33 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-13 19:45 [PATCH] drm/i915: Engine relative MMIO John.C.Harrison
2019-05-13 20:03 ` ✗ Fi.CI.BAT: failure for drm/i915: Engine relative MMIO (rev5) Patchwork
2019-05-13 21:10   ` John Harrison
2019-05-15  8:52 ` [PATCH] drm/i915: Engine relative MMIO Tvrtko Ursulin
2019-05-17  1:25   ` John Harrison
2019-05-20  6:19     ` Tvrtko Ursulin
2019-06-12  0:24       ` Rodrigo Vivi
2019-06-20  7:24 ` Matthew Brost
2019-06-20 16:33   ` Tvrtko Ursulin [this message]
2019-06-20 19:15     ` Rodrigo Vivi
  -- strict thread matches above, loose matches on Subject: below --
2019-05-13 21:09 John.C.Harrison
2019-04-24  1:50 John.C.Harrison
2019-05-06 21:36 ` Rodrigo Vivi
2019-05-07 18:55   ` John Harrison
2019-05-08  6:06     ` Rodrigo Vivi
2019-03-30  0:10 John.C.Harrison
2019-03-30  7:59 ` Chris Wilson
2019-04-01 21:02   ` John Harrison
2019-04-01 21:10     ` Chris Wilson
2019-03-19 23:22 John.C.Harrison
2019-03-20 11:39 ` kbuild test robot
2019-02-22 23:49 John.C.Harrison
2019-02-22 23:57 ` Chris Wilson
2019-02-23  7:37 ` kbuild test robot
2019-02-23  9:27 ` kbuild test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=43ff893f-a53a-a808-676b-ea5b2eed2ff3@linux.intel.com \
    --to=tvrtko.ursulin@linux.intel.com \
    --cc=Intel-GFX@Lists.FreeDesktop.Org \
    --cc=John.C.Harrison@Intel.com \
    --cc=chris.p.wilson@intel.com \
    --cc=chris@chris-wilson.co.uk \
    --cc=matthew.brost@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.