All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v6 13/20] drm/i915/guc: Provide register list to be saved/restored during engine reset
       [not found] <02fb15a-d90a-7d82-ab57-032a77636ba8@intel.com>
@ 2017-04-19 18:35 ` Michel Thierry
  2017-04-20 11:33   ` Joonas Lahtinen
  0 siblings, 1 reply; 11+ messages in thread
From: Michel Thierry @ 2017-04-19 18:35 UTC (permalink / raw)
  To: intel-gfx

From: Arun Siluvery <arun.siluvery@linux.intel.com>

GuC expects a list of registers from the driver which are saved/restored
during engine reset. The type of value to be saved is controlled by
flags. We provide a minimal set of registers that we want GuC to save and
restore. This is not an issue in case of engine reset as driver initializes
most of them following an engine reset, but in case of media reset (aka
watchdog reset) which is completely internal to GuC (including resubmission
of hung workload), it is necessary to provide this list, otherwise GuC won't
be able to schedule further workloads after a reset. This is the minimal
set of registers identified for things to work as expected but if we see
any new issues, this register list can be expanded.

In order to not loose any existing workarounds, we have to let GuC know
the registers and its values. These will be reapplied after the reset.
Note that we can't just read the current value because most of these
registers are masked (so we have a workaround for a workaround for a
workaround).

v2: REGSET_MASKED is too difficult for GuC, use REGSET_SAVE_DEFAULT_VALUE
and current value from RING_MODE reg instead; no need to preserve
head/tail either, be extra paranoid and save whitelisted registers (Daniele).

v3: Workarounds added only once during _init_workarounds also have to
been restored, or we risk loosing them after internal GuC reset
(Daniele).

Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h            |  3 ++
 drivers/gpu/drm/i915/i915_guc_submission.c | 68 +++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_engine_cs.c     | 65 +++++++++++++++++++---------
 3 files changed, 114 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index fa3988c5033b..1ba1ac016973 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1913,7 +1913,10 @@ struct i915_wa_reg {
 
 struct i915_workarounds {
 	struct i915_wa_reg reg[I915_MAX_WA_REGS];
+	/* list of registers (and their values) that GuC will have to restore */
+	struct i915_wa_reg guc_reg[GUC_REGSET_MAX_REGISTERS];
 	u32 count;
+	u32 guc_count;
 	u32 hw_whitelist_count[I915_NUM_ENGINES];
 };
 
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index 1ea36a88d2fb..f4081da88df2 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -1003,6 +1003,24 @@ static void guc_policies_init(struct guc_policies *policies)
 	policies->is_valid = 1;
 }
 
+/*
+ * In this macro it is highly unlikely to exceed max value but even if we did
+ * it is not an error so just throw a warning and continue. Only side effect
+ * in continuing further means some registers won't be added to save/restore
+ * list.
+ */
+#define GUC_ADD_MMIO_REG_ADS(node, reg_addr, _flags, defvalue)		\
+	do {								\
+		u32 __count = node->number_of_registers;		\
+		if (WARN_ON(__count >= GUC_REGSET_MAX_REGISTERS))	\
+			continue;					\
+		node->registers[__count].offset = reg_addr.reg;		\
+		node->registers[__count].flags = (_flags);		\
+		if (defvalue)						\
+			node->registers[__count].value = (defvalue);	\
+		node->number_of_registers++;				\
+	} while (0)
+
 static int guc_ads_create(struct intel_guc *guc)
 {
 	struct drm_i915_private *dev_priv = guc_to_i915(guc);
@@ -1016,6 +1034,7 @@ static int guc_ads_create(struct intel_guc *guc)
 		u8 reg_state_buffer[GUC_S3_SAVE_SPACE_PAGES * PAGE_SIZE];
 	} __packed *blob;
 	struct intel_engine_cs *engine;
+	struct i915_workarounds *workarounds = &dev_priv->workarounds;
 	enum intel_engine_id id;
 	u32 base;
 
@@ -1035,6 +1054,47 @@ static int guc_ads_create(struct intel_guc *guc)
 
 	/* MMIO reg state */
 	for_each_engine(engine, dev_priv, id) {
+		u32 i;
+		struct guc_mmio_regset *eng_reg =
+			&blob->reg_state.engine_reg[engine->guc_id];
+
+		/*
+		 * Provide a list of registers to be saved/restored during gpu
+		 * reset. This is mainly required for Media reset (aka watchdog
+		 * timeout) which is completely under the control of GuC
+		 * (resubmission of hung workload is handled inside GuC).
+		 */
+		GUC_ADD_MMIO_REG_ADS(eng_reg, RING_HWS_PGA(engine->mmio_base),
+				     GUC_REGSET_ENGINERESET |
+				     GUC_REGSET_SAVE_CURRENT_VALUE, 0);
+
+		/*
+		 * Workaround the guc issue with masked registers, note that
+		 * at this point guc submission is still disabled and the mode
+		 * register doesnt have the irq_steering bit set, which we
+		 * need to fwd irqs to GuC.
+		 */
+		GUC_ADD_MMIO_REG_ADS(eng_reg, RING_MODE_GEN7(engine),
+				     GUC_REGSET_ENGINERESET |
+				     GUC_REGSET_SAVE_DEFAULT_VALUE,
+				     I915_READ(RING_MODE_GEN7(engine)) |
+				     GFX_INTERRUPT_STEERING | (0xFFFF<<16));
+
+		GUC_ADD_MMIO_REG_ADS(eng_reg, RING_IMR(engine->mmio_base),
+				     GUC_REGSET_ENGINERESET |
+				     GUC_REGSET_SAVE_CURRENT_VALUE, 0);
+
+		/* ask guc to re-apply workarounds set in *_init_workarounds */
+		for (i = 0; i < workarounds->guc_count; i++)
+			GUC_ADD_MMIO_REG_ADS(eng_reg,
+					     workarounds->guc_reg[i].addr,
+					     GUC_REGSET_ENGINERESET |
+					     GUC_REGSET_SAVE_DEFAULT_VALUE,
+					     workarounds->guc_reg[i].value);
+
+		DRM_DEBUG_DRIVER("%s register save/restore count: %u\n",
+				 engine->name, eng_reg->number_of_registers);
+
 		blob->reg_state.white_list[engine->guc_id].mmio_start =
 			i915_mmio_reg_offset(RING_FORCE_TO_NONPRIV(engine->mmio_base, 0));
 
@@ -1044,9 +1104,13 @@ static int guc_ads_create(struct intel_guc *guc)
 		 * inconsistencies with the handling of FORCE_TO_NONPRIV
 		 * registers.
 		 */
-		blob->reg_state.white_list[engine->guc_id].count = 0;
+		blob->reg_state.white_list[engine->guc_id].count =
+					workarounds->hw_whitelist_count[id];
 
-		/* Nothing to be saved or restored for now. */
+		for (i = 0; i < workarounds->hw_whitelist_count[id]; i++) {
+			blob->reg_state.white_list[engine->guc_id].offsets[i] =
+				I915_READ(RING_FORCE_TO_NONPRIV(engine->mmio_base, i));
+		}
 	}
 
 	/*
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 402769d9d840..601cdb0a3694 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -625,6 +625,29 @@ static int wa_ring_whitelist_reg(struct intel_engine_cs *engine,
 	return 0;
 }
 
+static int guc_wa_add(struct drm_i915_private *dev_priv,
+		      i915_reg_t addr, const u32 val)
+{
+	const u32 idx = dev_priv->workarounds.guc_count;
+
+	I915_WRITE(addr, val);
+	if (WARN_ON(idx >= GUC_REGSET_MAX_REGISTERS))
+		return -ENOSPC;
+
+	dev_priv->workarounds.guc_reg[idx].addr = addr;
+	/* GuC can't handle masked regs, so we store the value|mask together */
+	dev_priv->workarounds.guc_reg[idx].value = val;
+	dev_priv->workarounds.guc_count++;
+
+	return 0;
+}
+
+#define I915_GUC_REG_WRITE(addr, val) do { \
+		const int r = guc_wa_add(dev_priv, (addr), (val)); \
+		if (r) \
+			return r; \
+	} while (0)
+
 static int gen8_init_workarounds(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
@@ -732,15 +755,16 @@ static int gen9_init_workarounds(struct intel_engine_cs *engine)
 	int ret;
 
 	/* WaConextSwitchWithConcurrentTLBInvalidate:skl,bxt,kbl,glk */
-	I915_WRITE(GEN9_CSFE_CHICKEN1_RCS, _MASKED_BIT_ENABLE(GEN9_PREEMPT_GPGPU_SYNC_SWITCH_DISABLE));
+	I915_GUC_REG_WRITE(GEN9_CSFE_CHICKEN1_RCS,
+			   _MASKED_BIT_ENABLE(GEN9_PREEMPT_GPGPU_SYNC_SWITCH_DISABLE));
 
 	/* WaEnableLbsSlaRetryTimerDecrement:skl,bxt,kbl,glk */
-	I915_WRITE(BDW_SCRATCH1, I915_READ(BDW_SCRATCH1) |
-		   GEN9_LBS_SLA_RETRY_TIMER_DECREMENT_ENABLE);
+	I915_GUC_REG_WRITE(BDW_SCRATCH1, I915_READ(BDW_SCRATCH1) |
+			   GEN9_LBS_SLA_RETRY_TIMER_DECREMENT_ENABLE);
 
 	/* WaDisableKillLogic:bxt,skl,kbl */
-	I915_WRITE(GAM_ECOCHK, I915_READ(GAM_ECOCHK) |
-		   ECOCHK_DIS_TLB);
+	I915_GUC_REG_WRITE(GAM_ECOCHK, I915_READ(GAM_ECOCHK) |
+			   ECOCHK_DIS_TLB);
 
 	/* WaClearFlowControlGpgpuContextSave:skl,bxt,kbl,glk */
 	/* WaDisablePartialInstShootdown:skl,bxt,kbl,glk */
@@ -809,8 +833,8 @@ static int gen9_init_workarounds(struct intel_engine_cs *engine)
 			  HDC_FORCE_NON_COHERENT);
 
 	/* WaDisableHDCInvalidation:skl,bxt,kbl */
-	I915_WRITE(GAM_ECOCHK, I915_READ(GAM_ECOCHK) |
-		   BDW_DISABLE_HDC_INVALIDATION);
+	I915_GUC_REG_WRITE(GAM_ECOCHK, I915_READ(GAM_ECOCHK) |
+			   BDW_DISABLE_HDC_INVALIDATION);
 
 	/* WaDisableSamplerPowerBypassForSOPingPong:skl,bxt,kbl */
 	if (IS_SKYLAKE(dev_priv) ||
@@ -823,8 +847,8 @@ static int gen9_init_workarounds(struct intel_engine_cs *engine)
 	WA_SET_BIT_MASKED(HALF_SLICE_CHICKEN2, GEN8_ST_PO_DISABLE);
 
 	/* WaOCLCoherentLineFlush:skl,bxt,kbl */
-	I915_WRITE(GEN8_L3SQCREG4, (I915_READ(GEN8_L3SQCREG4) |
-				    GEN8_LQSC_FLUSH_COHERENT_LINES));
+	I915_GUC_REG_WRITE(GEN8_L3SQCREG4, (I915_READ(GEN8_L3SQCREG4) |
+					    GEN8_LQSC_FLUSH_COHERENT_LINES));
 
 	/* WaVFEStateAfterPipeControlwithMediaStateClear:skl,bxt,glk */
 	ret = wa_ring_whitelist_reg(engine, GEN9_CTX_PREEMPT_REG);
@@ -899,12 +923,12 @@ static int skl_init_workarounds(struct intel_engine_cs *engine)
 	 * until D0 which is the default case so this is equivalent to
 	 * !WaDisablePerCtxtPreemptionGranularityControl:skl
 	 */
-	I915_WRITE(GEN7_FF_SLICE_CS_CHICKEN1,
-		   _MASKED_BIT_ENABLE(GEN9_FFSC_PERCTX_PREEMPT_CTRL));
+	I915_GUC_REG_WRITE(GEN7_FF_SLICE_CS_CHICKEN1,
+			   _MASKED_BIT_ENABLE(GEN9_FFSC_PERCTX_PREEMPT_CTRL));
 
 	/* WaEnableGapsTsvCreditFix:skl */
-	I915_WRITE(GEN8_GARBCNTL, (I915_READ(GEN8_GARBCNTL) |
-				   GEN9_GAPS_TSV_CREDIT_DISABLE));
+	I915_GUC_REG_WRITE(GEN8_GARBCNTL, (I915_READ(GEN8_GARBCNTL) |
+					   GEN9_GAPS_TSV_CREDIT_DISABLE));
 
 	/* WaDisableGafsUnitClkGating:skl */
 	WA_SET_BIT(GEN7_UCGCTL4, GEN8_EU_GAUNIT_CLOCK_GATE_DISABLE);
@@ -934,12 +958,12 @@ static int bxt_init_workarounds(struct intel_engine_cs *engine)
 	/* WaStoreMultiplePTEenable:bxt */
 	/* This is a requirement according to Hardware specification */
 	if (IS_BXT_REVID(dev_priv, 0, BXT_REVID_A1))
-		I915_WRITE(TILECTL, I915_READ(TILECTL) | TILECTL_TLBPF);
+		I915_GUC_REG_WRITE(TILECTL, I915_READ(TILECTL) | TILECTL_TLBPF);
 
 	/* WaSetClckGatingDisableMedia:bxt */
 	if (IS_BXT_REVID(dev_priv, 0, BXT_REVID_A1)) {
-		I915_WRITE(GEN7_MISCCPCTL, (I915_READ(GEN7_MISCCPCTL) &
-					    ~GEN8_DOP_CLOCK_GATE_MEDIA_ENABLE));
+		I915_GUC_REG_WRITE(GEN7_MISCCPCTL, (I915_READ(GEN7_MISCCPCTL) &
+						    ~GEN8_DOP_CLOCK_GATE_MEDIA_ENABLE));
 	}
 
 	/* WaDisableThreadStallDopClockGating:bxt */
@@ -975,8 +999,8 @@ static int bxt_init_workarounds(struct intel_engine_cs *engine)
 
 	/* WaProgramL3SqcReg1DefaultForPerf:bxt */
 	if (IS_BXT_REVID(dev_priv, BXT_REVID_B0, REVID_FOREVER))
-		I915_WRITE(GEN8_L3SQCREG1, L3_GENERAL_PRIO_CREDITS(62) |
-					   L3_HIGH_PRIO_CREDITS(2));
+		I915_GUC_REG_WRITE(GEN8_L3SQCREG1, L3_GENERAL_PRIO_CREDITS(62) |
+						   L3_HIGH_PRIO_CREDITS(2));
 
 	/* WaToEnableHwFixForPushConstHWBug:bxt */
 	if (IS_BXT_REVID(dev_priv, BXT_REVID_C0, REVID_FOREVER))
@@ -1001,8 +1025,8 @@ static int kbl_init_workarounds(struct intel_engine_cs *engine)
 		return ret;
 
 	/* WaEnableGapsTsvCreditFix:kbl */
-	I915_WRITE(GEN8_GARBCNTL, (I915_READ(GEN8_GARBCNTL) |
-				   GEN9_GAPS_TSV_CREDIT_DISABLE));
+	I915_GUC_REG_WRITE(GEN8_GARBCNTL, (I915_READ(GEN8_GARBCNTL) |
+					   GEN9_GAPS_TSV_CREDIT_DISABLE));
 
 	/* WaDisableDynamicCreditSharing:kbl */
 	if (IS_KBL_REVID(dev_priv, 0, KBL_REVID_B0))
@@ -1063,6 +1087,7 @@ int init_workarounds_ring(struct intel_engine_cs *engine)
 	WARN_ON(engine->id != RCS);
 
 	dev_priv->workarounds.count = 0;
+	dev_priv->workarounds.guc_count = 0;
 	dev_priv->workarounds.hw_whitelist_count[engine->id] = 0;
 
 	if (IS_BROADWELL(dev_priv))
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v6 13/20] drm/i915/guc: Provide register list to be saved/restored during engine reset
  2017-04-19 18:35 ` [PATCH v6 13/20] drm/i915/guc: Provide register list to be saved/restored during engine reset Michel Thierry
@ 2017-04-20 11:33   ` Joonas Lahtinen
  2017-04-20 16:39     ` Daniele Ceraolo Spurio
  0 siblings, 1 reply; 11+ messages in thread
From: Joonas Lahtinen @ 2017-04-20 11:33 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx

On ke, 2017-04-19 at 11:35 -0700, Michel Thierry wrote:
> From: Arun Siluvery <arun.siluvery@linux.intel.com>
> 
> GuC expects a list of registers from the driver which are saved/restored
> during engine reset. The type of value to be saved is controlled by
> flags. We provide a minimal set of registers that we want GuC to save and
> restore. This is not an issue in case of engine reset as driver initializes
> most of them following an engine reset, but in case of media reset (aka
> watchdog reset) which is completely internal to GuC (including resubmission
> of hung workload), it is necessary to provide this list, otherwise GuC won't
> be able to schedule further workloads after a reset. This is the minimal
> set of registers identified for things to work as expected but if we see
> any new issues, this register list can be expanded.
> 
> In order to not loose any existing workarounds, we have to let GuC know
> the registers and its values. These will be reapplied after the reset.
> Note that we can't just read the current value because most of these
> registers are masked (so we have a workaround for a workaround for a
> workaround).
> 
> v2: REGSET_MASKED is too difficult for GuC, use REGSET_SAVE_DEFAULT_VALUE
> and current value from RING_MODE reg instead; no need to preserve
> head/tail either, be extra paranoid and save whitelisted registers (Daniele).
> 
> v3: Workarounds added only once during _init_workarounds also have to
> been restored, or we risk loosing them after internal GuC reset
> (Daniele).
> 
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>

<SNIP>

> @@ -732,15 +755,16 @@ static int gen9_init_workarounds(struct intel_engine_cs *engine)
> >  	int ret;
>  
>  	/* WaConextSwitchWithConcurrentTLBInvalidate:skl,bxt,kbl,glk */
> -	I915_WRITE(GEN9_CSFE_CHICKEN1_RCS, _MASKED_BIT_ENABLE(GEN9_PREEMPT_GPGPU_SYNC_SWITCH_DISABLE));
> +	I915_GUC_REG_WRITE(GEN9_CSFE_CHICKEN1_RCS,
> +			   _MASKED_BIT_ENABLE(GEN9_PREEMPT_GPGPU_SYNC_SWITCH_DISABLE));

To make grepping easier, how about?

	I915_WRITE(GUC_REG(GEN9_CSFE_CHICKEN1_RCS),
		   ...);

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v6 13/20] drm/i915/guc: Provide register list to be saved/restored during engine reset
  2017-04-20 11:33   ` Joonas Lahtinen
@ 2017-04-20 16:39     ` Daniele Ceraolo Spurio
  2017-04-20 17:29       ` Michel Thierry
  0 siblings, 1 reply; 11+ messages in thread
From: Daniele Ceraolo Spurio @ 2017-04-20 16:39 UTC (permalink / raw)
  To: Joonas Lahtinen, Michel Thierry, intel-gfx



On 20/04/17 04:33, Joonas Lahtinen wrote:
> On ke, 2017-04-19 at 11:35 -0700, Michel Thierry wrote:
>> From: Arun Siluvery <arun.siluvery@linux.intel.com>
>>
>> GuC expects a list of registers from the driver which are saved/restored
>> during engine reset. The type of value to be saved is controlled by
>> flags. We provide a minimal set of registers that we want GuC to save and
>> restore. This is not an issue in case of engine reset as driver initializes
>> most of them following an engine reset, but in case of media reset (aka
>> watchdog reset) which is completely internal to GuC (including resubmission
>> of hung workload), it is necessary to provide this list, otherwise GuC won't
>> be able to schedule further workloads after a reset. This is the minimal
>> set of registers identified for things to work as expected but if we see
>> any new issues, this register list can be expanded.
>>
>> In order to not loose any existing workarounds, we have to let GuC know
>> the registers and its values. These will be reapplied after the reset.
>> Note that we can't just read the current value because most of these
>> registers are masked (so we have a workaround for a workaround for a
>> workaround).
>>
>> v2: REGSET_MASKED is too difficult for GuC, use REGSET_SAVE_DEFAULT_VALUE
>> and current value from RING_MODE reg instead; no need to preserve
>> head/tail either, be extra paranoid and save whitelisted registers (Daniele).
>>
>> v3: Workarounds added only once during _init_workarounds also have to
>> been restored, or we risk loosing them after internal GuC reset
>> (Daniele).
>>
>> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>> Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
>> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
>
> <SNIP>
>
>> @@ -732,15 +755,16 @@ static int gen9_init_workarounds(struct intel_engine_cs *engine)
>>>  	int ret;
>>
>>  	/* WaConextSwitchWithConcurrentTLBInvalidate:skl,bxt,kbl,glk */
>> -	I915_WRITE(GEN9_CSFE_CHICKEN1_RCS, _MASKED_BIT_ENABLE(GEN9_PREEMPT_GPGPU_SYNC_SWITCH_DISABLE));
>> +	I915_GUC_REG_WRITE(GEN9_CSFE_CHICKEN1_RCS,
>> +			   _MASKED_BIT_ENABLE(GEN9_PREEMPT_GPGPU_SYNC_SWITCH_DISABLE));
>
> To make grepping easier, how about?
>
> 	I915_WRITE(GUC_REG(GEN9_CSFE_CHICKEN1_RCS),
> 		   ...);
>
> Regards, Joonas
>

GUC_REG makes it sound like it is somehow related to GuC, while it 
isn't, we just want GuC to restore its value. What about GUC_REG_RESTORE?

Thanks,
Daniele
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v6 13/20] drm/i915/guc: Provide register list to be saved/restored during engine reset
  2017-04-20 16:39     ` Daniele Ceraolo Spurio
@ 2017-04-20 17:29       ` Michel Thierry
  2017-04-21 20:07         ` Michel Thierry
  0 siblings, 1 reply; 11+ messages in thread
From: Michel Thierry @ 2017-04-20 17:29 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio, Joonas Lahtinen, intel-gfx



On 20/04/17 09:39, Daniele Ceraolo Spurio wrote:
>
>
> On 20/04/17 04:33, Joonas Lahtinen wrote:
>> On ke, 2017-04-19 at 11:35 -0700, Michel Thierry wrote:
>>> From: Arun Siluvery <arun.siluvery@linux.intel.com>
>>>
>>> GuC expects a list of registers from the driver which are saved/restored
>>> during engine reset. The type of value to be saved is controlled by
>>> flags. We provide a minimal set of registers that we want GuC to save
>>> and
>>> restore. This is not an issue in case of engine reset as driver
>>> initializes
>>> most of them following an engine reset, but in case of media reset (aka
>>> watchdog reset) which is completely internal to GuC (including
>>> resubmission
>>> of hung workload), it is necessary to provide this list, otherwise
>>> GuC won't
>>> be able to schedule further workloads after a reset. This is the minimal
>>> set of registers identified for things to work as expected but if we see
>>> any new issues, this register list can be expanded.
>>>
>>> In order to not loose any existing workarounds, we have to let GuC know
>>> the registers and its values. These will be reapplied after the reset.
>>> Note that we can't just read the current value because most of these
>>> registers are masked (so we have a workaround for a workaround for a
>>> workaround).
>>>
>>> v2: REGSET_MASKED is too difficult for GuC, use
>>> REGSET_SAVE_DEFAULT_VALUE
>>> and current value from RING_MODE reg instead; no need to preserve
>>> head/tail either, be extra paranoid and save whitelisted registers
>>> (Daniele).
>>>
>>> v3: Workarounds added only once during _init_workarounds also have to
>>> been restored, or we risk loosing them after internal GuC reset
>>> (Daniele).
>>>
>>> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>>> Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
>>> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
>>
>> <SNIP>
>>
>>> @@ -732,15 +755,16 @@ static int gen9_init_workarounds(struct
>>> intel_engine_cs *engine)
>>>>      int ret;
>>>
>>>      /* WaConextSwitchWithConcurrentTLBInvalidate:skl,bxt,kbl,glk */
>>> -    I915_WRITE(GEN9_CSFE_CHICKEN1_RCS,
>>> _MASKED_BIT_ENABLE(GEN9_PREEMPT_GPGPU_SYNC_SWITCH_DISABLE));
>>> +    I915_GUC_REG_WRITE(GEN9_CSFE_CHICKEN1_RCS,
>>> +
>>> _MASKED_BIT_ENABLE(GEN9_PREEMPT_GPGPU_SYNC_SWITCH_DISABLE));
>>
>> To make grepping easier, how about?
>>
>>     I915_WRITE(GUC_REG(GEN9_CSFE_CHICKEN1_RCS),
>>            ...);
>>
>> Regards, Joonas
>>
>
> GUC_REG makes it sound like it is somehow related to GuC, while it
> isn't, we just want GuC to restore its value. What about GUC_REG_RESTORE?
>

Honestly, I dont care about names, pick one and I add it.
Just a reminder, we not only need the reg offset, we want to save the 
value too.

-Michel
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v6 13/20] drm/i915/guc: Provide register list to be saved/restored during engine reset
  2017-04-20 17:29       ` Michel Thierry
@ 2017-04-21 20:07         ` Michel Thierry
  2017-04-21 20:10           ` Daniele Ceraolo Spurio
  0 siblings, 1 reply; 11+ messages in thread
From: Michel Thierry @ 2017-04-21 20:07 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio, Joonas Lahtinen, intel-gfx



On 20/04/17 10:29, Michel Thierry wrote:
>
>
> On 20/04/17 09:39, Daniele Ceraolo Spurio wrote:
>>
>>
>> On 20/04/17 04:33, Joonas Lahtinen wrote:
>>> On ke, 2017-04-19 at 11:35 -0700, Michel Thierry wrote:
>>>> From: Arun Siluvery <arun.siluvery@linux.intel.com>
>>>>
>>>> GuC expects a list of registers from the driver which are
>>>> saved/restored
>>>> during engine reset. The type of value to be saved is controlled by
>>>> flags. We provide a minimal set of registers that we want GuC to save
>>>> and
>>>> restore. This is not an issue in case of engine reset as driver
>>>> initializes
>>>> most of them following an engine reset, but in case of media reset (aka
>>>> watchdog reset) which is completely internal to GuC (including
>>>> resubmission
>>>> of hung workload), it is necessary to provide this list, otherwise
>>>> GuC won't
>>>> be able to schedule further workloads after a reset. This is the
>>>> minimal
>>>> set of registers identified for things to work as expected but if we
>>>> see
>>>> any new issues, this register list can be expanded.
>>>>
>>>> In order to not loose any existing workarounds, we have to let GuC know
>>>> the registers and its values. These will be reapplied after the reset.
>>>> Note that we can't just read the current value because most of these
>>>> registers are masked (so we have a workaround for a workaround for a
>>>> workaround).
>>>>
>>>> v2: REGSET_MASKED is too difficult for GuC, use
>>>> REGSET_SAVE_DEFAULT_VALUE
>>>> and current value from RING_MODE reg instead; no need to preserve
>>>> head/tail either, be extra paranoid and save whitelisted registers
>>>> (Daniele).
>>>>
>>>> v3: Workarounds added only once during _init_workarounds also have to
>>>> been restored, or we risk loosing them after internal GuC reset
>>>> (Daniele).
>>>>
>>>> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>>>> Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
>>>> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
>>>
>>> <SNIP>
>>>
>>>> @@ -732,15 +755,16 @@ static int gen9_init_workarounds(struct
>>>> intel_engine_cs *engine)
>>>>>      int ret;
>>>>
>>>>      /* WaConextSwitchWithConcurrentTLBInvalidate:skl,bxt,kbl,glk */
>>>> -    I915_WRITE(GEN9_CSFE_CHICKEN1_RCS,
>>>> _MASKED_BIT_ENABLE(GEN9_PREEMPT_GPGPU_SYNC_SWITCH_DISABLE));
>>>> +    I915_GUC_REG_WRITE(GEN9_CSFE_CHICKEN1_RCS,
>>>> +
>>>> _MASKED_BIT_ENABLE(GEN9_PREEMPT_GPGPU_SYNC_SWITCH_DISABLE));
>>>
>>> To make grepping easier, how about?
>>>
>>>     I915_WRITE(GUC_REG(GEN9_CSFE_CHICKEN1_RCS),
>>>            ...);
>>>
>>> Regards, Joonas
>>>
>>
>> GUC_REG makes it sound like it is somehow related to GuC, while it
>> isn't, we just want GuC to restore its value. What about GUC_REG_RESTORE?
>>
>
> Honestly, I dont care about names, pick one and I add it.
> Just a reminder, we not only need the reg offset, we want to save the
> value too.
>

I915_WRITE_GUC_RESTORE(reg, value) ?

That would be inline to the others we have, e.g. I915_WRITE_FW, 
I915_WRITE_CTL, I915_WRITE_HEAD/TAIL.

-Michel
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v6 13/20] drm/i915/guc: Provide register list to be saved/restored during engine reset
  2017-04-21 20:07         ` Michel Thierry
@ 2017-04-21 20:10           ` Daniele Ceraolo Spurio
  2017-04-21 20:21             ` Chris Wilson
  0 siblings, 1 reply; 11+ messages in thread
From: Daniele Ceraolo Spurio @ 2017-04-21 20:10 UTC (permalink / raw)
  To: Michel Thierry, Joonas Lahtinen, intel-gfx



On 21/04/17 13:07, Michel Thierry wrote:
>
>
> On 20/04/17 10:29, Michel Thierry wrote:
>>
>>
>> On 20/04/17 09:39, Daniele Ceraolo Spurio wrote:
>>>
>>>
>>> On 20/04/17 04:33, Joonas Lahtinen wrote:
>>>> On ke, 2017-04-19 at 11:35 -0700, Michel Thierry wrote:
>>>>> From: Arun Siluvery <arun.siluvery@linux.intel.com>
>>>>>
>>>>> GuC expects a list of registers from the driver which are
>>>>> saved/restored
>>>>> during engine reset. The type of value to be saved is controlled by
>>>>> flags. We provide a minimal set of registers that we want GuC to save
>>>>> and
>>>>> restore. This is not an issue in case of engine reset as driver
>>>>> initializes
>>>>> most of them following an engine reset, but in case of media reset
>>>>> (aka
>>>>> watchdog reset) which is completely internal to GuC (including
>>>>> resubmission
>>>>> of hung workload), it is necessary to provide this list, otherwise
>>>>> GuC won't
>>>>> be able to schedule further workloads after a reset. This is the
>>>>> minimal
>>>>> set of registers identified for things to work as expected but if we
>>>>> see
>>>>> any new issues, this register list can be expanded.
>>>>>
>>>>> In order to not loose any existing workarounds, we have to let GuC
>>>>> know
>>>>> the registers and its values. These will be reapplied after the reset.
>>>>> Note that we can't just read the current value because most of these
>>>>> registers are masked (so we have a workaround for a workaround for a
>>>>> workaround).
>>>>>
>>>>> v2: REGSET_MASKED is too difficult for GuC, use
>>>>> REGSET_SAVE_DEFAULT_VALUE
>>>>> and current value from RING_MODE reg instead; no need to preserve
>>>>> head/tail either, be extra paranoid and save whitelisted registers
>>>>> (Daniele).
>>>>>
>>>>> v3: Workarounds added only once during _init_workarounds also have to
>>>>> been restored, or we risk loosing them after internal GuC reset
>>>>> (Daniele).
>>>>>
>>>>> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>>>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>>>>> Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
>>>>> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
>>>>
>>>> <SNIP>
>>>>
>>>>> @@ -732,15 +755,16 @@ static int gen9_init_workarounds(struct
>>>>> intel_engine_cs *engine)
>>>>>>      int ret;
>>>>>
>>>>>      /* WaConextSwitchWithConcurrentTLBInvalidate:skl,bxt,kbl,glk */
>>>>> -    I915_WRITE(GEN9_CSFE_CHICKEN1_RCS,
>>>>> _MASKED_BIT_ENABLE(GEN9_PREEMPT_GPGPU_SYNC_SWITCH_DISABLE));
>>>>> +    I915_GUC_REG_WRITE(GEN9_CSFE_CHICKEN1_RCS,
>>>>> +
>>>>> _MASKED_BIT_ENABLE(GEN9_PREEMPT_GPGPU_SYNC_SWITCH_DISABLE));
>>>>
>>>> To make grepping easier, how about?
>>>>
>>>>     I915_WRITE(GUC_REG(GEN9_CSFE_CHICKEN1_RCS),
>>>>            ...);
>>>>
>>>> Regards, Joonas
>>>>
>>>
>>> GUC_REG makes it sound like it is somehow related to GuC, while it
>>> isn't, we just want GuC to restore its value. What about
>>> GUC_REG_RESTORE?
>>>
>>
>> Honestly, I dont care about names, pick one and I add it.
>> Just a reminder, we not only need the reg offset, we want to save the
>> value too.
>>
>
> I915_WRITE_GUC_RESTORE(reg, value) ?
>
> That would be inline to the others we have, e.g. I915_WRITE_FW,
> I915_WRITE_CTL, I915_WRITE_HEAD/TAIL.
>
> -Michel

LGTM.

Daniele
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v6 13/20] drm/i915/guc: Provide register list to be saved/restored during engine reset
  2017-04-21 20:10           ` Daniele Ceraolo Spurio
@ 2017-04-21 20:21             ` Chris Wilson
  2017-04-21 20:31               ` Michel Thierry
  0 siblings, 1 reply; 11+ messages in thread
From: Chris Wilson @ 2017-04-21 20:21 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio; +Cc: intel-gfx

On Fri, Apr 21, 2017 at 01:10:37PM -0700, Daniele Ceraolo Spurio wrote:
> 
> 
> On 21/04/17 13:07, Michel Thierry wrote:
> >
> >
> >On 20/04/17 10:29, Michel Thierry wrote:
> >>
> >>
> >>On 20/04/17 09:39, Daniele Ceraolo Spurio wrote:
> >>>
> >>>
> >>>On 20/04/17 04:33, Joonas Lahtinen wrote:
> >>>>On ke, 2017-04-19 at 11:35 -0700, Michel Thierry wrote:
> >>>>>From: Arun Siluvery <arun.siluvery@linux.intel.com>
> >>>>>
> >>>>>GuC expects a list of registers from the driver which are
> >>>>>saved/restored
> >>>>>during engine reset. The type of value to be saved is controlled by
> >>>>>flags. We provide a minimal set of registers that we want GuC to save
> >>>>>and
> >>>>>restore. This is not an issue in case of engine reset as driver
> >>>>>initializes
> >>>>>most of them following an engine reset, but in case of media reset
> >>>>>(aka
> >>>>>watchdog reset) which is completely internal to GuC (including
> >>>>>resubmission
> >>>>>of hung workload), it is necessary to provide this list, otherwise
> >>>>>GuC won't
> >>>>>be able to schedule further workloads after a reset. This is the
> >>>>>minimal
> >>>>>set of registers identified for things to work as expected but if we
> >>>>>see
> >>>>>any new issues, this register list can be expanded.
> >>>>>
> >>>>>In order to not loose any existing workarounds, we have to let GuC
> >>>>>know
> >>>>>the registers and its values. These will be reapplied after the reset.
> >>>>>Note that we can't just read the current value because most of these
> >>>>>registers are masked (so we have a workaround for a workaround for a
> >>>>>workaround).
> >>>>>
> >>>>>v2: REGSET_MASKED is too difficult for GuC, use
> >>>>>REGSET_SAVE_DEFAULT_VALUE
> >>>>>and current value from RING_MODE reg instead; no need to preserve
> >>>>>head/tail either, be extra paranoid and save whitelisted registers
> >>>>>(Daniele).
> >>>>>
> >>>>>v3: Workarounds added only once during _init_workarounds also have to
> >>>>>been restored, or we risk loosing them after internal GuC reset
> >>>>>(Daniele).
> >>>>>
> >>>>>Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> >>>>>Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> >>>>>Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
> >>>>>Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> >>>>
> >>>><SNIP>
> >>>>
> >>>>>@@ -732,15 +755,16 @@ static int gen9_init_workarounds(struct
> >>>>>intel_engine_cs *engine)
> >>>>>>     int ret;
> >>>>>
> >>>>>     /* WaConextSwitchWithConcurrentTLBInvalidate:skl,bxt,kbl,glk */
> >>>>>-    I915_WRITE(GEN9_CSFE_CHICKEN1_RCS,
> >>>>>_MASKED_BIT_ENABLE(GEN9_PREEMPT_GPGPU_SYNC_SWITCH_DISABLE));
> >>>>>+    I915_GUC_REG_WRITE(GEN9_CSFE_CHICKEN1_RCS,
> >>>>>+
> >>>>>_MASKED_BIT_ENABLE(GEN9_PREEMPT_GPGPU_SYNC_SWITCH_DISABLE));
> >>>>
> >>>>To make grepping easier, how about?
> >>>>
> >>>>    I915_WRITE(GUC_REG(GEN9_CSFE_CHICKEN1_RCS),
> >>>>           ...);
> >>>>
> >>>>Regards, Joonas
> >>>>
> >>>
> >>>GUC_REG makes it sound like it is somehow related to GuC, while it
> >>>isn't, we just want GuC to restore its value. What about
> >>>GUC_REG_RESTORE?
> >>>
> >>
> >>Honestly, I dont care about names, pick one and I add it.
> >>Just a reminder, we not only need the reg offset, we want to save the
> >>value too.
> >>
> >
> >I915_WRITE_GUC_RESTORE(reg, value) ?
> >
> >That would be inline to the others we have, e.g. I915_WRITE_FW,
> >I915_WRITE_CTL, I915_WRITE_HEAD/TAIL.

I915_WRITE_FW is not the same class as I915_WRITE_CTL/HEAD/TAIL, and I
can say from experience the I915_*_CTL/HEAD/TAIL were a mistake (special
casing one particular access to the ring mmio, but we often deviate from
that pattern).

Looking at the above I see you are falling for the same trap as the ring
shorthand... So are you sure the convenience will not be lost later? And
in particular avoid using I915_WRITE_*() naming style as I would rather
that was earmarked for the different mmio accessors.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v6 13/20] drm/i915/guc: Provide register list to be saved/restored during engine reset
  2017-04-21 20:21             ` Chris Wilson
@ 2017-04-21 20:31               ` Michel Thierry
  0 siblings, 0 replies; 11+ messages in thread
From: Michel Thierry @ 2017-04-21 20:31 UTC (permalink / raw)
  To: Chris Wilson, Daniele Ceraolo Spurio, Joonas Lahtinen, intel-gfx



On 21/04/17 13:21, Chris Wilson wrote:
> On Fri, Apr 21, 2017 at 01:10:37PM -0700, Daniele Ceraolo Spurio wrote:
>>
>>
>> On 21/04/17 13:07, Michel Thierry wrote:
>>>
>>>
>>> On 20/04/17 10:29, Michel Thierry wrote:
>>>>
>>>>
>>>> On 20/04/17 09:39, Daniele Ceraolo Spurio wrote:
>>>>>
>>>>>
>>>>> On 20/04/17 04:33, Joonas Lahtinen wrote:
>>>>>> On ke, 2017-04-19 at 11:35 -0700, Michel Thierry wrote:
>>>>>>> From: Arun Siluvery <arun.siluvery@linux.intel.com>
>>>>>>>
>>>>>>> GuC expects a list of registers from the driver which are
>>>>>>> saved/restored
>>>>>>> during engine reset. The type of value to be saved is controlled by
>>>>>>> flags. We provide a minimal set of registers that we want GuC to save
>>>>>>> and
>>>>>>> restore. This is not an issue in case of engine reset as driver
>>>>>>> initializes
>>>>>>> most of them following an engine reset, but in case of media reset
>>>>>>> (aka
>>>>>>> watchdog reset) which is completely internal to GuC (including
>>>>>>> resubmission
>>>>>>> of hung workload), it is necessary to provide this list, otherwise
>>>>>>> GuC won't
>>>>>>> be able to schedule further workloads after a reset. This is the
>>>>>>> minimal
>>>>>>> set of registers identified for things to work as expected but if we
>>>>>>> see
>>>>>>> any new issues, this register list can be expanded.
>>>>>>>
>>>>>>> In order to not loose any existing workarounds, we have to let GuC
>>>>>>> know
>>>>>>> the registers and its values. These will be reapplied after the reset.
>>>>>>> Note that we can't just read the current value because most of these
>>>>>>> registers are masked (so we have a workaround for a workaround for a
>>>>>>> workaround).
>>>>>>>
>>>>>>> v2: REGSET_MASKED is too difficult for GuC, use
>>>>>>> REGSET_SAVE_DEFAULT_VALUE
>>>>>>> and current value from RING_MODE reg instead; no need to preserve
>>>>>>> head/tail either, be extra paranoid and save whitelisted registers
>>>>>>> (Daniele).
>>>>>>>
>>>>>>> v3: Workarounds added only once during _init_workarounds also have to
>>>>>>> been restored, or we risk loosing them after internal GuC reset
>>>>>>> (Daniele).
>>>>>>>
>>>>>>> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>>>>>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>>>>>>> Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
>>>>>>> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
>>>>>>
>>>>>> <SNIP>
>>>>>>
>>>>>>> @@ -732,15 +755,16 @@ static int gen9_init_workarounds(struct
>>>>>>> intel_engine_cs *engine)
>>>>>>>>     int ret;
>>>>>>>
>>>>>>>     /* WaConextSwitchWithConcurrentTLBInvalidate:skl,bxt,kbl,glk */
>>>>>>> -    I915_WRITE(GEN9_CSFE_CHICKEN1_RCS,
>>>>>>> _MASKED_BIT_ENABLE(GEN9_PREEMPT_GPGPU_SYNC_SWITCH_DISABLE));
>>>>>>> +    I915_GUC_REG_WRITE(GEN9_CSFE_CHICKEN1_RCS,
>>>>>>> +
>>>>>>> _MASKED_BIT_ENABLE(GEN9_PREEMPT_GPGPU_SYNC_SWITCH_DISABLE));
>>>>>>
>>>>>> To make grepping easier, how about?
>>>>>>
>>>>>>    I915_WRITE(GUC_REG(GEN9_CSFE_CHICKEN1_RCS),
>>>>>>           ...);
>>>>>>
>>>>>> Regards, Joonas
>>>>>>
>>>>>
>>>>> GUC_REG makes it sound like it is somehow related to GuC, while it
>>>>> isn't, we just want GuC to restore its value. What about
>>>>> GUC_REG_RESTORE?
>>>>>
>>>>
>>>> Honestly, I dont care about names, pick one and I add it.
>>>> Just a reminder, we not only need the reg offset, we want to save the
>>>> value too.
>>>>
>>>
>>> I915_WRITE_GUC_RESTORE(reg, value) ?
>>>
>>> That would be inline to the others we have, e.g. I915_WRITE_FW,
>>> I915_WRITE_CTL, I915_WRITE_HEAD/TAIL.
>
> I915_WRITE_FW is not the same class as I915_WRITE_CTL/HEAD/TAIL, and I
> can say from experience the I915_*_CTL/HEAD/TAIL were a mistake (special
> casing one particular access to the ring mmio, but we often deviate from
> that pattern).
>
> Looking at the above I see you are falling for the same trap as the ring
> shorthand... So are you sure the convenience will not be lost later? And
> in particular avoid using I915_WRITE_*() naming style as I would rather
> that was earmarked for the different mmio accessors.

Ok, then can follow the pattern of the other workarounds & whitelist reg 
code?

E.g. WA_REG_GUC_RESTORE or WA_MMIO_REG_GUC_RESTORE (to make it clearer 
that these are not registers in the ctx image).



_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v6 13/20] drm/i915/guc: Provide register list to be saved/restored during engine reset
  2017-04-19  0:26   ` Daniele Ceraolo Spurio
@ 2017-04-19  0:44     ` Michel Thierry
  0 siblings, 0 replies; 11+ messages in thread
From: Michel Thierry @ 2017-04-19  0:44 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio, intel-gfx, Argenziano, Antonio

On 18/04/17 17:26, Daniele Ceraolo Spurio wrote:
>
>
> On 18/04/17 13:23, Michel Thierry wrote:
>> From: Arun Siluvery <arun.siluvery@linux.intel.com>
>>
>> GuC expects a list of registers from the driver which are saved/restored
>> during engine reset. The type of value to be saved is controlled by
>> flags. We provide a minimal set of registers that we want GuC to save and
>> restore. This is not an issue in case of engine reset as driver
>> initializes
>> most of them following an engine reset, but in case of media reset (aka
>> watchdog reset) which is completely internal to GuC (including
>> resubmission
>> of hung workload), it is necessary to provide this list, otherwise GuC
>> won't
>> be able to schedule further workloads after a reset. This is the minimal
>> set of registers identified for things to work as expected but if we see
>> any new issues, this register list can be expanded.
>>
>> v2: REGSET_MASKED is too difficult for GuC, use REGSET_SAVE_DEFAULT_VALUE
>> and current value from RING_MODE reg instead; no need to preserve
>> head/tail either, be extra paranoid and save whitelisted registers
>> (Daniele).
>>
>> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>> Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
>> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
>> ---
>>  drivers/gpu/drm/i915/i915_guc_submission.c | 60
>> +++++++++++++++++++++++++++++-
>>  1 file changed, 58 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c
>> b/drivers/gpu/drm/i915/i915_guc_submission.c
>> index 1ea36a88d2fb..d772718861df 100644
>> --- a/drivers/gpu/drm/i915/i915_guc_submission.c
>> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
>> @@ -1003,6 +1003,24 @@ static void guc_policies_init(struct
>> guc_policies *policies)
>>      policies->is_valid = 1;
>>  }
>>
>> +/*
>> + * In this macro it is highly unlikely to exceed max value but even
>> if we did
>> + * it is not an error so just throw a warning and continue. Only side
>> effect
>> + * in continuing further means some registers won't be added to
>> save/restore
>> + * list.
>> + */
>> +#define GUC_ADD_MMIO_REG_ADS(node, reg_addr, _flags, defvalue)        \
>> +    do {                                \
>> +        u32 __count = node->number_of_registers;        \
>> +        if (WARN_ON(__count >= GUC_REGSET_MAX_REGISTERS))    \
>> +            continue;                    \
>> +        node->registers[__count].offset = reg_addr.reg;        \
>> +        node->registers[__count].flags = (_flags);        \
>> +        if (defvalue)                        \
>> +            node->registers[__count].value = (defvalue);    \
>> +        node->number_of_registers++;                \
>> +    } while (0)
>> +
>>  static int guc_ads_create(struct intel_guc *guc)
>>  {
>>      struct drm_i915_private *dev_priv = guc_to_i915(guc);
>> @@ -1016,6 +1034,7 @@ static int guc_ads_create(struct intel_guc *guc)
>>          u8 reg_state_buffer[GUC_S3_SAVE_SPACE_PAGES * PAGE_SIZE];
>>      } __packed *blob;
>>      struct intel_engine_cs *engine;
>> +    struct i915_workarounds *workarounds = &dev_priv->workarounds;
>>      enum intel_engine_id id;
>>      u32 base;
>>
>> @@ -1035,6 +1054,39 @@ static int guc_ads_create(struct intel_guc *guc)
>>
>>      /* MMIO reg state */
>>      for_each_engine(engine, dev_priv, id) {
>> +        u32 i;
>> +        struct guc_mmio_regset *eng_reg =
>> +            &blob->reg_state.engine_reg[engine->guc_id];
>> +
>> +        /*
>> +         * Provide a list of registers to be saved/restored during gpu
>> +         * reset. This is mainly required for Media reset (aka watchdog
>> +         * timeout) which is completely under the control of GuC
>> +         * (resubmission of hung workload is handled inside GuC).
>> +         */
>> +        GUC_ADD_MMIO_REG_ADS(eng_reg, RING_HWS_PGA(engine->mmio_base),
>> +                     GUC_REGSET_ENGINERESET |
>> +                     GUC_REGSET_SAVE_CURRENT_VALUE, 0);
>> +
>> +        /*
>> +         * Workaround the guc issue with masked registers, note that
>> +         * at this point guc submission is still disabled and the mode
>> +         * register doesnt have the irq_steering bit set, which we
>> +         * need to fwd irqs to GuC.
>> +         */
>> +        GUC_ADD_MMIO_REG_ADS(eng_reg, RING_MODE_GEN7(engine),
>> +                     GUC_REGSET_ENGINERESET |
>> +                     GUC_REGSET_SAVE_DEFAULT_VALUE,
>> +                     I915_READ(RING_MODE_GEN7(engine)) |
>> +                     GFX_INTERRUPT_STEERING | (0xFFFF<<16));
>> +
>> +        GUC_ADD_MMIO_REG_ADS(eng_reg, RING_IMR(engine->mmio_base),
>> +                     GUC_REGSET_ENGINERESET |
>> +                     GUC_REGSET_SAVE_CURRENT_VALUE, 0);
>> +
>
> I might just be too paranoid, but I think that we also have to add the
> registers that we use for WAs via mmio (i.e. not using an LRI in the
> ringbuffer). I did a quick test for the registers in the
> gen9_init_workarounds and skl_init_workarounds functions that we write
> using I915_WRITE and it looks like some of them lose their value after
> and RCS media reset:
>
>  REG       WA BITS    VAL PRE-MR    VAL POST-MR
> 0x20D4    (1<<2)        0x00000004    0x00000000
> 0xb11c    (1<<2)        0x00000005    0x00000001
> 0x4090    (1<<8) (1<<25)    0x02000100    0x02000100
> 0xb118    (1<<21)        0x40600000    0x40400000
> 0x20e0    (1<<14)        0x00004000    0x00000000
> 0xB004    (1<<7)        0x29124180    0x29124100
>

Indeed. Kind of makes sense since i915 won't be aware of the reset at 
all (compared to the non-guc version). Btw it only seems to happen if 
the watchdog is triggered in the render engine.

> For some reason the media reset count in debugfs is still showing zero
> though, so I might be doing something wrong. Can you double check what
> happens to those registers?
>

You need fw version 9.x+

> Thanks,
> Daniele
>
>> +        DRM_DEBUG_DRIVER("%s register save/restore count: %u\n",
>> +                 engine->name, eng_reg->number_of_registers);
>> +
>>          blob->reg_state.white_list[engine->guc_id].mmio_start =
>>
>> i915_mmio_reg_offset(RING_FORCE_TO_NONPRIV(engine->mmio_base, 0));
>>
>> @@ -1044,9 +1096,13 @@ static int guc_ads_create(struct intel_guc *guc)
>>           * inconsistencies with the handling of FORCE_TO_NONPRIV
>>           * registers.
>>           */
>> -        blob->reg_state.white_list[engine->guc_id].count = 0;
>> +        blob->reg_state.white_list[engine->guc_id].count =
>> +                    workarounds->hw_whitelist_count[id];
>>
>> -        /* Nothing to be saved or restored for now. */
>> +        for (i = 0; i < workarounds->hw_whitelist_count[id]; i++) {
>> +            blob->reg_state.white_list[engine->guc_id].offsets[i] =
>> +                I915_READ(RING_FORCE_TO_NONPRIV(engine->mmio_base, i));
>> +        }
>>      }
>>
>>      /*
>>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v6 13/20] drm/i915/guc: Provide register list to be saved/restored during engine reset
  2017-04-18 20:23 ` [PATCH v6 13/20] drm/i915/guc: Provide register list to be saved/restored during engine reset Michel Thierry
@ 2017-04-19  0:26   ` Daniele Ceraolo Spurio
  2017-04-19  0:44     ` Michel Thierry
  0 siblings, 1 reply; 11+ messages in thread
From: Daniele Ceraolo Spurio @ 2017-04-19  0:26 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx, Argenziano, Antonio



On 18/04/17 13:23, Michel Thierry wrote:
> From: Arun Siluvery <arun.siluvery@linux.intel.com>
>
> GuC expects a list of registers from the driver which are saved/restored
> during engine reset. The type of value to be saved is controlled by
> flags. We provide a minimal set of registers that we want GuC to save and
> restore. This is not an issue in case of engine reset as driver initializes
> most of them following an engine reset, but in case of media reset (aka
> watchdog reset) which is completely internal to GuC (including resubmission
> of hung workload), it is necessary to provide this list, otherwise GuC won't
> be able to schedule further workloads after a reset. This is the minimal
> set of registers identified for things to work as expected but if we see
> any new issues, this register list can be expanded.
>
> v2: REGSET_MASKED is too difficult for GuC, use REGSET_SAVE_DEFAULT_VALUE
> and current value from RING_MODE reg instead; no need to preserve
> head/tail either, be extra paranoid and save whitelisted registers (Daniele).
>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_guc_submission.c | 60 +++++++++++++++++++++++++++++-
>  1 file changed, 58 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
> index 1ea36a88d2fb..d772718861df 100644
> --- a/drivers/gpu/drm/i915/i915_guc_submission.c
> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
> @@ -1003,6 +1003,24 @@ static void guc_policies_init(struct guc_policies *policies)
>  	policies->is_valid = 1;
>  }
>
> +/*
> + * In this macro it is highly unlikely to exceed max value but even if we did
> + * it is not an error so just throw a warning and continue. Only side effect
> + * in continuing further means some registers won't be added to save/restore
> + * list.
> + */
> +#define GUC_ADD_MMIO_REG_ADS(node, reg_addr, _flags, defvalue)		\
> +	do {								\
> +		u32 __count = node->number_of_registers;		\
> +		if (WARN_ON(__count >= GUC_REGSET_MAX_REGISTERS))	\
> +			continue;					\
> +		node->registers[__count].offset = reg_addr.reg;		\
> +		node->registers[__count].flags = (_flags);		\
> +		if (defvalue)						\
> +			node->registers[__count].value = (defvalue);	\
> +		node->number_of_registers++;				\
> +	} while (0)
> +
>  static int guc_ads_create(struct intel_guc *guc)
>  {
>  	struct drm_i915_private *dev_priv = guc_to_i915(guc);
> @@ -1016,6 +1034,7 @@ static int guc_ads_create(struct intel_guc *guc)
>  		u8 reg_state_buffer[GUC_S3_SAVE_SPACE_PAGES * PAGE_SIZE];
>  	} __packed *blob;
>  	struct intel_engine_cs *engine;
> +	struct i915_workarounds *workarounds = &dev_priv->workarounds;
>  	enum intel_engine_id id;
>  	u32 base;
>
> @@ -1035,6 +1054,39 @@ static int guc_ads_create(struct intel_guc *guc)
>
>  	/* MMIO reg state */
>  	for_each_engine(engine, dev_priv, id) {
> +		u32 i;
> +		struct guc_mmio_regset *eng_reg =
> +			&blob->reg_state.engine_reg[engine->guc_id];
> +
> +		/*
> +		 * Provide a list of registers to be saved/restored during gpu
> +		 * reset. This is mainly required for Media reset (aka watchdog
> +		 * timeout) which is completely under the control of GuC
> +		 * (resubmission of hung workload is handled inside GuC).
> +		 */
> +		GUC_ADD_MMIO_REG_ADS(eng_reg, RING_HWS_PGA(engine->mmio_base),
> +				     GUC_REGSET_ENGINERESET |
> +				     GUC_REGSET_SAVE_CURRENT_VALUE, 0);
> +
> +		/*
> +		 * Workaround the guc issue with masked registers, note that
> +		 * at this point guc submission is still disabled and the mode
> +		 * register doesnt have the irq_steering bit set, which we
> +		 * need to fwd irqs to GuC.
> +		 */
> +		GUC_ADD_MMIO_REG_ADS(eng_reg, RING_MODE_GEN7(engine),
> +				     GUC_REGSET_ENGINERESET |
> +				     GUC_REGSET_SAVE_DEFAULT_VALUE,
> +				     I915_READ(RING_MODE_GEN7(engine)) |
> +				     GFX_INTERRUPT_STEERING | (0xFFFF<<16));
> +
> +		GUC_ADD_MMIO_REG_ADS(eng_reg, RING_IMR(engine->mmio_base),
> +				     GUC_REGSET_ENGINERESET |
> +				     GUC_REGSET_SAVE_CURRENT_VALUE, 0);
> +

I might just be too paranoid, but I think that we also have to add the 
registers that we use for WAs via mmio (i.e. not using an LRI in the 
ringbuffer). I did a quick test for the registers in the 
gen9_init_workarounds and skl_init_workarounds functions that we write 
using I915_WRITE and it looks like some of them lose their value after 
and RCS media reset:

  REG	   WA BITS	VAL PRE-MR	VAL POST-MR
0x20D4	(1<<2)		0x00000004	0x00000000
0xb11c	(1<<2)		0x00000005	0x00000001
0x4090	(1<<8) (1<<25)	0x02000100	0x02000100
0xb118	(1<<21)		0x40600000	0x40400000
0x20e0	(1<<14)		0x00004000	0x00000000
0xB004	(1<<7)		0x29124180	0x29124100

For some reason the media reset count in debugfs is still showing zero 
though, so I might be doing something wrong. Can you double check what 
happens to those registers?

Thanks,
Daniele

> +		DRM_DEBUG_DRIVER("%s register save/restore count: %u\n",
> +				 engine->name, eng_reg->number_of_registers);
> +
>  		blob->reg_state.white_list[engine->guc_id].mmio_start =
>  			i915_mmio_reg_offset(RING_FORCE_TO_NONPRIV(engine->mmio_base, 0));
>
> @@ -1044,9 +1096,13 @@ static int guc_ads_create(struct intel_guc *guc)
>  		 * inconsistencies with the handling of FORCE_TO_NONPRIV
>  		 * registers.
>  		 */
> -		blob->reg_state.white_list[engine->guc_id].count = 0;
> +		blob->reg_state.white_list[engine->guc_id].count =
> +					workarounds->hw_whitelist_count[id];
>
> -		/* Nothing to be saved or restored for now. */
> +		for (i = 0; i < workarounds->hw_whitelist_count[id]; i++) {
> +			blob->reg_state.white_list[engine->guc_id].offsets[i] =
> +				I915_READ(RING_FORCE_TO_NONPRIV(engine->mmio_base, i));
> +		}
>  	}
>
>  	/*
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v6 13/20] drm/i915/guc: Provide register list to be saved/restored during engine reset
  2017-04-18 20:23 [PATCH v6 00/20] Gen8+ engine-reset Michel Thierry
@ 2017-04-18 20:23 ` Michel Thierry
  2017-04-19  0:26   ` Daniele Ceraolo Spurio
  0 siblings, 1 reply; 11+ messages in thread
From: Michel Thierry @ 2017-04-18 20:23 UTC (permalink / raw)
  To: intel-gfx

From: Arun Siluvery <arun.siluvery@linux.intel.com>

GuC expects a list of registers from the driver which are saved/restored
during engine reset. The type of value to be saved is controlled by
flags. We provide a minimal set of registers that we want GuC to save and
restore. This is not an issue in case of engine reset as driver initializes
most of them following an engine reset, but in case of media reset (aka
watchdog reset) which is completely internal to GuC (including resubmission
of hung workload), it is necessary to provide this list, otherwise GuC won't
be able to schedule further workloads after a reset. This is the minimal
set of registers identified for things to work as expected but if we see
any new issues, this register list can be expanded.

v2: REGSET_MASKED is too difficult for GuC, use REGSET_SAVE_DEFAULT_VALUE
and current value from RING_MODE reg instead; no need to preserve
head/tail either, be extra paranoid and save whitelisted registers (Daniele).

Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_guc_submission.c | 60 +++++++++++++++++++++++++++++-
 1 file changed, 58 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index 1ea36a88d2fb..d772718861df 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -1003,6 +1003,24 @@ static void guc_policies_init(struct guc_policies *policies)
 	policies->is_valid = 1;
 }
 
+/*
+ * In this macro it is highly unlikely to exceed max value but even if we did
+ * it is not an error so just throw a warning and continue. Only side effect
+ * in continuing further means some registers won't be added to save/restore
+ * list.
+ */
+#define GUC_ADD_MMIO_REG_ADS(node, reg_addr, _flags, defvalue)		\
+	do {								\
+		u32 __count = node->number_of_registers;		\
+		if (WARN_ON(__count >= GUC_REGSET_MAX_REGISTERS))	\
+			continue;					\
+		node->registers[__count].offset = reg_addr.reg;		\
+		node->registers[__count].flags = (_flags);		\
+		if (defvalue)						\
+			node->registers[__count].value = (defvalue);	\
+		node->number_of_registers++;				\
+	} while (0)
+
 static int guc_ads_create(struct intel_guc *guc)
 {
 	struct drm_i915_private *dev_priv = guc_to_i915(guc);
@@ -1016,6 +1034,7 @@ static int guc_ads_create(struct intel_guc *guc)
 		u8 reg_state_buffer[GUC_S3_SAVE_SPACE_PAGES * PAGE_SIZE];
 	} __packed *blob;
 	struct intel_engine_cs *engine;
+	struct i915_workarounds *workarounds = &dev_priv->workarounds;
 	enum intel_engine_id id;
 	u32 base;
 
@@ -1035,6 +1054,39 @@ static int guc_ads_create(struct intel_guc *guc)
 
 	/* MMIO reg state */
 	for_each_engine(engine, dev_priv, id) {
+		u32 i;
+		struct guc_mmio_regset *eng_reg =
+			&blob->reg_state.engine_reg[engine->guc_id];
+
+		/*
+		 * Provide a list of registers to be saved/restored during gpu
+		 * reset. This is mainly required for Media reset (aka watchdog
+		 * timeout) which is completely under the control of GuC
+		 * (resubmission of hung workload is handled inside GuC).
+		 */
+		GUC_ADD_MMIO_REG_ADS(eng_reg, RING_HWS_PGA(engine->mmio_base),
+				     GUC_REGSET_ENGINERESET |
+				     GUC_REGSET_SAVE_CURRENT_VALUE, 0);
+
+		/*
+		 * Workaround the guc issue with masked registers, note that
+		 * at this point guc submission is still disabled and the mode
+		 * register doesnt have the irq_steering bit set, which we
+		 * need to fwd irqs to GuC.
+		 */
+		GUC_ADD_MMIO_REG_ADS(eng_reg, RING_MODE_GEN7(engine),
+				     GUC_REGSET_ENGINERESET |
+				     GUC_REGSET_SAVE_DEFAULT_VALUE,
+				     I915_READ(RING_MODE_GEN7(engine)) |
+				     GFX_INTERRUPT_STEERING | (0xFFFF<<16));
+
+		GUC_ADD_MMIO_REG_ADS(eng_reg, RING_IMR(engine->mmio_base),
+				     GUC_REGSET_ENGINERESET |
+				     GUC_REGSET_SAVE_CURRENT_VALUE, 0);
+
+		DRM_DEBUG_DRIVER("%s register save/restore count: %u\n",
+				 engine->name, eng_reg->number_of_registers);
+
 		blob->reg_state.white_list[engine->guc_id].mmio_start =
 			i915_mmio_reg_offset(RING_FORCE_TO_NONPRIV(engine->mmio_base, 0));
 
@@ -1044,9 +1096,13 @@ static int guc_ads_create(struct intel_guc *guc)
 		 * inconsistencies with the handling of FORCE_TO_NONPRIV
 		 * registers.
 		 */
-		blob->reg_state.white_list[engine->guc_id].count = 0;
+		blob->reg_state.white_list[engine->guc_id].count =
+					workarounds->hw_whitelist_count[id];
 
-		/* Nothing to be saved or restored for now. */
+		for (i = 0; i < workarounds->hw_whitelist_count[id]; i++) {
+			blob->reg_state.white_list[engine->guc_id].offsets[i] =
+				I915_READ(RING_FORCE_TO_NONPRIV(engine->mmio_base, i));
+		}
 	}
 
 	/*
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2017-04-21 20:31 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <02fb15a-d90a-7d82-ab57-032a77636ba8@intel.com>
2017-04-19 18:35 ` [PATCH v6 13/20] drm/i915/guc: Provide register list to be saved/restored during engine reset Michel Thierry
2017-04-20 11:33   ` Joonas Lahtinen
2017-04-20 16:39     ` Daniele Ceraolo Spurio
2017-04-20 17:29       ` Michel Thierry
2017-04-21 20:07         ` Michel Thierry
2017-04-21 20:10           ` Daniele Ceraolo Spurio
2017-04-21 20:21             ` Chris Wilson
2017-04-21 20:31               ` Michel Thierry
2017-04-18 20:23 [PATCH v6 00/20] Gen8+ engine-reset Michel Thierry
2017-04-18 20:23 ` [PATCH v6 13/20] drm/i915/guc: Provide register list to be saved/restored during engine reset Michel Thierry
2017-04-19  0:26   ` Daniele Ceraolo Spurio
2017-04-19  0:44     ` Michel Thierry

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.