All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] Move BDW workarounds to ring init fn
@ 2014-07-28 16:31 arun.siluvery
  2014-07-28 16:31 ` [RFC] drm/i915/bdw: Initialize BDW workarounds in render " arun.siluvery
  2014-07-28 17:26 ` [RFC] Move BDW workarounds to " Ville Syrjälä
  0 siblings, 2 replies; 10+ messages in thread
From: arun.siluvery @ 2014-07-28 16:31 UTC (permalink / raw)
  To: intel-gfx

From: Arun Siluvery <arun.siluvery@linux.intel.com>

This patch moves BDW workarounds from init_clock_gating() to render ring
init fn otherwise they are lost when gpu is reset.
In case of execlists, some of the workarounds modify registers that are
part of register state context which doesn't get initialized until
init_clock_gating(); this results in default context with incorrect values
as it is restored and saved before updated by workarounds.

Open issue:
For Wa4x4STCOptimizationDisable, we set CACHE_MODE_1[6:6] = 1
At the time when HW contexts are enabled after rings are initialized with
default context this workaround is valid but followed by a context switch
this is getting reset, please see below log snippet.

...
[    5.978209] [drm:i915_pages_create_for_stolen] offset=0x0, size=8294400
[    5.978213] [drm:intel_alloc_plane_obj] plane fb obj ffff8801472e0000
[    5.978215] [drm:i915_gem_setup_global_gtt] reserving preallocated space: 0 + 7e9000
[    5.978216] [drm:i915_gem_setup_global_gtt] clearing unused GTT space: [7e9000, fffff000]
[    5.979613] [drm:i915_gem_init] CACHE_MODE_1: 0x00000180
[    5.981372] [drm:gen8_ppgtt_init] Allocated 4 pages for page directories (0 wasted)
[    5.981373] [drm:gen8_ppgtt_init] Allocated 2048 pages for page tables (0 wasted)
[    5.981376] [drm:i915_gem_context_init] HW context support initialized
[    5.981462] [drm:i915_gem_init_hw] CACHE_MODE_1: 0x00000180
[    5.981467] [drm:i915_gem_init_rings] CACHE_MODE_1: 0x00000180
[    5.981704] [drm:bdw_init_workarounds] CACHE_MODE_1: 0x000001C0
[    5.981716] [drm:init_status_page] bsd ring hws offset: 0x0081e000
[    5.981792] [drm:init_status_page] blitter ring hws offset: 0x0083f000
[    5.981910] [drm:init_status_page] video enhancement ring hws offset: 0x00860000
[    5.982001] [drm:i915_gem_init_hw] CACHE_MODE_1: 0x000001C0
[    5.982104] [drm:i915_gem_context_enable] Switch render ring to default_context
[    5.982106] [drm:i915_gem_render_state_init] render ring: Render state init
[    5.982120] [drm:do_switch] render ring, CACHE_MODE_1: 0x000001C0, uninitialized: 1
[    5.982121] [drm:i915_gem_context_enable] Switch bsd ring to default_context
[    5.982122] [drm:do_switch] bsd ring, CACHE_MODE_1: 0x000001C0, uninitialized: 0
[    5.982123] [drm:i915_gem_context_enable] Switch blitter ring to default_context
[    5.982126] [drm:do_switch] blitter ring, CACHE_MODE_1: 0x000001C0, uninitialized: 0
[    5.982126] [drm:i915_gem_context_enable] Switch video enhancement ring to default_context
[    5.982128] [drm:do_switch] video enhancement ring, CACHE_MODE_1: 0x000001C0, uninitialized: 0
[    5.982133] [drm:i915_gem_init] CACHE_MODE_1: 0x000001C0
[    5.982258] [drm:intel_init_clock_gating]
...
[   10.037019] [drm:do_switch] blitter ring, CACHE_MODE_1: 0x00000180, uninitialized: 0
...
[   10.488145] [drm:do_switch] render ring, CACHE_MODE_1: 0x00000180, uninitialized: 0
...

I am currently testing this with an igt which triggers a gpu reset and compares
WA register contents before and after reset but the test fails because of
this register hence not sending it now.
Please let me know how to keep this WA valid after a context switch.


Arun Siluvery (1):
  drm/i915/bdw: Initialize BDW workarounds in render ring init fn

 drivers/gpu/drm/i915/i915_debugfs.c     | 46 ++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_pm.c         | 59 ----------------------------
 drivers/gpu/drm/i915/intel_ringbuffer.c | 68 +++++++++++++++++++++++++++++++++
 3 files changed, 114 insertions(+), 59 deletions(-)

-- 
1.9.2

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [RFC] drm/i915/bdw: Initialize BDW workarounds in render ring init fn
  2014-07-28 16:31 [RFC] Move BDW workarounds to ring init fn arun.siluvery
@ 2014-07-28 16:31 ` arun.siluvery
  2014-07-28 17:00   ` Ville Syrjälä
  2014-07-28 17:26 ` [RFC] Move BDW workarounds to " Ville Syrjälä
  1 sibling, 1 reply; 10+ messages in thread
From: arun.siluvery @ 2014-07-28 16:31 UTC (permalink / raw)
  To: intel-gfx

From: Arun Siluvery <arun.siluvery@linux.intel.com>

The workarounds at the moment are initialized in init_clock_gating() but
they are lost during reset; In case of execlists some workarounds modify
registers that are part of register state context, since these are not
initialized until init_clock_gating() default context ends up with
incorrect values as render context is restored and saved before updated
by workarounds hence move them to render ring init fn. This should be
ok as these workarounds are not related to display clock gating.

Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c     | 46 ++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_pm.c         | 59 ----------------------------
 drivers/gpu/drm/i915/intel_ringbuffer.c | 68 +++++++++++++++++++++++++++++++++
 3 files changed, 114 insertions(+), 59 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 083683c..cf7da30 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2397,20 +2397,65 @@ static int i915_shared_dplls_info(struct seq_file *m, void *unused)
 		seq_printf(m, " dpll_md: 0x%08x\n", pll->hw_state.dpll_md);
 		seq_printf(m, " fp0:     0x%08x\n", pll->hw_state.fp0);
 		seq_printf(m, " fp1:     0x%08x\n", pll->hw_state.fp1);
 		seq_printf(m, " wrpll:   0x%08x\n", pll->hw_state.wrpll);
 	}
 	drm_modeset_unlock_all(dev);
 
 	return 0;
 }
 
+static int i915_workaround_info(struct seq_file *m, void *unused)
+{
+	struct drm_info_node *node = (struct drm_info_node *) m->private;
+	struct drm_device *dev = node->minor->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	int ret;
+
+	ret = mutex_lock_interruptible(&dev->struct_mutex);
+	if (ret)
+		return ret;
+
+	if (IS_BROADWELL(dev)) {
+		seq_printf(m, "GEN8_ROW_CHICKEN:\t0x%08x\n",
+			   I915_READ(GEN8_ROW_CHICKEN));
+		seq_printf(m, "HALF_SLICE_CHICKEN3:\t0x%08x\n",
+			   I915_READ(HALF_SLICE_CHICKEN3));
+		seq_printf(m, "GAMTARBMODE:\t0x%08x\n", I915_READ(GAMTARBMODE));
+		seq_printf(m, "_3D_CHICKEN3:\t0x%08x\n",
+			   I915_READ(_3D_CHICKEN3));
+		seq_printf(m, "COMMON_SLICE_CHICKEN2:\t0x%08x\n",
+			   I915_READ(COMMON_SLICE_CHICKEN2));
+		seq_printf(m, "GEN7_HALF_SLICE_CHICKEN1:\t0x%08x\n",
+			   I915_READ(GEN7_HALF_SLICE_CHICKEN1));
+		seq_printf(m, "GEN7_ROW_CHICKEN2:\t0x%08x\n",
+			   I915_READ(GEN7_ROW_CHICKEN2));
+		seq_printf(m, "GAM_ECOCHK:\t0x%08x\n",
+			   I915_READ(GAM_ECOCHK));
+		seq_printf(m, "HDC_CHICKEN0:\t0x%08x\n",
+			   I915_READ(HDC_CHICKEN0));
+		seq_printf(m, "GEN7_FF_THREAD_MODE:\t0x%08x\n",
+			   I915_READ(GEN7_FF_THREAD_MODE));
+		seq_printf(m, "GEN8_UCGCTL6:\t0x%08x\n",
+			   I915_READ(GEN8_UCGCTL6));
+		seq_printf(m, "GEN6_RC_SLEEP_PSMI_CONTROL:\t0x%08x\n",
+			   I915_READ(GEN6_RC_SLEEP_PSMI_CONTROL));
+		seq_printf(m, "CACHE_MODE_1:\t0x%08x\n",
+			   I915_READ(CACHE_MODE_1));
+	} else
+		DRM_DEBUG_DRIVER("Not available for Gen%d\n",
+				 INTEL_INFO(dev)->gen);
+
+	mutex_unlock(&dev->struct_mutex);
+	return 0;
+}
+
 struct pipe_crc_info {
 	const char *name;
 	struct drm_device *dev;
 	enum pipe pipe;
 };
 
 static int i915_pipe_crc_open(struct inode *inode, struct file *filep)
 {
 	struct pipe_crc_info *info = inode->i_private;
 	struct drm_i915_private *dev_priv = info->dev->dev_private;
@@ -3904,20 +3949,21 @@ static const struct drm_info_list i915_debugfs_list[] = {
 	{"i915_ppgtt_info", i915_ppgtt_info, 0},
 	{"i915_llc", i915_llc, 0},
 	{"i915_edp_psr_status", i915_edp_psr_status, 0},
 	{"i915_sink_crc_eDP1", i915_sink_crc, 0},
 	{"i915_energy_uJ", i915_energy_uJ, 0},
 	{"i915_pc8_status", i915_pc8_status, 0},
 	{"i915_power_domain_info", i915_power_domain_info, 0},
 	{"i915_display_info", i915_display_info, 0},
 	{"i915_semaphore_status", i915_semaphore_status, 0},
 	{"i915_shared_dplls_info", i915_shared_dplls_info, 0},
+	{"i915_workaround_info", i915_workaround_info, 0},
 };
 #define I915_DEBUGFS_ENTRIES ARRAY_SIZE(i915_debugfs_list)
 
 static const struct i915_debugfs_files {
 	const char *name;
 	const struct file_operations *fops;
 } i915_debugfs_files[] = {
 	{"i915_wedged", &i915_wedged_fops},
 	{"i915_max_freq", &i915_max_freq_fops},
 	{"i915_min_freq", &i915_min_freq_fops},
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 3f88f29..9597f95 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -5393,101 +5393,42 @@ static void gen8_init_clock_gating(struct drm_device *dev)
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	enum pipe pipe;
 
 	I915_WRITE(WM3_LP_ILK, 0);
 	I915_WRITE(WM2_LP_ILK, 0);
 	I915_WRITE(WM1_LP_ILK, 0);
 
 	/* FIXME(BDW): Check all the w/a, some might only apply to
 	 * pre-production hw. */
 
-	/* WaDisablePartialInstShootdown:bdw */
-	I915_WRITE(GEN8_ROW_CHICKEN,
-		   _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE));
-
-	/* WaDisableThreadStallDopClockGating:bdw */
-	/* FIXME: Unclear whether we really need this on production bdw. */
-	I915_WRITE(GEN8_ROW_CHICKEN,
-		   _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE));
-
-	/*
-	 * This GEN8_CENTROID_PIXEL_OPT_DIS W/A is only needed for
-	 * pre-production hardware
-	 */
-	I915_WRITE(HALF_SLICE_CHICKEN3,
-		   _MASKED_BIT_ENABLE(GEN8_CENTROID_PIXEL_OPT_DIS));
-	I915_WRITE(HALF_SLICE_CHICKEN3,
-		   _MASKED_BIT_ENABLE(GEN8_SAMPLER_POWER_BYPASS_DIS));
-	I915_WRITE(GAMTARBMODE, _MASKED_BIT_ENABLE(ARB_MODE_BWGTLB_DISABLE));
-
-	I915_WRITE(_3D_CHICKEN3,
-		   _MASKED_BIT_ENABLE(_3D_CHICKEN_SDE_LIMIT_FIFO_POLY_DEPTH(2)));
-
-	I915_WRITE(COMMON_SLICE_CHICKEN2,
-		   _MASKED_BIT_ENABLE(GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE));
-
-	I915_WRITE(GEN7_HALF_SLICE_CHICKEN1,
-		   _MASKED_BIT_ENABLE(GEN7_SINGLE_SUBSCAN_DISPATCH_ENABLE));
-
-	/* WaDisableDopClockGating:bdw May not be needed for production */
-	I915_WRITE(GEN7_ROW_CHICKEN2,
-		   _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE));
-
-	/* WaSwitchSolVfFArbitrationPriority:bdw */
-	I915_WRITE(GAM_ECOCHK, I915_READ(GAM_ECOCHK) | HSW_ECOCHK_ARB_PRIO_SOL);
 
 	/* WaPsrDPAMaskVBlankInSRD:bdw */
 	I915_WRITE(CHICKEN_PAR1_1,
 		   I915_READ(CHICKEN_PAR1_1) | DPA_MASK_VBLANK_SRD);
 
 	/* WaPsrDPRSUnmaskVBlankInSRD:bdw */
 	for_each_pipe(pipe) {
 		I915_WRITE(CHICKEN_PIPESL_1(pipe),
 			   I915_READ(CHICKEN_PIPESL_1(pipe)) |
 			   BDW_DPRS_MASK_VBLANK_SRD);
 	}
 
-	/* Use Force Non-Coherent whenever executing a 3D context. This is a
-	 * workaround for for a possible hang in the unlikely event a TLB
-	 * invalidation occurs during a PSD flush.
-	 */
-	I915_WRITE(HDC_CHICKEN0,
-		   I915_READ(HDC_CHICKEN0) |
-		   _MASKED_BIT_ENABLE(HDC_FORCE_NON_COHERENT));
-
-	/* WaVSRefCountFullforceMissDisable:bdw */
-	/* WaDSRefCountFullforceMissDisable:bdw */
-	I915_WRITE(GEN7_FF_THREAD_MODE,
-		   I915_READ(GEN7_FF_THREAD_MODE) &
-		   ~(GEN8_FF_DS_REF_CNT_FFME | GEN7_FF_VS_REF_CNT_FFME));
-
 	/*
 	 * BSpec recommends 8x4 when MSAA is used,
 	 * however in practice 16x4 seems fastest.
 	 *
 	 * Note that PS/WM thread counts depend on the WIZ hashing
 	 * disable bit, which we don't touch here, but it's good
 	 * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM).
 	 */
 	I915_WRITE(GEN7_GT_MODE,
 		   GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4);
-
-	I915_WRITE(GEN6_RC_SLEEP_PSMI_CONTROL,
-		   _MASKED_BIT_ENABLE(GEN8_RC_SEMA_IDLE_MSG_DISABLE));
-
-	/* WaDisableSDEUnitClockGating:bdw */
-	I915_WRITE(GEN8_UCGCTL6, I915_READ(GEN8_UCGCTL6) |
-		   GEN8_SDEUNIT_CLOCK_GATE_DISABLE);
-
-	/* Wa4x4STCOptimizationDisable:bdw */
-	I915_WRITE(CACHE_MODE_1,
-		   _MASKED_BIT_ENABLE(GEN8_4x4_STC_OPTIMIZATION_DISABLE));
 }
 
 static void haswell_init_clock_gating(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
 	ilk_init_lp_watermarks(dev);
 
 	/* L3 caching of data atomics doesn't work -- disable it. */
 	I915_WRITE(HSW_SCRATCH1, HSW_SCRATCH1_L3_DATA_ATOMICS_DISABLE);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index b3d8f76..6818e34 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -591,20 +591,85 @@ init_pipe_control(struct intel_engine_cs *ring)
 	return 0;
 
 err_unpin:
 	i915_gem_object_ggtt_unpin(ring->scratch.obj);
 err_unref:
 	drm_gem_object_unreference(&ring->scratch.obj->base);
 err:
 	return ret;
 }
 
+static void bdw_init_workarounds(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	/* WaDisablePartialInstShootdown:bdw */
+	I915_WRITE(GEN8_ROW_CHICKEN,
+		   _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE));
+
+	/* WaDisableThreadStallDopClockGating:bdw */
+	/* FIXME: Unclear whether we really need these on production bdw. */
+	I915_WRITE(GEN8_ROW_CHICKEN,
+		   _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE));
+
+	/*
+	 * This GEN8_CENTROID_PIXEL_OPT_DIS W/A is only needed for
+	 * pre-production hardware
+	 */
+	I915_WRITE(HALF_SLICE_CHICKEN3,
+		   _MASKED_BIT_ENABLE(GEN8_CENTROID_PIXEL_OPT_DIS));
+	I915_WRITE(HALF_SLICE_CHICKEN3,
+		   _MASKED_BIT_ENABLE(GEN8_SAMPLER_POWER_BYPASS_DIS));
+	I915_WRITE(GAMTARBMODE, _MASKED_BIT_ENABLE(ARB_MODE_BWGTLB_DISABLE));
+
+	I915_WRITE(_3D_CHICKEN3,
+		   _MASKED_BIT_ENABLE(_3D_CHICKEN_SDE_LIMIT_FIFO_POLY_DEPTH(2)));
+
+	I915_WRITE(COMMON_SLICE_CHICKEN2,
+		   _MASKED_BIT_ENABLE(GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE));
+
+	I915_WRITE(GEN7_HALF_SLICE_CHICKEN1,
+		   _MASKED_BIT_ENABLE(GEN7_SINGLE_SUBSCAN_DISPATCH_ENABLE));
+
+	/* WaDisableDopClockGating:bdw May not be needed for production */
+	I915_WRITE(GEN7_ROW_CHICKEN2,
+		   _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE));
+
+	/* WaSwitchSolVfFArbitrationPriority:bdw */
+	I915_WRITE(GAM_ECOCHK, I915_READ(GAM_ECOCHK) | HSW_ECOCHK_ARB_PRIO_SOL);
+
+	/* Use Force Non-Coherent whenever executing a 3D context. This is a
+	 * workaround for for a possible hang in the unlikely event a TLB
+	 * invalidation occurs during a PSD flush.
+	 */
+	I915_WRITE(HDC_CHICKEN0,
+		   I915_READ(HDC_CHICKEN0) |
+		   _MASKED_BIT_ENABLE(HDC_FORCE_NON_COHERENT));
+
+	/* WaVSRefCountFullforceMissDisable:bdw */
+	/* WaDSRefCountFullforceMissDisable:bdw */
+	I915_WRITE(GEN7_FF_THREAD_MODE,
+		   I915_READ(GEN7_FF_THREAD_MODE) &
+		   ~(GEN8_FF_DS_REF_CNT_FFME | GEN7_FF_VS_REF_CNT_FFME));
+
+	/* WaDisableSDEUnitClockGating:bdw */
+	I915_WRITE(GEN8_UCGCTL6, I915_READ(GEN8_UCGCTL6) |
+		   GEN8_SDEUNIT_CLOCK_GATE_DISABLE);
+
+	I915_WRITE(GEN6_RC_SLEEP_PSMI_CONTROL,
+		   _MASKED_BIT_ENABLE(GEN8_RC_SEMA_IDLE_MSG_DISABLE));
+
+	/* Wa4x4STCOptimizationDisable:bdw */
+	I915_WRITE(CACHE_MODE_1,
+		   _MASKED_BIT_ENABLE(GEN8_4x4_STC_OPTIMIZATION_DISABLE));
+}
+
 static int init_render_ring(struct intel_engine_cs *ring)
 {
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	int ret = init_ring_common(ring);
 	if (ret)
 		return ret;
 
 	/* WaTimedSingleVertexDispatch:cl,bw,ctg,elk,ilk,snb */
 	if (INTEL_INFO(dev)->gen >= 4 && INTEL_INFO(dev)->gen < 7)
@@ -646,20 +711,23 @@ static int init_render_ring(struct intel_engine_cs *ring)
 		I915_WRITE(CACHE_MODE_0,
 			   _MASKED_BIT_DISABLE(CM0_STC_EVICT_DISABLE_LRA_SNB));
 	}
 
 	if (INTEL_INFO(dev)->gen >= 6)
 		I915_WRITE(INSTPM, _MASKED_BIT_ENABLE(INSTPM_FORCE_ORDERING));
 
 	if (HAS_L3_DPF(dev))
 		I915_WRITE_IMR(ring, ~GT_PARITY_ERROR(dev));
 
+	if (IS_BROADWELL(dev))
+		bdw_init_workarounds(dev);
+
 	return ret;
 }
 
 static void render_ring_cleanup(struct intel_engine_cs *ring)
 {
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
 	if (dev_priv->semaphore_obj) {
 		i915_gem_object_ggtt_unpin(dev_priv->semaphore_obj);
-- 
1.9.2

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [RFC] drm/i915/bdw: Initialize BDW workarounds in render ring init fn
  2014-07-28 16:31 ` [RFC] drm/i915/bdw: Initialize BDW workarounds in render " arun.siluvery
@ 2014-07-28 17:00   ` Ville Syrjälä
  2014-07-28 19:22     ` Daniel Vetter
  2014-07-28 20:46     ` Siluvery, Arun
  0 siblings, 2 replies; 10+ messages in thread
From: Ville Syrjälä @ 2014-07-28 17:00 UTC (permalink / raw)
  To: arun.siluvery; +Cc: intel-gfx

On Mon, Jul 28, 2014 at 05:31:46PM +0100, arun.siluvery@linux.intel.com wrote:
> From: Arun Siluvery <arun.siluvery@linux.intel.com>
> 
> The workarounds at the moment are initialized in init_clock_gating() but
> they are lost during reset; In case of execlists some workarounds modify
> registers that are part of register state context, since these are not
> initialized until init_clock_gating() default context ends up with
> incorrect values as render context is restored and saved before updated
> by workarounds hence move them to render ring init fn. This should be
> ok as these workarounds are not related to display clock gating.
> 
> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c     | 46 ++++++++++++++++++++++
>  drivers/gpu/drm/i915/intel_pm.c         | 59 ----------------------------
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 68 +++++++++++++++++++++++++++++++++
>  3 files changed, 114 insertions(+), 59 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 083683c..cf7da30 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -2397,20 +2397,65 @@ static int i915_shared_dplls_info(struct seq_file *m, void *unused)
>  		seq_printf(m, " dpll_md: 0x%08x\n", pll->hw_state.dpll_md);
>  		seq_printf(m, " fp0:     0x%08x\n", pll->hw_state.fp0);
>  		seq_printf(m, " fp1:     0x%08x\n", pll->hw_state.fp1);
>  		seq_printf(m, " wrpll:   0x%08x\n", pll->hw_state.wrpll);
>  	}
>  	drm_modeset_unlock_all(dev);
>  
>  	return 0;
>  }
>  
> +static int i915_workaround_info(struct seq_file *m, void *unused)
> +{
> +	struct drm_info_node *node = (struct drm_info_node *) m->private;
> +	struct drm_device *dev = node->minor->dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	int ret;
> +
> +	ret = mutex_lock_interruptible(&dev->struct_mutex);
> +	if (ret)
> +		return ret;
> +
> +	if (IS_BROADWELL(dev)) {
> +		seq_printf(m, "GEN8_ROW_CHICKEN:\t0x%08x\n",
> +			   I915_READ(GEN8_ROW_CHICKEN));
> +		seq_printf(m, "HALF_SLICE_CHICKEN3:\t0x%08x\n",
> +			   I915_READ(HALF_SLICE_CHICKEN3));
> +		seq_printf(m, "GAMTARBMODE:\t0x%08x\n", I915_READ(GAMTARBMODE));
> +		seq_printf(m, "_3D_CHICKEN3:\t0x%08x\n",
> +			   I915_READ(_3D_CHICKEN3));
> +		seq_printf(m, "COMMON_SLICE_CHICKEN2:\t0x%08x\n",
> +			   I915_READ(COMMON_SLICE_CHICKEN2));
> +		seq_printf(m, "GEN7_HALF_SLICE_CHICKEN1:\t0x%08x\n",
> +			   I915_READ(GEN7_HALF_SLICE_CHICKEN1));
> +		seq_printf(m, "GEN7_ROW_CHICKEN2:\t0x%08x\n",
> +			   I915_READ(GEN7_ROW_CHICKEN2));
> +		seq_printf(m, "GAM_ECOCHK:\t0x%08x\n",
> +			   I915_READ(GAM_ECOCHK));
> +		seq_printf(m, "HDC_CHICKEN0:\t0x%08x\n",
> +			   I915_READ(HDC_CHICKEN0));
> +		seq_printf(m, "GEN7_FF_THREAD_MODE:\t0x%08x\n",
> +			   I915_READ(GEN7_FF_THREAD_MODE));
> +		seq_printf(m, "GEN8_UCGCTL6:\t0x%08x\n",
> +			   I915_READ(GEN8_UCGCTL6));
> +		seq_printf(m, "GEN6_RC_SLEEP_PSMI_CONTROL:\t0x%08x\n",
> +			   I915_READ(GEN6_RC_SLEEP_PSMI_CONTROL));
> +		seq_printf(m, "CACHE_MODE_1:\t0x%08x\n",
> +			   I915_READ(CACHE_MODE_1));
> +	} else
> +		DRM_DEBUG_DRIVER("Not available for Gen%d\n",
> +				 INTEL_INFO(dev)->gen);
> +
> +	mutex_unlock(&dev->struct_mutex);
> +	return 0;
> +}
> +

This smells like a separate patch. But I'm not sure we want at all since
intel_reg_read will provide the same information.

>  struct pipe_crc_info {
>  	const char *name;
>  	struct drm_device *dev;
>  	enum pipe pipe;
>  };
>  
>  static int i915_pipe_crc_open(struct inode *inode, struct file *filep)
>  {
>  	struct pipe_crc_info *info = inode->i_private;
>  	struct drm_i915_private *dev_priv = info->dev->dev_private;
> @@ -3904,20 +3949,21 @@ static const struct drm_info_list i915_debugfs_list[] = {
>  	{"i915_ppgtt_info", i915_ppgtt_info, 0},
>  	{"i915_llc", i915_llc, 0},
>  	{"i915_edp_psr_status", i915_edp_psr_status, 0},
>  	{"i915_sink_crc_eDP1", i915_sink_crc, 0},
>  	{"i915_energy_uJ", i915_energy_uJ, 0},
>  	{"i915_pc8_status", i915_pc8_status, 0},
>  	{"i915_power_domain_info", i915_power_domain_info, 0},
>  	{"i915_display_info", i915_display_info, 0},
>  	{"i915_semaphore_status", i915_semaphore_status, 0},
>  	{"i915_shared_dplls_info", i915_shared_dplls_info, 0},
> +	{"i915_workaround_info", i915_workaround_info, 0},
>  };
>  #define I915_DEBUGFS_ENTRIES ARRAY_SIZE(i915_debugfs_list)
>  
>  static const struct i915_debugfs_files {
>  	const char *name;
>  	const struct file_operations *fops;
>  } i915_debugfs_files[] = {
>  	{"i915_wedged", &i915_wedged_fops},
>  	{"i915_max_freq", &i915_max_freq_fops},
>  	{"i915_min_freq", &i915_min_freq_fops},
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 3f88f29..9597f95 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -5393,101 +5393,42 @@ static void gen8_init_clock_gating(struct drm_device *dev)
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	enum pipe pipe;
>  
>  	I915_WRITE(WM3_LP_ILK, 0);
>  	I915_WRITE(WM2_LP_ILK, 0);
>  	I915_WRITE(WM1_LP_ILK, 0);
>  
>  	/* FIXME(BDW): Check all the w/a, some might only apply to
>  	 * pre-production hw. */
>  
> -	/* WaDisablePartialInstShootdown:bdw */
> -	I915_WRITE(GEN8_ROW_CHICKEN,
> -		   _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE));
> -
> -	/* WaDisableThreadStallDopClockGating:bdw */
> -	/* FIXME: Unclear whether we really need this on production bdw. */
> -	I915_WRITE(GEN8_ROW_CHICKEN,
> -		   _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE));
> -
> -	/*
> -	 * This GEN8_CENTROID_PIXEL_OPT_DIS W/A is only needed for
> -	 * pre-production hardware
> -	 */
> -	I915_WRITE(HALF_SLICE_CHICKEN3,
> -		   _MASKED_BIT_ENABLE(GEN8_CENTROID_PIXEL_OPT_DIS));
> -	I915_WRITE(HALF_SLICE_CHICKEN3,
> -		   _MASKED_BIT_ENABLE(GEN8_SAMPLER_POWER_BYPASS_DIS));
> -	I915_WRITE(GAMTARBMODE, _MASKED_BIT_ENABLE(ARB_MODE_BWGTLB_DISABLE));
> -
> -	I915_WRITE(_3D_CHICKEN3,
> -		   _MASKED_BIT_ENABLE(_3D_CHICKEN_SDE_LIMIT_FIFO_POLY_DEPTH(2)));
> -
> -	I915_WRITE(COMMON_SLICE_CHICKEN2,
> -		   _MASKED_BIT_ENABLE(GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE));
> -
> -	I915_WRITE(GEN7_HALF_SLICE_CHICKEN1,
> -		   _MASKED_BIT_ENABLE(GEN7_SINGLE_SUBSCAN_DISPATCH_ENABLE));
> -
> -	/* WaDisableDopClockGating:bdw May not be needed for production */
> -	I915_WRITE(GEN7_ROW_CHICKEN2,
> -		   _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE));
> -
> -	/* WaSwitchSolVfFArbitrationPriority:bdw */
> -	I915_WRITE(GAM_ECOCHK, I915_READ(GAM_ECOCHK) | HSW_ECOCHK_ARB_PRIO_SOL);
>  
>  	/* WaPsrDPAMaskVBlankInSRD:bdw */
>  	I915_WRITE(CHICKEN_PAR1_1,
>  		   I915_READ(CHICKEN_PAR1_1) | DPA_MASK_VBLANK_SRD);
>  
>  	/* WaPsrDPRSUnmaskVBlankInSRD:bdw */
>  	for_each_pipe(pipe) {
>  		I915_WRITE(CHICKEN_PIPESL_1(pipe),
>  			   I915_READ(CHICKEN_PIPESL_1(pipe)) |
>  			   BDW_DPRS_MASK_VBLANK_SRD);
>  	}
>  
> -	/* Use Force Non-Coherent whenever executing a 3D context. This is a
> -	 * workaround for for a possible hang in the unlikely event a TLB
> -	 * invalidation occurs during a PSD flush.
> -	 */
> -	I915_WRITE(HDC_CHICKEN0,
> -		   I915_READ(HDC_CHICKEN0) |
> -		   _MASKED_BIT_ENABLE(HDC_FORCE_NON_COHERENT));
> -
> -	/* WaVSRefCountFullforceMissDisable:bdw */
> -	/* WaDSRefCountFullforceMissDisable:bdw */
> -	I915_WRITE(GEN7_FF_THREAD_MODE,
> -		   I915_READ(GEN7_FF_THREAD_MODE) &
> -		   ~(GEN8_FF_DS_REF_CNT_FFME | GEN7_FF_VS_REF_CNT_FFME));
> -
>  	/*
>  	 * BSpec recommends 8x4 when MSAA is used,
>  	 * however in practice 16x4 seems fastest.
>  	 *
>  	 * Note that PS/WM thread counts depend on the WIZ hashing
>  	 * disable bit, which we don't touch here, but it's good
>  	 * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM).
>  	 */
>  	I915_WRITE(GEN7_GT_MODE,
>  		   GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4);

GT_MODE doesn't get clobbered by a reset? I would have expected it to
behave the same way as CACHE_MODE_{0,1).

> -
> -	I915_WRITE(GEN6_RC_SLEEP_PSMI_CONTROL,
> -		   _MASKED_BIT_ENABLE(GEN8_RC_SEMA_IDLE_MSG_DISABLE));
> -
> -	/* WaDisableSDEUnitClockGating:bdw */
> -	I915_WRITE(GEN8_UCGCTL6, I915_READ(GEN8_UCGCTL6) |
> -		   GEN8_SDEUNIT_CLOCK_GATE_DISABLE);

And I would have expected the UCGCTL registers to not be clobbered. IIRC
they didn't get clobbered on my IVB, but my memory might be wrong here.

> -
> -	/* Wa4x4STCOptimizationDisable:bdw */
> -	I915_WRITE(CACHE_MODE_1,
> -		   _MASKED_BIT_ENABLE(GEN8_4x4_STC_OPTIMIZATION_DISABLE));
>  }
>  
>  static void haswell_init_clock_gating(struct drm_device *dev)
>  {
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  
>  	ilk_init_lp_watermarks(dev);
>  
>  	/* L3 caching of data atomics doesn't work -- disable it. */
>  	I915_WRITE(HSW_SCRATCH1, HSW_SCRATCH1_L3_DATA_ATOMICS_DISABLE);
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index b3d8f76..6818e34 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -591,20 +591,85 @@ init_pipe_control(struct intel_engine_cs *ring)
>  	return 0;
>  
>  err_unpin:
>  	i915_gem_object_ggtt_unpin(ring->scratch.obj);
>  err_unref:
>  	drm_gem_object_unreference(&ring->scratch.obj->base);
>  err:
>  	return ret;
>  }
>  
> +static void bdw_init_workarounds(struct drm_device *dev)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +
> +	/* WaDisablePartialInstShootdown:bdw */
> +	I915_WRITE(GEN8_ROW_CHICKEN,
> +		   _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE));
> +
> +	/* WaDisableThreadStallDopClockGating:bdw */
> +	/* FIXME: Unclear whether we really need these on production bdw. */
> +	I915_WRITE(GEN8_ROW_CHICKEN,
> +		   _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE));
> +
> +	/*
> +	 * This GEN8_CENTROID_PIXEL_OPT_DIS W/A is only needed for
> +	 * pre-production hardware
> +	 */
> +	I915_WRITE(HALF_SLICE_CHICKEN3,
> +		   _MASKED_BIT_ENABLE(GEN8_CENTROID_PIXEL_OPT_DIS));
> +	I915_WRITE(HALF_SLICE_CHICKEN3,
> +		   _MASKED_BIT_ENABLE(GEN8_SAMPLER_POWER_BYPASS_DIS));
> +	I915_WRITE(GAMTARBMODE, _MASKED_BIT_ENABLE(ARB_MODE_BWGTLB_DISABLE));
> +
> +	I915_WRITE(_3D_CHICKEN3,
> +		   _MASKED_BIT_ENABLE(_3D_CHICKEN_SDE_LIMIT_FIFO_POLY_DEPTH(2)));
> +
> +	I915_WRITE(COMMON_SLICE_CHICKEN2,
> +		   _MASKED_BIT_ENABLE(GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE));
> +
> +	I915_WRITE(GEN7_HALF_SLICE_CHICKEN1,
> +		   _MASKED_BIT_ENABLE(GEN7_SINGLE_SUBSCAN_DISPATCH_ENABLE));
> +
> +	/* WaDisableDopClockGating:bdw May not be needed for production */
> +	I915_WRITE(GEN7_ROW_CHICKEN2,
> +		   _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE));
> +
> +	/* WaSwitchSolVfFArbitrationPriority:bdw */
> +	I915_WRITE(GAM_ECOCHK, I915_READ(GAM_ECOCHK) | HSW_ECOCHK_ARB_PRIO_SOL);
> +
> +	/* Use Force Non-Coherent whenever executing a 3D context. This is a
> +	 * workaround for for a possible hang in the unlikely event a TLB
> +	 * invalidation occurs during a PSD flush.
> +	 */
> +	I915_WRITE(HDC_CHICKEN0,
> +		   I915_READ(HDC_CHICKEN0) |

You might kill this read since it's a masked register. Maybe as a
separate prep-patch.

> +		   _MASKED_BIT_ENABLE(HDC_FORCE_NON_COHERENT));
> +
> +	/* WaVSRefCountFullforceMissDisable:bdw */
> +	/* WaDSRefCountFullforceMissDisable:bdw */
> +	I915_WRITE(GEN7_FF_THREAD_MODE,
> +		   I915_READ(GEN7_FF_THREAD_MODE) &
> +		   ~(GEN8_FF_DS_REF_CNT_FFME | GEN7_FF_VS_REF_CNT_FFME));
> +
> +	/* WaDisableSDEUnitClockGating:bdw */
> +	I915_WRITE(GEN8_UCGCTL6, I915_READ(GEN8_UCGCTL6) |
> +		   GEN8_SDEUNIT_CLOCK_GATE_DISABLE);
> +
> +	I915_WRITE(GEN6_RC_SLEEP_PSMI_CONTROL,
> +		   _MASKED_BIT_ENABLE(GEN8_RC_SEMA_IDLE_MSG_DISABLE));
> +
> +	/* Wa4x4STCOptimizationDisable:bdw */
> +	I915_WRITE(CACHE_MODE_1,
> +		   _MASKED_BIT_ENABLE(GEN8_4x4_STC_OPTIMIZATION_DISABLE));

I think it would be nice to reorder this function a bit to keep similar registers
close to each other. The current mismash makes it easy to overlook a
workaround and then we may end up with duplicates (wouldn't be the first
time that happened).

> +}
> +
>  static int init_render_ring(struct intel_engine_cs *ring)
>  {
>  	struct drm_device *dev = ring->dev;
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	int ret = init_ring_common(ring);
>  	if (ret)
>  		return ret;
>  
>  	/* WaTimedSingleVertexDispatch:cl,bw,ctg,elk,ilk,snb */
>  	if (INTEL_INFO(dev)->gen >= 4 && INTEL_INFO(dev)->gen < 7)
> @@ -646,20 +711,23 @@ static int init_render_ring(struct intel_engine_cs *ring)
>  		I915_WRITE(CACHE_MODE_0,
>  			   _MASKED_BIT_DISABLE(CM0_STC_EVICT_DISABLE_LRA_SNB));
>  	}
>  
>  	if (INTEL_INFO(dev)->gen >= 6)
>  		I915_WRITE(INSTPM, _MASKED_BIT_ENABLE(INSTPM_FORCE_ORDERING));
>  
>  	if (HAS_L3_DPF(dev))
>  		I915_WRITE_IMR(ring, ~GT_PARITY_ERROR(dev));
>  
> +	if (IS_BROADWELL(dev))
> +		bdw_init_workarounds(dev);
> +
>  	return ret;
>  }
>  
>  static void render_ring_cleanup(struct intel_engine_cs *ring)
>  {
>  	struct drm_device *dev = ring->dev;
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  
>  	if (dev_priv->semaphore_obj) {
>  		i915_gem_object_ggtt_unpin(dev_priv->semaphore_obj);
> -- 
> 1.9.2
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] Move BDW workarounds to ring init fn
  2014-07-28 16:31 [RFC] Move BDW workarounds to ring init fn arun.siluvery
  2014-07-28 16:31 ` [RFC] drm/i915/bdw: Initialize BDW workarounds in render " arun.siluvery
@ 2014-07-28 17:26 ` Ville Syrjälä
  2014-07-29 22:27   ` Siluvery, Arun
  1 sibling, 1 reply; 10+ messages in thread
From: Ville Syrjälä @ 2014-07-28 17:26 UTC (permalink / raw)
  To: arun.siluvery; +Cc: intel-gfx

On Mon, Jul 28, 2014 at 05:31:45PM +0100, arun.siluvery@linux.intel.com wrote:
> From: Arun Siluvery <arun.siluvery@linux.intel.com>
> 
> This patch moves BDW workarounds from init_clock_gating() to render ring
> init fn otherwise they are lost when gpu is reset.
> In case of execlists, some of the workarounds modify registers that are
> part of register state context which doesn't get initialized until
> init_clock_gating(); this results in default context with incorrect values
> as it is restored and saved before updated by workarounds.

I don't think it has to do with execlists. Many of the registers are
part of the context image even in ring buffer mode AFAIK.

> 
> Open issue:
> For Wa4x4STCOptimizationDisable, we set CACHE_MODE_1[6:6] = 1
> At the time when HW contexts are enabled after rings are initialized with
> default context this workaround is valid but followed by a context switch
> this is getting reset, please see below log snippet.

This is a bit weird. The default context should have restore inhibit==1
so it shouldn't clobber the CACHE_MODE_1 register. There was a specific magic
dance you're supposed to do when accessing such registers with mmio, but here
we do the write even before the first context switch.

Apparently there was some kind of problem with CACHE_MODE_0 on snb too:
 commit 3a69ddd6f872180b6f61fda87152b37202118fbc
 Author: Kenneth Graunke <kenneth@whitecape.org>
 Date:   Fri Apr 27 12:44:41 2012 -0700

    drm/i915: Set the Stencil Cache eviction policy to non-LRA mode.

but IIRC I wasn't able to reproduce it when I tried.

Maybe we need to delay these register writes until we've switched to the default
context?

> 
> ...
> [    5.978209] [drm:i915_pages_create_for_stolen] offset=0x0, size=8294400
> [    5.978213] [drm:intel_alloc_plane_obj] plane fb obj ffff8801472e0000
> [    5.978215] [drm:i915_gem_setup_global_gtt] reserving preallocated space: 0 + 7e9000
> [    5.978216] [drm:i915_gem_setup_global_gtt] clearing unused GTT space: [7e9000, fffff000]
> [    5.979613] [drm:i915_gem_init] CACHE_MODE_1: 0x00000180
> [    5.981372] [drm:gen8_ppgtt_init] Allocated 4 pages for page directories (0 wasted)
> [    5.981373] [drm:gen8_ppgtt_init] Allocated 2048 pages for page tables (0 wasted)
> [    5.981376] [drm:i915_gem_context_init] HW context support initialized
> [    5.981462] [drm:i915_gem_init_hw] CACHE_MODE_1: 0x00000180
> [    5.981467] [drm:i915_gem_init_rings] CACHE_MODE_1: 0x00000180
> [    5.981704] [drm:bdw_init_workarounds] CACHE_MODE_1: 0x000001C0
> [    5.981716] [drm:init_status_page] bsd ring hws offset: 0x0081e000
> [    5.981792] [drm:init_status_page] blitter ring hws offset: 0x0083f000
> [    5.981910] [drm:init_status_page] video enhancement ring hws offset: 0x00860000
> [    5.982001] [drm:i915_gem_init_hw] CACHE_MODE_1: 0x000001C0
> [    5.982104] [drm:i915_gem_context_enable] Switch render ring to default_context
> [    5.982106] [drm:i915_gem_render_state_init] render ring: Render state init
> [    5.982120] [drm:do_switch] render ring, CACHE_MODE_1: 0x000001C0, uninitialized: 1
> [    5.982121] [drm:i915_gem_context_enable] Switch bsd ring to default_context
> [    5.982122] [drm:do_switch] bsd ring, CACHE_MODE_1: 0x000001C0, uninitialized: 0
> [    5.982123] [drm:i915_gem_context_enable] Switch blitter ring to default_context
> [    5.982126] [drm:do_switch] blitter ring, CACHE_MODE_1: 0x000001C0, uninitialized: 0
> [    5.982126] [drm:i915_gem_context_enable] Switch video enhancement ring to default_context
> [    5.982128] [drm:do_switch] video enhancement ring, CACHE_MODE_1: 0x000001C0, uninitialized: 0
> [    5.982133] [drm:i915_gem_init] CACHE_MODE_1: 0x000001C0
> [    5.982258] [drm:intel_init_clock_gating]
> ...
> [   10.037019] [drm:do_switch] blitter ring, CACHE_MODE_1: 0x00000180, uninitialized: 0
> ...
> [   10.488145] [drm:do_switch] render ring, CACHE_MODE_1: 0x00000180, uninitialized: 0
> ...
> 
> I am currently testing this with an igt which triggers a gpu reset and compares
> WA register contents before and after reset but the test fails because of
> this register hence not sending it now.
> Please let me know how to keep this WA valid after a context switch.
> 
> 
> Arun Siluvery (1):
>   drm/i915/bdw: Initialize BDW workarounds in render ring init fn
> 
>  drivers/gpu/drm/i915/i915_debugfs.c     | 46 ++++++++++++++++++++++
>  drivers/gpu/drm/i915/intel_pm.c         | 59 ----------------------------
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 68 +++++++++++++++++++++++++++++++++
>  3 files changed, 114 insertions(+), 59 deletions(-)
> 
> -- 
> 1.9.2
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] drm/i915/bdw: Initialize BDW workarounds in render ring init fn
  2014-07-28 17:00   ` Ville Syrjälä
@ 2014-07-28 19:22     ` Daniel Vetter
  2014-07-28 20:41       ` Siluvery, Arun
  2014-07-28 20:46     ` Siluvery, Arun
  1 sibling, 1 reply; 10+ messages in thread
From: Daniel Vetter @ 2014-07-28 19:22 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: intel-gfx

On Mon, Jul 28, 2014 at 08:00:39PM +0300, Ville Syrjälä wrote:
> On Mon, Jul 28, 2014 at 05:31:46PM +0100, arun.siluvery@linux.intel.com wrote:
> > From: Arun Siluvery <arun.siluvery@linux.intel.com>
> > 
> > The workarounds at the moment are initialized in init_clock_gating() but
> > they are lost during reset; In case of execlists some workarounds modify
> > registers that are part of register state context, since these are not
> > initialized until init_clock_gating() default context ends up with
> > incorrect values as render context is restored and saved before updated
> > by workarounds hence move them to render ring init fn. This should be
> > ok as these workarounds are not related to display clock gating.
> > 
> > Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_debugfs.c     | 46 ++++++++++++++++++++++
> >  drivers/gpu/drm/i915/intel_pm.c         | 59 ----------------------------
> >  drivers/gpu/drm/i915/intel_ringbuffer.c | 68 +++++++++++++++++++++++++++++++++
> >  3 files changed, 114 insertions(+), 59 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> > index 083683c..cf7da30 100644
> > --- a/drivers/gpu/drm/i915/i915_debugfs.c
> > +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> > @@ -2397,20 +2397,65 @@ static int i915_shared_dplls_info(struct seq_file *m, void *unused)
> >  		seq_printf(m, " dpll_md: 0x%08x\n", pll->hw_state.dpll_md);
> >  		seq_printf(m, " fp0:     0x%08x\n", pll->hw_state.fp0);
> >  		seq_printf(m, " fp1:     0x%08x\n", pll->hw_state.fp1);
> >  		seq_printf(m, " wrpll:   0x%08x\n", pll->hw_state.wrpll);
> >  	}
> >  	drm_modeset_unlock_all(dev);
> >  
> >  	return 0;
> >  }
> >  
> > +static int i915_workaround_info(struct seq_file *m, void *unused)
> > +{
> > +	struct drm_info_node *node = (struct drm_info_node *) m->private;
> > +	struct drm_device *dev = node->minor->dev;
> > +	struct drm_i915_private *dev_priv = dev->dev_private;
> > +	int ret;
> > +
> > +	ret = mutex_lock_interruptible(&dev->struct_mutex);
> > +	if (ret)
> > +		return ret;
> > +
> > +	if (IS_BROADWELL(dev)) {
> > +		seq_printf(m, "GEN8_ROW_CHICKEN:\t0x%08x\n",
> > +			   I915_READ(GEN8_ROW_CHICKEN));
> > +		seq_printf(m, "HALF_SLICE_CHICKEN3:\t0x%08x\n",
> > +			   I915_READ(HALF_SLICE_CHICKEN3));
> > +		seq_printf(m, "GAMTARBMODE:\t0x%08x\n", I915_READ(GAMTARBMODE));
> > +		seq_printf(m, "_3D_CHICKEN3:\t0x%08x\n",
> > +			   I915_READ(_3D_CHICKEN3));
> > +		seq_printf(m, "COMMON_SLICE_CHICKEN2:\t0x%08x\n",
> > +			   I915_READ(COMMON_SLICE_CHICKEN2));
> > +		seq_printf(m, "GEN7_HALF_SLICE_CHICKEN1:\t0x%08x\n",
> > +			   I915_READ(GEN7_HALF_SLICE_CHICKEN1));
> > +		seq_printf(m, "GEN7_ROW_CHICKEN2:\t0x%08x\n",
> > +			   I915_READ(GEN7_ROW_CHICKEN2));
> > +		seq_printf(m, "GAM_ECOCHK:\t0x%08x\n",
> > +			   I915_READ(GAM_ECOCHK));
> > +		seq_printf(m, "HDC_CHICKEN0:\t0x%08x\n",
> > +			   I915_READ(HDC_CHICKEN0));
> > +		seq_printf(m, "GEN7_FF_THREAD_MODE:\t0x%08x\n",
> > +			   I915_READ(GEN7_FF_THREAD_MODE));
> > +		seq_printf(m, "GEN8_UCGCTL6:\t0x%08x\n",
> > +			   I915_READ(GEN8_UCGCTL6));
> > +		seq_printf(m, "GEN6_RC_SLEEP_PSMI_CONTROL:\t0x%08x\n",
> > +			   I915_READ(GEN6_RC_SLEEP_PSMI_CONTROL));
> > +		seq_printf(m, "CACHE_MODE_1:\t0x%08x\n",
> > +			   I915_READ(CACHE_MODE_1));
> > +	} else
> > +		DRM_DEBUG_DRIVER("Not available for Gen%d\n",
> > +				 INTEL_INFO(dev)->gen);
> > +
> > +	mutex_unlock(&dev->struct_mutex);
> > +	return 0;
> > +}
> > +
> 
> This smells like a separate patch. But I'm not sure we want at all since
> intel_reg_read will provide the same information.

Yeah, debugfs files that just do what intel_reg_read does are just an
additional maintaince burden. I know that we have a few that dump lots of
registers, but most of them dump a lot of other information, too.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] drm/i915/bdw: Initialize BDW workarounds in render ring init fn
  2014-07-28 19:22     ` Daniel Vetter
@ 2014-07-28 20:41       ` Siluvery, Arun
  0 siblings, 0 replies; 10+ messages in thread
From: Siluvery, Arun @ 2014-07-28 20:41 UTC (permalink / raw)
  To: Daniel Vetter, Ville Syrjälä; +Cc: intel-gfx

On 28/07/2014 20:22, Daniel Vetter wrote:
> On Mon, Jul 28, 2014 at 08:00:39PM +0300, Ville Syrjälä wrote:
>> On Mon, Jul 28, 2014 at 05:31:46PM +0100, arun.siluvery@linux.intel.com wrote:
>>> From: Arun Siluvery <arun.siluvery@linux.intel.com>
>>>
>>> The workarounds at the moment are initialized in init_clock_gating() but
>>> they are lost during reset; In case of execlists some workarounds modify
>>> registers that are part of register state context, since these are not
>>> initialized until init_clock_gating() default context ends up with
>>> incorrect values as render context is restored and saved before updated
>>> by workarounds hence move them to render ring init fn. This should be
>>> ok as these workarounds are not related to display clock gating.
>>>
>>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/i915_debugfs.c     | 46 ++++++++++++++++++++++
>>>   drivers/gpu/drm/i915/intel_pm.c         | 59 ----------------------------
>>>   drivers/gpu/drm/i915/intel_ringbuffer.c | 68 +++++++++++++++++++++++++++++++++
>>>   3 files changed, 114 insertions(+), 59 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
>>> index 083683c..cf7da30 100644
>>> --- a/drivers/gpu/drm/i915/i915_debugfs.c
>>> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
>>> @@ -2397,20 +2397,65 @@ static int i915_shared_dplls_info(struct seq_file *m, void *unused)
>>>   		seq_printf(m, " dpll_md: 0x%08x\n", pll->hw_state.dpll_md);
>>>   		seq_printf(m, " fp0:     0x%08x\n", pll->hw_state.fp0);
>>>   		seq_printf(m, " fp1:     0x%08x\n", pll->hw_state.fp1);
>>>   		seq_printf(m, " wrpll:   0x%08x\n", pll->hw_state.wrpll);
>>>   	}
>>>   	drm_modeset_unlock_all(dev);
>>>
>>>   	return 0;
>>>   }
>>>
>>> +static int i915_workaround_info(struct seq_file *m, void *unused)
>>> +{
>>> +	struct drm_info_node *node = (struct drm_info_node *) m->private;
>>> +	struct drm_device *dev = node->minor->dev;
>>> +	struct drm_i915_private *dev_priv = dev->dev_private;
>>> +	int ret;
>>> +
>>> +	ret = mutex_lock_interruptible(&dev->struct_mutex);
>>> +	if (ret)
>>> +		return ret;
>>> +
>>> +	if (IS_BROADWELL(dev)) {
>>> +		seq_printf(m, "GEN8_ROW_CHICKEN:\t0x%08x\n",
>>> +			   I915_READ(GEN8_ROW_CHICKEN));
>>> +		seq_printf(m, "HALF_SLICE_CHICKEN3:\t0x%08x\n",
>>> +			   I915_READ(HALF_SLICE_CHICKEN3));
>>> +		seq_printf(m, "GAMTARBMODE:\t0x%08x\n", I915_READ(GAMTARBMODE));
>>> +		seq_printf(m, "_3D_CHICKEN3:\t0x%08x\n",
>>> +			   I915_READ(_3D_CHICKEN3));
>>> +		seq_printf(m, "COMMON_SLICE_CHICKEN2:\t0x%08x\n",
>>> +			   I915_READ(COMMON_SLICE_CHICKEN2));
>>> +		seq_printf(m, "GEN7_HALF_SLICE_CHICKEN1:\t0x%08x\n",
>>> +			   I915_READ(GEN7_HALF_SLICE_CHICKEN1));
>>> +		seq_printf(m, "GEN7_ROW_CHICKEN2:\t0x%08x\n",
>>> +			   I915_READ(GEN7_ROW_CHICKEN2));
>>> +		seq_printf(m, "GAM_ECOCHK:\t0x%08x\n",
>>> +			   I915_READ(GAM_ECOCHK));
>>> +		seq_printf(m, "HDC_CHICKEN0:\t0x%08x\n",
>>> +			   I915_READ(HDC_CHICKEN0));
>>> +		seq_printf(m, "GEN7_FF_THREAD_MODE:\t0x%08x\n",
>>> +			   I915_READ(GEN7_FF_THREAD_MODE));
>>> +		seq_printf(m, "GEN8_UCGCTL6:\t0x%08x\n",
>>> +			   I915_READ(GEN8_UCGCTL6));
>>> +		seq_printf(m, "GEN6_RC_SLEEP_PSMI_CONTROL:\t0x%08x\n",
>>> +			   I915_READ(GEN6_RC_SLEEP_PSMI_CONTROL));
>>> +		seq_printf(m, "CACHE_MODE_1:\t0x%08x\n",
>>> +			   I915_READ(CACHE_MODE_1));
>>> +	} else
>>> +		DRM_DEBUG_DRIVER("Not available for Gen%d\n",
>>> +				 INTEL_INFO(dev)->gen);
>>> +
>>> +	mutex_unlock(&dev->struct_mutex);
>>> +	return 0;
>>> +}
>>> +
>>
>> This smells like a separate patch. But I'm not sure we want at all since
>> intel_reg_read will provide the same information.
>
> Yeah, debugfs files that just do what intel_reg_read does are just an
> additional maintaince burden. I know that we have a few that dump lots of
> registers, but most of them dump a lot of other information, too.
> -Daniel
>

I've added this mainly for testing workarounds which can be extended 
further as we move WAs for other chipsets but I agree it can be done 
with intel_reg_read.

regards
Arun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] drm/i915/bdw: Initialize BDW workarounds in render ring init fn
  2014-07-28 17:00   ` Ville Syrjälä
  2014-07-28 19:22     ` Daniel Vetter
@ 2014-07-28 20:46     ` Siluvery, Arun
  2014-07-29  9:56       ` Ville Syrjälä
  1 sibling, 1 reply; 10+ messages in thread
From: Siluvery, Arun @ 2014-07-28 20:46 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: intel-gfx

On 28/07/2014 18:00, Ville Syrjälä wrote:
> On Mon, Jul 28, 2014 at 05:31:46PM +0100, arun.siluvery@linux.intel.com wrote:
>> From: Arun Siluvery <arun.siluvery@linux.intel.com>
>>
>> The workarounds at the moment are initialized in init_clock_gating() but
>> they are lost during reset; In case of execlists some workarounds modify
>> registers that are part of register state context, since these are not
>> initialized until init_clock_gating() default context ends up with
>> incorrect values as render context is restored and saved before updated
>> by workarounds hence move them to render ring init fn. This should be
>> ok as these workarounds are not related to display clock gating.
>>
>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_debugfs.c     | 46 ++++++++++++++++++++++
>>   drivers/gpu/drm/i915/intel_pm.c         | 59 ----------------------------
>>   drivers/gpu/drm/i915/intel_ringbuffer.c | 68 +++++++++++++++++++++++++++++++++
>>   3 files changed, 114 insertions(+), 59 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
>> index 083683c..cf7da30 100644
>> --- a/drivers/gpu/drm/i915/i915_debugfs.c
>> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
>> @@ -2397,20 +2397,65 @@ static int i915_shared_dplls_info(struct seq_file *m, void *unused)
>>   		seq_printf(m, " dpll_md: 0x%08x\n", pll->hw_state.dpll_md);
>>   		seq_printf(m, " fp0:     0x%08x\n", pll->hw_state.fp0);
>>   		seq_printf(m, " fp1:     0x%08x\n", pll->hw_state.fp1);
>>   		seq_printf(m, " wrpll:   0x%08x\n", pll->hw_state.wrpll);
>>   	}
>>   	drm_modeset_unlock_all(dev);
>>
>>   	return 0;
>>   }
>>
>> +static int i915_workaround_info(struct seq_file *m, void *unused)
>> +{
>> +	struct drm_info_node *node = (struct drm_info_node *) m->private;
>> +	struct drm_device *dev = node->minor->dev;
>> +	struct drm_i915_private *dev_priv = dev->dev_private;
>> +	int ret;
>> +
>> +	ret = mutex_lock_interruptible(&dev->struct_mutex);
>> +	if (ret)
>> +		return ret;
>> +
>> +	if (IS_BROADWELL(dev)) {
>> +		seq_printf(m, "GEN8_ROW_CHICKEN:\t0x%08x\n",
>> +			   I915_READ(GEN8_ROW_CHICKEN));
>> +		seq_printf(m, "HALF_SLICE_CHICKEN3:\t0x%08x\n",
>> +			   I915_READ(HALF_SLICE_CHICKEN3));
>> +		seq_printf(m, "GAMTARBMODE:\t0x%08x\n", I915_READ(GAMTARBMODE));
>> +		seq_printf(m, "_3D_CHICKEN3:\t0x%08x\n",
>> +			   I915_READ(_3D_CHICKEN3));
>> +		seq_printf(m, "COMMON_SLICE_CHICKEN2:\t0x%08x\n",
>> +			   I915_READ(COMMON_SLICE_CHICKEN2));
>> +		seq_printf(m, "GEN7_HALF_SLICE_CHICKEN1:\t0x%08x\n",
>> +			   I915_READ(GEN7_HALF_SLICE_CHICKEN1));
>> +		seq_printf(m, "GEN7_ROW_CHICKEN2:\t0x%08x\n",
>> +			   I915_READ(GEN7_ROW_CHICKEN2));
>> +		seq_printf(m, "GAM_ECOCHK:\t0x%08x\n",
>> +			   I915_READ(GAM_ECOCHK));
>> +		seq_printf(m, "HDC_CHICKEN0:\t0x%08x\n",
>> +			   I915_READ(HDC_CHICKEN0));
>> +		seq_printf(m, "GEN7_FF_THREAD_MODE:\t0x%08x\n",
>> +			   I915_READ(GEN7_FF_THREAD_MODE));
>> +		seq_printf(m, "GEN8_UCGCTL6:\t0x%08x\n",
>> +			   I915_READ(GEN8_UCGCTL6));
>> +		seq_printf(m, "GEN6_RC_SLEEP_PSMI_CONTROL:\t0x%08x\n",
>> +			   I915_READ(GEN6_RC_SLEEP_PSMI_CONTROL));
>> +		seq_printf(m, "CACHE_MODE_1:\t0x%08x\n",
>> +			   I915_READ(CACHE_MODE_1));
>> +	} else
>> +		DRM_DEBUG_DRIVER("Not available for Gen%d\n",
>> +				 INTEL_INFO(dev)->gen);
>> +
>> +	mutex_unlock(&dev->struct_mutex);
>> +	return 0;
>> +}
>> +
>
> This smells like a separate patch. But I'm not sure we want at all since
> intel_reg_read will provide the same information.
>
>>   struct pipe_crc_info {
>>   	const char *name;
>>   	struct drm_device *dev;
>>   	enum pipe pipe;
>>   };
>>
>>   static int i915_pipe_crc_open(struct inode *inode, struct file *filep)
>>   {
>>   	struct pipe_crc_info *info = inode->i_private;
>>   	struct drm_i915_private *dev_priv = info->dev->dev_private;
>> @@ -3904,20 +3949,21 @@ static const struct drm_info_list i915_debugfs_list[] = {
>>   	{"i915_ppgtt_info", i915_ppgtt_info, 0},
>>   	{"i915_llc", i915_llc, 0},
>>   	{"i915_edp_psr_status", i915_edp_psr_status, 0},
>>   	{"i915_sink_crc_eDP1", i915_sink_crc, 0},
>>   	{"i915_energy_uJ", i915_energy_uJ, 0},
>>   	{"i915_pc8_status", i915_pc8_status, 0},
>>   	{"i915_power_domain_info", i915_power_domain_info, 0},
>>   	{"i915_display_info", i915_display_info, 0},
>>   	{"i915_semaphore_status", i915_semaphore_status, 0},
>>   	{"i915_shared_dplls_info", i915_shared_dplls_info, 0},
>> +	{"i915_workaround_info", i915_workaround_info, 0},
>>   };
>>   #define I915_DEBUGFS_ENTRIES ARRAY_SIZE(i915_debugfs_list)
>>
>>   static const struct i915_debugfs_files {
>>   	const char *name;
>>   	const struct file_operations *fops;
>>   } i915_debugfs_files[] = {
>>   	{"i915_wedged", &i915_wedged_fops},
>>   	{"i915_max_freq", &i915_max_freq_fops},
>>   	{"i915_min_freq", &i915_min_freq_fops},
>> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
>> index 3f88f29..9597f95 100644
>> --- a/drivers/gpu/drm/i915/intel_pm.c
>> +++ b/drivers/gpu/drm/i915/intel_pm.c
>> @@ -5393,101 +5393,42 @@ static void gen8_init_clock_gating(struct drm_device *dev)
>>   	struct drm_i915_private *dev_priv = dev->dev_private;
>>   	enum pipe pipe;
>>
>>   	I915_WRITE(WM3_LP_ILK, 0);
>>   	I915_WRITE(WM2_LP_ILK, 0);
>>   	I915_WRITE(WM1_LP_ILK, 0);
>>
>>   	/* FIXME(BDW): Check all the w/a, some might only apply to
>>   	 * pre-production hw. */
>>
>> -	/* WaDisablePartialInstShootdown:bdw */
>> -	I915_WRITE(GEN8_ROW_CHICKEN,
>> -		   _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE));
>> -
>> -	/* WaDisableThreadStallDopClockGating:bdw */
>> -	/* FIXME: Unclear whether we really need this on production bdw. */
>> -	I915_WRITE(GEN8_ROW_CHICKEN,
>> -		   _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE));
>> -
>> -	/*
>> -	 * This GEN8_CENTROID_PIXEL_OPT_DIS W/A is only needed for
>> -	 * pre-production hardware
>> -	 */
>> -	I915_WRITE(HALF_SLICE_CHICKEN3,
>> -		   _MASKED_BIT_ENABLE(GEN8_CENTROID_PIXEL_OPT_DIS));
>> -	I915_WRITE(HALF_SLICE_CHICKEN3,
>> -		   _MASKED_BIT_ENABLE(GEN8_SAMPLER_POWER_BYPASS_DIS));
>> -	I915_WRITE(GAMTARBMODE, _MASKED_BIT_ENABLE(ARB_MODE_BWGTLB_DISABLE));
>> -
>> -	I915_WRITE(_3D_CHICKEN3,
>> -		   _MASKED_BIT_ENABLE(_3D_CHICKEN_SDE_LIMIT_FIFO_POLY_DEPTH(2)));
>> -
>> -	I915_WRITE(COMMON_SLICE_CHICKEN2,
>> -		   _MASKED_BIT_ENABLE(GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE));
>> -
>> -	I915_WRITE(GEN7_HALF_SLICE_CHICKEN1,
>> -		   _MASKED_BIT_ENABLE(GEN7_SINGLE_SUBSCAN_DISPATCH_ENABLE));
>> -
>> -	/* WaDisableDopClockGating:bdw May not be needed for production */
>> -	I915_WRITE(GEN7_ROW_CHICKEN2,
>> -		   _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE));
>> -
>> -	/* WaSwitchSolVfFArbitrationPriority:bdw */
>> -	I915_WRITE(GAM_ECOCHK, I915_READ(GAM_ECOCHK) | HSW_ECOCHK_ARB_PRIO_SOL);
>>
>>   	/* WaPsrDPAMaskVBlankInSRD:bdw */
>>   	I915_WRITE(CHICKEN_PAR1_1,
>>   		   I915_READ(CHICKEN_PAR1_1) | DPA_MASK_VBLANK_SRD);
>>
>>   	/* WaPsrDPRSUnmaskVBlankInSRD:bdw */
>>   	for_each_pipe(pipe) {
>>   		I915_WRITE(CHICKEN_PIPESL_1(pipe),
>>   			   I915_READ(CHICKEN_PIPESL_1(pipe)) |
>>   			   BDW_DPRS_MASK_VBLANK_SRD);
>>   	}
>>
>> -	/* Use Force Non-Coherent whenever executing a 3D context. This is a
>> -	 * workaround for for a possible hang in the unlikely event a TLB
>> -	 * invalidation occurs during a PSD flush.
>> -	 */
>> -	I915_WRITE(HDC_CHICKEN0,
>> -		   I915_READ(HDC_CHICKEN0) |
>> -		   _MASKED_BIT_ENABLE(HDC_FORCE_NON_COHERENT));
>> -
>> -	/* WaVSRefCountFullforceMissDisable:bdw */
>> -	/* WaDSRefCountFullforceMissDisable:bdw */
>> -	I915_WRITE(GEN7_FF_THREAD_MODE,
>> -		   I915_READ(GEN7_FF_THREAD_MODE) &
>> -		   ~(GEN8_FF_DS_REF_CNT_FFME | GEN7_FF_VS_REF_CNT_FFME));
>> -
>>   	/*
>>   	 * BSpec recommends 8x4 when MSAA is used,
>>   	 * however in practice 16x4 seems fastest.
>>   	 *
>>   	 * Note that PS/WM thread counts depend on the WIZ hashing
>>   	 * disable bit, which we don't touch here, but it's good
>>   	 * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM).
>>   	 */
>>   	I915_WRITE(GEN7_GT_MODE,
>>   		   GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4);
>
> GT_MODE doesn't get clobbered by a reset? I would have expected it to
> behave the same way as CACHE_MODE_{0,1).
>
>> -
>> -	I915_WRITE(GEN6_RC_SLEEP_PSMI_CONTROL,
>> -		   _MASKED_BIT_ENABLE(GEN8_RC_SEMA_IDLE_MSG_DISABLE));
>> -
>> -	/* WaDisableSDEUnitClockGating:bdw */
>> -	I915_WRITE(GEN8_UCGCTL6, I915_READ(GEN8_UCGCTL6) |
>> -		   GEN8_SDEUNIT_CLOCK_GATE_DISABLE);
>
> And I would have expected the UCGCTL registers to not be clobbered. IIRC
> they didn't get clobbered on my IVB, but my memory might be wrong here.

I could complete testing because of the problem with CACHE_MODE_1, will 
double check these registers and move/unmove them accordingly.

>
>> -
>> -	/* Wa4x4STCOptimizationDisable:bdw */
>> -	I915_WRITE(CACHE_MODE_1,
>> -		   _MASKED_BIT_ENABLE(GEN8_4x4_STC_OPTIMIZATION_DISABLE));
>>   }
>>
>>   static void haswell_init_clock_gating(struct drm_device *dev)
>>   {
>>   	struct drm_i915_private *dev_priv = dev->dev_private;
>>
>>   	ilk_init_lp_watermarks(dev);
>>
>>   	/* L3 caching of data atomics doesn't work -- disable it. */
>>   	I915_WRITE(HSW_SCRATCH1, HSW_SCRATCH1_L3_DATA_ATOMICS_DISABLE);
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> index b3d8f76..6818e34 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> @@ -591,20 +591,85 @@ init_pipe_control(struct intel_engine_cs *ring)
>>   	return 0;
>>
>>   err_unpin:
>>   	i915_gem_object_ggtt_unpin(ring->scratch.obj);
>>   err_unref:
>>   	drm_gem_object_unreference(&ring->scratch.obj->base);
>>   err:
>>   	return ret;
>>   }
>>
>> +static void bdw_init_workarounds(struct drm_device *dev)
>> +{
>> +	struct drm_i915_private *dev_priv = dev->dev_private;
>> +
>> +	/* WaDisablePartialInstShootdown:bdw */
>> +	I915_WRITE(GEN8_ROW_CHICKEN,
>> +		   _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE));
>> +
>> +	/* WaDisableThreadStallDopClockGating:bdw */
>> +	/* FIXME: Unclear whether we really need these on production bdw. */
>> +	I915_WRITE(GEN8_ROW_CHICKEN,
>> +		   _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE));
>> +
>> +	/*
>> +	 * This GEN8_CENTROID_PIXEL_OPT_DIS W/A is only needed for
>> +	 * pre-production hardware
>> +	 */
>> +	I915_WRITE(HALF_SLICE_CHICKEN3,
>> +		   _MASKED_BIT_ENABLE(GEN8_CENTROID_PIXEL_OPT_DIS));
>> +	I915_WRITE(HALF_SLICE_CHICKEN3,
>> +		   _MASKED_BIT_ENABLE(GEN8_SAMPLER_POWER_BYPASS_DIS));
>> +	I915_WRITE(GAMTARBMODE, _MASKED_BIT_ENABLE(ARB_MODE_BWGTLB_DISABLE));
>> +
>> +	I915_WRITE(_3D_CHICKEN3,
>> +		   _MASKED_BIT_ENABLE(_3D_CHICKEN_SDE_LIMIT_FIFO_POLY_DEPTH(2)));
>> +
>> +	I915_WRITE(COMMON_SLICE_CHICKEN2,
>> +		   _MASKED_BIT_ENABLE(GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE));
>> +
>> +	I915_WRITE(GEN7_HALF_SLICE_CHICKEN1,
>> +		   _MASKED_BIT_ENABLE(GEN7_SINGLE_SUBSCAN_DISPATCH_ENABLE));
>> +
>> +	/* WaDisableDopClockGating:bdw May not be needed for production */
>> +	I915_WRITE(GEN7_ROW_CHICKEN2,
>> +		   _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE));
>> +
>> +	/* WaSwitchSolVfFArbitrationPriority:bdw */
>> +	I915_WRITE(GAM_ECOCHK, I915_READ(GAM_ECOCHK) | HSW_ECOCHK_ARB_PRIO_SOL);
>> +
>> +	/* Use Force Non-Coherent whenever executing a 3D context. This is a
>> +	 * workaround for for a possible hang in the unlikely event a TLB
>> +	 * invalidation occurs during a PSD flush.
>> +	 */
>> +	I915_WRITE(HDC_CHICKEN0,
>> +		   I915_READ(HDC_CHICKEN0) |
>
> You might kill this read since it's a masked register. Maybe as a
> separate prep-patch.
>
>> +		   _MASKED_BIT_ENABLE(HDC_FORCE_NON_COHERENT));
>> +
>> +	/* WaVSRefCountFullforceMissDisable:bdw */
>> +	/* WaDSRefCountFullforceMissDisable:bdw */
>> +	I915_WRITE(GEN7_FF_THREAD_MODE,
>> +		   I915_READ(GEN7_FF_THREAD_MODE) &
>> +		   ~(GEN8_FF_DS_REF_CNT_FFME | GEN7_FF_VS_REF_CNT_FFME));
>> +
>> +	/* WaDisableSDEUnitClockGating:bdw */
>> +	I915_WRITE(GEN8_UCGCTL6, I915_READ(GEN8_UCGCTL6) |
>> +		   GEN8_SDEUNIT_CLOCK_GATE_DISABLE);
>> +
>> +	I915_WRITE(GEN6_RC_SLEEP_PSMI_CONTROL,
>> +		   _MASKED_BIT_ENABLE(GEN8_RC_SEMA_IDLE_MSG_DISABLE));
>> +
>> +	/* Wa4x4STCOptimizationDisable:bdw */
>> +	I915_WRITE(CACHE_MODE_1,
>> +		   _MASKED_BIT_ENABLE(GEN8_4x4_STC_OPTIMIZATION_DISABLE));
>
> I think it would be nice to reorder this function a bit to keep similar registers
> close to each other. The current mismash makes it easy to overlook a
> workaround and then we may end up with duplicates (wouldn't be the first
> time that happened).
>
sure, I will take care of this in the next version.

>> +}
>> +
>>   static int init_render_ring(struct intel_engine_cs *ring)
>>   {
>>   	struct drm_device *dev = ring->dev;
>>   	struct drm_i915_private *dev_priv = dev->dev_private;
>>   	int ret = init_ring_common(ring);
>>   	if (ret)
>>   		return ret;
>>
>>   	/* WaTimedSingleVertexDispatch:cl,bw,ctg,elk,ilk,snb */
>>   	if (INTEL_INFO(dev)->gen >= 4 && INTEL_INFO(dev)->gen < 7)
>> @@ -646,20 +711,23 @@ static int init_render_ring(struct intel_engine_cs *ring)
>>   		I915_WRITE(CACHE_MODE_0,
>>   			   _MASKED_BIT_DISABLE(CM0_STC_EVICT_DISABLE_LRA_SNB));
>>   	}
>>
>>   	if (INTEL_INFO(dev)->gen >= 6)
>>   		I915_WRITE(INSTPM, _MASKED_BIT_ENABLE(INSTPM_FORCE_ORDERING));
>>
>>   	if (HAS_L3_DPF(dev))
>>   		I915_WRITE_IMR(ring, ~GT_PARITY_ERROR(dev));
>>
>> +	if (IS_BROADWELL(dev))
>> +		bdw_init_workarounds(dev);
>> +
>>   	return ret;
>>   }
>>
>>   static void render_ring_cleanup(struct intel_engine_cs *ring)
>>   {
>>   	struct drm_device *dev = ring->dev;
>>   	struct drm_i915_private *dev_priv = dev->dev_private;
>>
>>   	if (dev_priv->semaphore_obj) {
>>   		i915_gem_object_ggtt_unpin(dev_priv->semaphore_obj);
>> --
>> 1.9.2
>>
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] drm/i915/bdw: Initialize BDW workarounds in render ring init fn
  2014-07-28 20:46     ` Siluvery, Arun
@ 2014-07-29  9:56       ` Ville Syrjälä
  0 siblings, 0 replies; 10+ messages in thread
From: Ville Syrjälä @ 2014-07-29  9:56 UTC (permalink / raw)
  To: Siluvery, Arun; +Cc: intel-gfx

On Mon, Jul 28, 2014 at 09:46:33PM +0100, Siluvery, Arun wrote:
> On 28/07/2014 18:00, Ville Syrjälä wrote:
> > On Mon, Jul 28, 2014 at 05:31:46PM +0100, arun.siluvery@linux.intel.com wrote:
> >> From: Arun Siluvery <arun.siluvery@linux.intel.com>
> >>
> >> The workarounds at the moment are initialized in init_clock_gating() but
> >> they are lost during reset; In case of execlists some workarounds modify
> >> registers that are part of register state context, since these are not
> >> initialized until init_clock_gating() default context ends up with
> >> incorrect values as render context is restored and saved before updated
> >> by workarounds hence move them to render ring init fn. This should be
> >> ok as these workarounds are not related to display clock gating.
> >>
> >> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> >> ---
> >>   drivers/gpu/drm/i915/i915_debugfs.c     | 46 ++++++++++++++++++++++
> >>   drivers/gpu/drm/i915/intel_pm.c         | 59 ----------------------------
> >>   drivers/gpu/drm/i915/intel_ringbuffer.c | 68 +++++++++++++++++++++++++++++++++
> >>   3 files changed, 114 insertions(+), 59 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> >> index 083683c..cf7da30 100644
> >> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> >> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> >> @@ -2397,20 +2397,65 @@ static int i915_shared_dplls_info(struct seq_file *m, void *unused)
> >>   		seq_printf(m, " dpll_md: 0x%08x\n", pll->hw_state.dpll_md);
> >>   		seq_printf(m, " fp0:     0x%08x\n", pll->hw_state.fp0);
> >>   		seq_printf(m, " fp1:     0x%08x\n", pll->hw_state.fp1);
> >>   		seq_printf(m, " wrpll:   0x%08x\n", pll->hw_state.wrpll);
> >>   	}
> >>   	drm_modeset_unlock_all(dev);
> >>
> >>   	return 0;
> >>   }
> >>
> >> +static int i915_workaround_info(struct seq_file *m, void *unused)
> >> +{
> >> +	struct drm_info_node *node = (struct drm_info_node *) m->private;
> >> +	struct drm_device *dev = node->minor->dev;
> >> +	struct drm_i915_private *dev_priv = dev->dev_private;
> >> +	int ret;
> >> +
> >> +	ret = mutex_lock_interruptible(&dev->struct_mutex);
> >> +	if (ret)
> >> +		return ret;
> >> +
> >> +	if (IS_BROADWELL(dev)) {
> >> +		seq_printf(m, "GEN8_ROW_CHICKEN:\t0x%08x\n",
> >> +			   I915_READ(GEN8_ROW_CHICKEN));
> >> +		seq_printf(m, "HALF_SLICE_CHICKEN3:\t0x%08x\n",
> >> +			   I915_READ(HALF_SLICE_CHICKEN3));
> >> +		seq_printf(m, "GAMTARBMODE:\t0x%08x\n", I915_READ(GAMTARBMODE));
> >> +		seq_printf(m, "_3D_CHICKEN3:\t0x%08x\n",
> >> +			   I915_READ(_3D_CHICKEN3));
> >> +		seq_printf(m, "COMMON_SLICE_CHICKEN2:\t0x%08x\n",
> >> +			   I915_READ(COMMON_SLICE_CHICKEN2));
> >> +		seq_printf(m, "GEN7_HALF_SLICE_CHICKEN1:\t0x%08x\n",
> >> +			   I915_READ(GEN7_HALF_SLICE_CHICKEN1));
> >> +		seq_printf(m, "GEN7_ROW_CHICKEN2:\t0x%08x\n",
> >> +			   I915_READ(GEN7_ROW_CHICKEN2));
> >> +		seq_printf(m, "GAM_ECOCHK:\t0x%08x\n",
> >> +			   I915_READ(GAM_ECOCHK));
> >> +		seq_printf(m, "HDC_CHICKEN0:\t0x%08x\n",
> >> +			   I915_READ(HDC_CHICKEN0));
> >> +		seq_printf(m, "GEN7_FF_THREAD_MODE:\t0x%08x\n",
> >> +			   I915_READ(GEN7_FF_THREAD_MODE));
> >> +		seq_printf(m, "GEN8_UCGCTL6:\t0x%08x\n",
> >> +			   I915_READ(GEN8_UCGCTL6));
> >> +		seq_printf(m, "GEN6_RC_SLEEP_PSMI_CONTROL:\t0x%08x\n",
> >> +			   I915_READ(GEN6_RC_SLEEP_PSMI_CONTROL));
> >> +		seq_printf(m, "CACHE_MODE_1:\t0x%08x\n",
> >> +			   I915_READ(CACHE_MODE_1));
> >> +	} else
> >> +		DRM_DEBUG_DRIVER("Not available for Gen%d\n",
> >> +				 INTEL_INFO(dev)->gen);
> >> +
> >> +	mutex_unlock(&dev->struct_mutex);
> >> +	return 0;
> >> +}
> >> +
> >
> > This smells like a separate patch. But I'm not sure we want at all since
> > intel_reg_read will provide the same information.
> >
> >>   struct pipe_crc_info {
> >>   	const char *name;
> >>   	struct drm_device *dev;
> >>   	enum pipe pipe;
> >>   };
> >>
> >>   static int i915_pipe_crc_open(struct inode *inode, struct file *filep)
> >>   {
> >>   	struct pipe_crc_info *info = inode->i_private;
> >>   	struct drm_i915_private *dev_priv = info->dev->dev_private;
> >> @@ -3904,20 +3949,21 @@ static const struct drm_info_list i915_debugfs_list[] = {
> >>   	{"i915_ppgtt_info", i915_ppgtt_info, 0},
> >>   	{"i915_llc", i915_llc, 0},
> >>   	{"i915_edp_psr_status", i915_edp_psr_status, 0},
> >>   	{"i915_sink_crc_eDP1", i915_sink_crc, 0},
> >>   	{"i915_energy_uJ", i915_energy_uJ, 0},
> >>   	{"i915_pc8_status", i915_pc8_status, 0},
> >>   	{"i915_power_domain_info", i915_power_domain_info, 0},
> >>   	{"i915_display_info", i915_display_info, 0},
> >>   	{"i915_semaphore_status", i915_semaphore_status, 0},
> >>   	{"i915_shared_dplls_info", i915_shared_dplls_info, 0},
> >> +	{"i915_workaround_info", i915_workaround_info, 0},
> >>   };
> >>   #define I915_DEBUGFS_ENTRIES ARRAY_SIZE(i915_debugfs_list)
> >>
> >>   static const struct i915_debugfs_files {
> >>   	const char *name;
> >>   	const struct file_operations *fops;
> >>   } i915_debugfs_files[] = {
> >>   	{"i915_wedged", &i915_wedged_fops},
> >>   	{"i915_max_freq", &i915_max_freq_fops},
> >>   	{"i915_min_freq", &i915_min_freq_fops},
> >> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> >> index 3f88f29..9597f95 100644
> >> --- a/drivers/gpu/drm/i915/intel_pm.c
> >> +++ b/drivers/gpu/drm/i915/intel_pm.c
> >> @@ -5393,101 +5393,42 @@ static void gen8_init_clock_gating(struct drm_device *dev)
> >>   	struct drm_i915_private *dev_priv = dev->dev_private;
> >>   	enum pipe pipe;
> >>
> >>   	I915_WRITE(WM3_LP_ILK, 0);
> >>   	I915_WRITE(WM2_LP_ILK, 0);
> >>   	I915_WRITE(WM1_LP_ILK, 0);
> >>
> >>   	/* FIXME(BDW): Check all the w/a, some might only apply to
> >>   	 * pre-production hw. */
> >>
> >> -	/* WaDisablePartialInstShootdown:bdw */
> >> -	I915_WRITE(GEN8_ROW_CHICKEN,
> >> -		   _MASKED_BIT_ENABLE(PARTIAL_INSTRUCTION_SHOOTDOWN_DISABLE));
> >> -
> >> -	/* WaDisableThreadStallDopClockGating:bdw */
> >> -	/* FIXME: Unclear whether we really need this on production bdw. */
> >> -	I915_WRITE(GEN8_ROW_CHICKEN,
> >> -		   _MASKED_BIT_ENABLE(STALL_DOP_GATING_DISABLE));
> >> -
> >> -	/*
> >> -	 * This GEN8_CENTROID_PIXEL_OPT_DIS W/A is only needed for
> >> -	 * pre-production hardware
> >> -	 */
> >> -	I915_WRITE(HALF_SLICE_CHICKEN3,
> >> -		   _MASKED_BIT_ENABLE(GEN8_CENTROID_PIXEL_OPT_DIS));
> >> -	I915_WRITE(HALF_SLICE_CHICKEN3,
> >> -		   _MASKED_BIT_ENABLE(GEN8_SAMPLER_POWER_BYPASS_DIS));
> >> -	I915_WRITE(GAMTARBMODE, _MASKED_BIT_ENABLE(ARB_MODE_BWGTLB_DISABLE));
> >> -
> >> -	I915_WRITE(_3D_CHICKEN3,
> >> -		   _MASKED_BIT_ENABLE(_3D_CHICKEN_SDE_LIMIT_FIFO_POLY_DEPTH(2)));
> >> -
> >> -	I915_WRITE(COMMON_SLICE_CHICKEN2,
> >> -		   _MASKED_BIT_ENABLE(GEN8_CSC2_SBE_VUE_CACHE_CONSERVATIVE));
> >> -
> >> -	I915_WRITE(GEN7_HALF_SLICE_CHICKEN1,
> >> -		   _MASKED_BIT_ENABLE(GEN7_SINGLE_SUBSCAN_DISPATCH_ENABLE));
> >> -
> >> -	/* WaDisableDopClockGating:bdw May not be needed for production */
> >> -	I915_WRITE(GEN7_ROW_CHICKEN2,
> >> -		   _MASKED_BIT_ENABLE(DOP_CLOCK_GATING_DISABLE));
> >> -
> >> -	/* WaSwitchSolVfFArbitrationPriority:bdw */
> >> -	I915_WRITE(GAM_ECOCHK, I915_READ(GAM_ECOCHK) | HSW_ECOCHK_ARB_PRIO_SOL);
> >>
> >>   	/* WaPsrDPAMaskVBlankInSRD:bdw */
> >>   	I915_WRITE(CHICKEN_PAR1_1,
> >>   		   I915_READ(CHICKEN_PAR1_1) | DPA_MASK_VBLANK_SRD);
> >>
> >>   	/* WaPsrDPRSUnmaskVBlankInSRD:bdw */
> >>   	for_each_pipe(pipe) {
> >>   		I915_WRITE(CHICKEN_PIPESL_1(pipe),
> >>   			   I915_READ(CHICKEN_PIPESL_1(pipe)) |
> >>   			   BDW_DPRS_MASK_VBLANK_SRD);
> >>   	}
> >>
> >> -	/* Use Force Non-Coherent whenever executing a 3D context. This is a
> >> -	 * workaround for for a possible hang in the unlikely event a TLB
> >> -	 * invalidation occurs during a PSD flush.
> >> -	 */
> >> -	I915_WRITE(HDC_CHICKEN0,
> >> -		   I915_READ(HDC_CHICKEN0) |
> >> -		   _MASKED_BIT_ENABLE(HDC_FORCE_NON_COHERENT));
> >> -
> >> -	/* WaVSRefCountFullforceMissDisable:bdw */
> >> -	/* WaDSRefCountFullforceMissDisable:bdw */
> >> -	I915_WRITE(GEN7_FF_THREAD_MODE,
> >> -		   I915_READ(GEN7_FF_THREAD_MODE) &
> >> -		   ~(GEN8_FF_DS_REF_CNT_FFME | GEN7_FF_VS_REF_CNT_FFME));
> >> -
> >>   	/*
> >>   	 * BSpec recommends 8x4 when MSAA is used,
> >>   	 * however in practice 16x4 seems fastest.
> >>   	 *
> >>   	 * Note that PS/WM thread counts depend on the WIZ hashing
> >>   	 * disable bit, which we don't touch here, but it's good
> >>   	 * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM).
> >>   	 */
> >>   	I915_WRITE(GEN7_GT_MODE,
> >>   		   GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4);
> >
> > GT_MODE doesn't get clobbered by a reset? I would have expected it to
> > behave the same way as CACHE_MODE_{0,1).
> >
> >> -
> >> -	I915_WRITE(GEN6_RC_SLEEP_PSMI_CONTROL,
> >> -		   _MASKED_BIT_ENABLE(GEN8_RC_SEMA_IDLE_MSG_DISABLE));
> >> -
> >> -	/* WaDisableSDEUnitClockGating:bdw */
> >> -	I915_WRITE(GEN8_UCGCTL6, I915_READ(GEN8_UCGCTL6) |
> >> -		   GEN8_SDEUNIT_CLOCK_GATE_DISABLE);
> >
> > And I would have expected the UCGCTL registers to not be clobbered. IIRC
> > they didn't get clobbered on my IVB, but my memory might be wrong here.
> 
> I could complete testing because of the problem with CACHE_MODE_1, will 
> double check these registers and move/unmove them accordingly.

I would try to identify the approriate regiters by simply using
intel_reg_write to first write some non-reset value to the suspect
register(s), then perform GPU reset (w/ intel_reg_write as well)
and afterwards check if the register value(s) changed.

As long as there are no drm clients around that should work out
nicely. It would avoid uncertainty about context saved registers
getting restored from the context at "random" times, and also
avoids the kernel's hangcheck getting in the way.

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] Move BDW workarounds to ring init fn
  2014-07-28 17:26 ` [RFC] Move BDW workarounds to " Ville Syrjälä
@ 2014-07-29 22:27   ` Siluvery, Arun
  2014-07-30  9:36     ` Ville Syrjälä
  0 siblings, 1 reply; 10+ messages in thread
From: Siluvery, Arun @ 2014-07-29 22:27 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: intel-gfx

On 28/07/2014 18:26, Ville Syrjälä wrote:
> On Mon, Jul 28, 2014 at 05:31:45PM +0100, arun.siluvery@linux.intel.com wrote:
>> From: Arun Siluvery <arun.siluvery@linux.intel.com>
>>
>> This patch moves BDW workarounds from init_clock_gating() to render ring
>> init fn otherwise they are lost when gpu is reset.
>> In case of execlists, some of the workarounds modify registers that are
>> part of register state context which doesn't get initialized until
>> init_clock_gating(); this results in default context with incorrect values
>> as it is restored and saved before updated by workarounds.
>
> I don't think it has to do with execlists. Many of the registers are
> part of the context image even in ring buffer mode AFAIK.
>
>>
>> Open issue:
>> For Wa4x4STCOptimizationDisable, we set CACHE_MODE_1[6:6] = 1
>> At the time when HW contexts are enabled after rings are initialized with
>> default context this workaround is valid but followed by a context switch
>> this is getting reset, please see below log snippet.
>
> This is a bit weird. The default context should have restore inhibit==1
> so it shouldn't clobber the CACHE_MODE_1 register. There was a specific magic
> dance you're supposed to do when accessing such registers with mmio, but here
> we do the write even before the first context switch.
>
> Apparently there was some kind of problem with CACHE_MODE_0 on snb too:
>   commit 3a69ddd6f872180b6f61fda87152b37202118fbc
>   Author: Kenneth Graunke <kenneth@whitecape.org>
>   Date:   Fri Apr 27 12:44:41 2012 -0700
>
>      drm/i915: Set the Stencil Cache eviction policy to non-LRA mode.
>
> but IIRC I wasn't able to reproduce it when I tried.

Similar to this register I am also applying this in render ring init fn.

>
> Maybe we need to delay these register writes until we've switched to the default
> context?
>
In its current state (WAs applied in init_clock_gating()) we are writing 
these registers after switching to default context.

When a new hw context is created does all the registers part of context 
start with default values or they sample the current state? and at what 
point this sampling takes place?

As a test I have updated CACHE_MODE_1 after mi_set_context() then the 
workaround was valid with every context switch but I think it may not be 
the right way otherwise we will have to update other WA registers also 
at this point with every context switch.

regards
Arun

>>
>> ...
>> [    5.978209] [drm:i915_pages_create_for_stolen] offset=0x0, size=8294400
>> [    5.978213] [drm:intel_alloc_plane_obj] plane fb obj ffff8801472e0000
>> [    5.978215] [drm:i915_gem_setup_global_gtt] reserving preallocated space: 0 + 7e9000
>> [    5.978216] [drm:i915_gem_setup_global_gtt] clearing unused GTT space: [7e9000, fffff000]
>> [    5.979613] [drm:i915_gem_init] CACHE_MODE_1: 0x00000180
>> [    5.981372] [drm:gen8_ppgtt_init] Allocated 4 pages for page directories (0 wasted)
>> [    5.981373] [drm:gen8_ppgtt_init] Allocated 2048 pages for page tables (0 wasted)
>> [    5.981376] [drm:i915_gem_context_init] HW context support initialized
>> [    5.981462] [drm:i915_gem_init_hw] CACHE_MODE_1: 0x00000180
>> [    5.981467] [drm:i915_gem_init_rings] CACHE_MODE_1: 0x00000180
>> [    5.981704] [drm:bdw_init_workarounds] CACHE_MODE_1: 0x000001C0
>> [    5.981716] [drm:init_status_page] bsd ring hws offset: 0x0081e000
>> [    5.981792] [drm:init_status_page] blitter ring hws offset: 0x0083f000
>> [    5.981910] [drm:init_status_page] video enhancement ring hws offset: 0x00860000
>> [    5.982001] [drm:i915_gem_init_hw] CACHE_MODE_1: 0x000001C0
>> [    5.982104] [drm:i915_gem_context_enable] Switch render ring to default_context
>> [    5.982106] [drm:i915_gem_render_state_init] render ring: Render state init
>> [    5.982120] [drm:do_switch] render ring, CACHE_MODE_1: 0x000001C0, uninitialized: 1
>> [    5.982121] [drm:i915_gem_context_enable] Switch bsd ring to default_context
>> [    5.982122] [drm:do_switch] bsd ring, CACHE_MODE_1: 0x000001C0, uninitialized: 0
>> [    5.982123] [drm:i915_gem_context_enable] Switch blitter ring to default_context
>> [    5.982126] [drm:do_switch] blitter ring, CACHE_MODE_1: 0x000001C0, uninitialized: 0
>> [    5.982126] [drm:i915_gem_context_enable] Switch video enhancement ring to default_context
>> [    5.982128] [drm:do_switch] video enhancement ring, CACHE_MODE_1: 0x000001C0, uninitialized: 0
>> [    5.982133] [drm:i915_gem_init] CACHE_MODE_1: 0x000001C0
>> [    5.982258] [drm:intel_init_clock_gating]
>> ...
>> [   10.037019] [drm:do_switch] blitter ring, CACHE_MODE_1: 0x00000180, uninitialized: 0
>> ...
>> [   10.488145] [drm:do_switch] render ring, CACHE_MODE_1: 0x00000180, uninitialized: 0
>> ...
>>
>> I am currently testing this with an igt which triggers a gpu reset and compares
>> WA register contents before and after reset but the test fails because of
>> this register hence not sending it now.
>> Please let me know how to keep this WA valid after a context switch.
>>
>>
>> Arun Siluvery (1):
>>    drm/i915/bdw: Initialize BDW workarounds in render ring init fn
>>
>>   drivers/gpu/drm/i915/i915_debugfs.c     | 46 ++++++++++++++++++++++
>>   drivers/gpu/drm/i915/intel_pm.c         | 59 ----------------------------
>>   drivers/gpu/drm/i915/intel_ringbuffer.c | 68 +++++++++++++++++++++++++++++++++
>>   3 files changed, 114 insertions(+), 59 deletions(-)
>>
>> --
>> 1.9.2
>>
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] Move BDW workarounds to ring init fn
  2014-07-29 22:27   ` Siluvery, Arun
@ 2014-07-30  9:36     ` Ville Syrjälä
  0 siblings, 0 replies; 10+ messages in thread
From: Ville Syrjälä @ 2014-07-30  9:36 UTC (permalink / raw)
  To: Siluvery, Arun; +Cc: intel-gfx

On Tue, Jul 29, 2014 at 11:27:55PM +0100, Siluvery, Arun wrote:
> On 28/07/2014 18:26, Ville Syrjälä wrote:
> > On Mon, Jul 28, 2014 at 05:31:45PM +0100, arun.siluvery@linux.intel.com wrote:
> >> From: Arun Siluvery <arun.siluvery@linux.intel.com>
> >>
> >> This patch moves BDW workarounds from init_clock_gating() to render ring
> >> init fn otherwise they are lost when gpu is reset.
> >> In case of execlists, some of the workarounds modify registers that are
> >> part of register state context which doesn't get initialized until
> >> init_clock_gating(); this results in default context with incorrect values
> >> as it is restored and saved before updated by workarounds.
> >
> > I don't think it has to do with execlists. Many of the registers are
> > part of the context image even in ring buffer mode AFAIK.
> >
> >>
> >> Open issue:
> >> For Wa4x4STCOptimizationDisable, we set CACHE_MODE_1[6:6] = 1
> >> At the time when HW contexts are enabled after rings are initialized with
> >> default context this workaround is valid but followed by a context switch
> >> this is getting reset, please see below log snippet.
> >
> > This is a bit weird. The default context should have restore inhibit==1
> > so it shouldn't clobber the CACHE_MODE_1 register. There was a specific magic
> > dance you're supposed to do when accessing such registers with mmio, but here
> > we do the write even before the first context switch.
> >
> > Apparently there was some kind of problem with CACHE_MODE_0 on snb too:
> >   commit 3a69ddd6f872180b6f61fda87152b37202118fbc
> >   Author: Kenneth Graunke <kenneth@whitecape.org>
> >   Date:   Fri Apr 27 12:44:41 2012 -0700
> >
> >      drm/i915: Set the Stencil Cache eviction policy to non-LRA mode.
> >
> > but IIRC I wasn't able to reproduce it when I tried.
> 
> Similar to this register I am also applying this in render ring init fn.
> 
> >
> > Maybe we need to delay these register writes until we've switched to the default
> > context?
> >
> In its current state (WAs applied in init_clock_gating()) we are writing 
> these registers after switching to default context.
> 
> When a new hw context is created does all the registers part of context 
> start with default values or they sample the current state? and at what 
> point this sampling takes place?

We load each uninitialized context with restore inhibit=true so AFAIK
the current register values should stay intact.

> 
> As a test I have updated CACHE_MODE_1 after mi_set_context() then the 
> workaround was valid with every context switch but I think it may not be 
> the right way otherwise we will have to update other WA registers also 
> at this point with every context switch.

Maybe there's something special about the very first context switch?
Though I don't see why that would be the case.

> 
> regards
> Arun
> 
> >>
> >> ...
> >> [    5.978209] [drm:i915_pages_create_for_stolen] offset=0x0, size=8294400
> >> [    5.978213] [drm:intel_alloc_plane_obj] plane fb obj ffff8801472e0000
> >> [    5.978215] [drm:i915_gem_setup_global_gtt] reserving preallocated space: 0 + 7e9000
> >> [    5.978216] [drm:i915_gem_setup_global_gtt] clearing unused GTT space: [7e9000, fffff000]
> >> [    5.979613] [drm:i915_gem_init] CACHE_MODE_1: 0x00000180
> >> [    5.981372] [drm:gen8_ppgtt_init] Allocated 4 pages for page directories (0 wasted)
> >> [    5.981373] [drm:gen8_ppgtt_init] Allocated 2048 pages for page tables (0 wasted)
> >> [    5.981376] [drm:i915_gem_context_init] HW context support initialized
> >> [    5.981462] [drm:i915_gem_init_hw] CACHE_MODE_1: 0x00000180
> >> [    5.981467] [drm:i915_gem_init_rings] CACHE_MODE_1: 0x00000180
> >> [    5.981704] [drm:bdw_init_workarounds] CACHE_MODE_1: 0x000001C0
> >> [    5.981716] [drm:init_status_page] bsd ring hws offset: 0x0081e000
> >> [    5.981792] [drm:init_status_page] blitter ring hws offset: 0x0083f000
> >> [    5.981910] [drm:init_status_page] video enhancement ring hws offset: 0x00860000
> >> [    5.982001] [drm:i915_gem_init_hw] CACHE_MODE_1: 0x000001C0
> >> [    5.982104] [drm:i915_gem_context_enable] Switch render ring to default_context
> >> [    5.982106] [drm:i915_gem_render_state_init] render ring: Render state init
> >> [    5.982120] [drm:do_switch] render ring, CACHE_MODE_1: 0x000001C0, uninitialized: 1
> >> [    5.982121] [drm:i915_gem_context_enable] Switch bsd ring to default_context
> >> [    5.982122] [drm:do_switch] bsd ring, CACHE_MODE_1: 0x000001C0, uninitialized: 0
> >> [    5.982123] [drm:i915_gem_context_enable] Switch blitter ring to default_context
> >> [    5.982126] [drm:do_switch] blitter ring, CACHE_MODE_1: 0x000001C0, uninitialized: 0
> >> [    5.982126] [drm:i915_gem_context_enable] Switch video enhancement ring to default_context
> >> [    5.982128] [drm:do_switch] video enhancement ring, CACHE_MODE_1: 0x000001C0, uninitialized: 0
> >> [    5.982133] [drm:i915_gem_init] CACHE_MODE_1: 0x000001C0
> >> [    5.982258] [drm:intel_init_clock_gating]
> >> ...
> >> [   10.037019] [drm:do_switch] blitter ring, CACHE_MODE_1: 0x00000180, uninitialized: 0
> >> ...
> >> [   10.488145] [drm:do_switch] render ring, CACHE_MODE_1: 0x00000180, uninitialized: 0
> >> ...
> >>
> >> I am currently testing this with an igt which triggers a gpu reset and compares
> >> WA register contents before and after reset but the test fails because of
> >> this register hence not sending it now.
> >> Please let me know how to keep this WA valid after a context switch.
> >>
> >>
> >> Arun Siluvery (1):
> >>    drm/i915/bdw: Initialize BDW workarounds in render ring init fn
> >>
> >>   drivers/gpu/drm/i915/i915_debugfs.c     | 46 ++++++++++++++++++++++
> >>   drivers/gpu/drm/i915/intel_pm.c         | 59 ----------------------------
> >>   drivers/gpu/drm/i915/intel_ringbuffer.c | 68 +++++++++++++++++++++++++++++++++
> >>   3 files changed, 114 insertions(+), 59 deletions(-)
> >>
> >> --
> >> 1.9.2
> >>
> >> _______________________________________________
> >> Intel-gfx mailing list
> >> Intel-gfx@lists.freedesktop.org
> >> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> >

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2014-07-30  9:36 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-28 16:31 [RFC] Move BDW workarounds to ring init fn arun.siluvery
2014-07-28 16:31 ` [RFC] drm/i915/bdw: Initialize BDW workarounds in render " arun.siluvery
2014-07-28 17:00   ` Ville Syrjälä
2014-07-28 19:22     ` Daniel Vetter
2014-07-28 20:41       ` Siluvery, Arun
2014-07-28 20:46     ` Siluvery, Arun
2014-07-29  9:56       ` Ville Syrjälä
2014-07-28 17:26 ` [RFC] Move BDW workarounds to " Ville Syrjälä
2014-07-29 22:27   ` Siluvery, Arun
2014-07-30  9:36     ` Ville Syrjälä

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.