All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] Rendering Specific HW Workarounds for VLV
@ 2014-03-24  6:49 sourab.gupta
  2014-03-24  6:49 ` [PATCH 1/6] drm/i915/vlv: Added a rendering specific Hw WA 'WaTlbInvalidateStoreDataBefore' sourab.gupta
                   ` (6 more replies)
  0 siblings, 7 replies; 28+ messages in thread
From: sourab.gupta @ 2014-03-24  6:49 UTC (permalink / raw)
  To: intel-gfx; +Cc: Sourab Gupta

From: Sourab Gupta <sourab.gupta@intel.com>

This patch series adds rendering specific HW workarounds for VLV platform.
These patches leads to stable behavior on VLV, especially
when playing 3D Apps, benchmarks.

Though, the patch set was submitter earlier, this new patch set is initiating
a clean thread. We have addressed the comments on earlier patches and given
the latest version of these patches in a single thread.

Akash Goel (6):
  drm/i915/vlv: Added a rendering specific Hw WA
    'WaTlbInvalidateStoreDataBefore'
  drm/i915/vlv: Added a rendering specific Hw WA
    'WaSendDummy3dPrimitveAfterSetContext'
  drm/i915: Enabling the TLB invalidate bit in GFX Mode register
  drm/i915/vlv: Remove the enabling of VS_TIMER_DISPATCH bit in MI MODE
    reg
  drm/i915/vlv:Implement WaDisable_RenderCache_OperationalFlush
  drm/i915/vlv: Modified Implementation of WaDisableL3Bank2xClockGate

 drivers/gpu/drm/i915/i915_gem_context.c | 65 ++++++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_reg.h         |  6 +++
 drivers/gpu/drm/i915/intel_pm.c         | 13 ++++++-
 drivers/gpu/drm/i915/intel_ringbuffer.c | 38 ++++++++++++++++++-
 drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
 5 files changed, 117 insertions(+), 6 deletions(-)

-- 
1.8.5.1

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 1/6] drm/i915/vlv: Added a rendering specific Hw WA 'WaTlbInvalidateStoreDataBefore'
  2014-03-24  6:49 [PATCH 0/6] Rendering Specific HW Workarounds for VLV sourab.gupta
@ 2014-03-24  6:49 ` sourab.gupta
  2014-03-24  9:32   ` Chris Wilson
  2014-03-24  6:49 ` [PATCH 2/6] drm/i915/vlv: Added a rendering specific Hw WA 'WaSendDummy3dPrimitveAfterSetContext' sourab.gupta
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 28+ messages in thread
From: sourab.gupta @ 2014-03-24  6:49 UTC (permalink / raw)
  To: intel-gfx; +Cc: Akash Goel, Sourab Gupta

From: Akash Goel <akash.goel@intel.com>

Added a new rendering specific Workaround 'WaTlbInvalidateStoreDataBefore'.
In this WA, before pipecontrol with TLB invalidate set, need to add 2 MI
Store data commands.

Signed-off-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 87d1a2d..2812384 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2207,6 +2207,28 @@ intel_ring_invalidate_all_caches(struct intel_ring_buffer *ring)
 	uint32_t flush_domains;
 	int ret;
 
+	if (IS_VALLEYVIEW(ring->dev)) {
+		/*
+		 * WaTlbInvalidateStoreDataBefore:vlv
+		 * Before pipecontrol with TLB invalidate set, need 2 store
+		 * data commands (such as MI_STORE_DATA_IMM or MI_STORE_DATA_INDEX)
+		 * Without this, hardware cannot guarantee the command after the
+		 * PIPE_CONTROL with TLB inv will not use the old TLB values.
+		 */
+		int i;
+		ret = intel_ring_begin(ring, 4 * 2);
+		if (ret)
+			return ret;
+		for (i = 0; i < 2; i++) {
+			intel_ring_emit(ring, MI_STORE_DWORD_INDEX);
+			intel_ring_emit(ring, I915_GEM_HWS_SCRATCH_INDEX <<
+						MI_STORE_DWORD_INDEX_SHIFT);
+			intel_ring_emit(ring, 0);
+			intel_ring_emit(ring, MI_NOOP);
+		}
+		intel_ring_advance(ring);
+	}
+
 	flush_domains = 0;
 	if (ring->gpu_caches_dirty)
 		flush_domains = I915_GEM_GPU_DOMAINS;
-- 
1.8.5.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 2/6] drm/i915/vlv: Added a rendering specific Hw WA 'WaSendDummy3dPrimitveAfterSetContext'
  2014-03-24  6:49 [PATCH 0/6] Rendering Specific HW Workarounds for VLV sourab.gupta
  2014-03-24  6:49 ` [PATCH 1/6] drm/i915/vlv: Added a rendering specific Hw WA 'WaTlbInvalidateStoreDataBefore' sourab.gupta
@ 2014-03-24  6:49 ` sourab.gupta
  2014-03-24  9:31   ` Daniel Vetter
  2014-03-24  9:39   ` Chris Wilson
  2014-03-24  6:49 ` [PATCH 3/6] drm/i915: Enabling the TLB invalidate bit in GFX Mode register sourab.gupta
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 28+ messages in thread
From: sourab.gupta @ 2014-03-24  6:49 UTC (permalink / raw)
  To: intel-gfx; +Cc: Akash Goel, Sourab Gupta

From: Akash Goel <akash.goel@intel.com>

This workaround is needed on VLV for the HW context feature.
It is used after adding the mi_set_context command in ring buffer
for Hw context switch. As per the spec
"The software must send a pipe_control with a CS stall and a post sync
operation and then a dummy DRAW after every MI_SET_CONTEXT and after any
PIPELINE_SELECT that is enabling 3D mode".

Signed-off-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 65 ++++++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_reg.h         |  3 ++
 drivers/gpu/drm/i915/intel_ringbuffer.c |  9 +++++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
 4 files changed, 76 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 6043062..544fc2d 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -584,6 +584,58 @@ i915_gem_context_get(struct drm_i915_file_private *file_priv, u32 id)
 	return ctx;
 }
 
+static inline void
+mi_set_context_dummy3d_prim_wa(struct intel_ring_buffer *ring)
+{
+	u32 scratch_addr;
+	u32 flags = 0;
+
+	/*
+	 * Check if we have the scratch page allocated needed
+	 * for the Pipe Control command, otherwise don't apply
+	 * the dummmy 3d primitive workaround & add NOOPs instead
+	 */
+	if (get_pipe_control_scratch_addr(ring)) {
+		/* Actual scratch location is at 128 bytes offset */
+		scratch_addr = get_pipe_control_scratch_addr(ring) + 128;
+
+		/*
+		 * WaSendDummy3dPrimitveAfterSetContext:vlv
+		 * Software must send a pipe_control with a CS stall
+		 * and a post sync operation and then a dummy DRAW after
+		 * every MI_SET_CONTEXT and after any PIPELINE_SELECT that
+		 * is enabling 3D mode. A dummy draw is a 3DPRIMITIVE command
+		 * with Indirect Parameter Enable set to 0, UAV Coherency
+		 * Required set to 0, Predicate Enable set to 0,
+		 * End Offset Enable set to 0, and Vertex Count Per Instance
+		 * set to 0, All other parameters are a don't care.
+		 */
+
+		/*
+		 * Add a pipe control with CS Stall and postsync op
+		 * before dummy 3D_PRIMITIVE
+		 */
+		flags |= PIPE_CONTROL_QW_WRITE | PIPE_CONTROL_CS_STALL;
+		intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(4));
+		intel_ring_emit(ring, flags);
+		intel_ring_emit(ring, scratch_addr | PIPE_CONTROL_GLOBAL_GTT);
+		intel_ring_emit(ring, 0);
+
+		/* Add a dummy 3D_PRIMITVE */
+		intel_ring_emit(ring, GFX_OP_3DPRIMITIVE());
+		intel_ring_emit(ring, 4); /* PrimTopoType*/
+		intel_ring_emit(ring, 0); /* VertexCountPerInstance */
+		intel_ring_emit(ring, 0); /* StartVertexLocation */
+		intel_ring_emit(ring, 0); /* InstanceCount */
+		intel_ring_emit(ring, 0); /* StartInstanceLocation */
+		intel_ring_emit(ring, 0); /* BaseVertexLocation  */
+	} else {
+		int i;
+		for (i = 0; i < 11; i++)
+			intel_ring_emit(ring, MI_NOOP);
+	}
+}
+
 static inline int
 mi_set_context(struct intel_ring_buffer *ring,
 	       struct i915_hw_context *new_context,
@@ -602,7 +654,10 @@ mi_set_context(struct intel_ring_buffer *ring,
 			return ret;
 	}
 
-	ret = intel_ring_begin(ring, 6);
+	if (IS_VALLEYVIEW(ring->dev))
+		ret = intel_ring_begin(ring, 6+4+8);
+	else
+		ret = intel_ring_begin(ring, 6);
 	if (ret)
 		return ret;
 
@@ -626,7 +681,13 @@ mi_set_context(struct intel_ring_buffer *ring,
 	intel_ring_emit(ring, MI_NOOP);
 
 	if (IS_GEN7(ring->dev))
-		intel_ring_emit(ring, MI_ARB_ON_OFF | MI_ARB_ENABLE);
+		if (IS_VALLEYVIEW(ring->dev)) {
+			/* FIXME, should also apply to ivb */
+			mi_set_context_dummy3d_prim_wa(ring);
+			intel_ring_emit(ring, MI_ARB_ON_OFF | MI_ARB_ENABLE);
+			intel_ring_emit(ring, MI_NOOP);
+		} else
+			intel_ring_emit(ring, MI_ARB_ON_OFF | MI_ARB_ENABLE);
 	else
 		intel_ring_emit(ring, MI_NOOP);
 
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index adcb9c7..fd25e8e4 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -348,6 +348,9 @@
 #define   PIPE_CONTROL_DEPTH_CACHE_FLUSH		(1<<0)
 #define   PIPE_CONTROL_GLOBAL_GTT (1<<2) /* in addr dword */
 
+#define GFX_OP_3DPRIMITIVE()              \
+	((0x3<<29)|(0x3<<27)|(0x3<<24)|       \
+	 (0x0<<16)|(0x0<<10)|(0x0<<8)|(7-2))
 
 /*
  * Reset registers
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 2812384..98ddfec 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -560,6 +560,15 @@ err:
 	return ret;
 }
 
+u32
+get_pipe_control_scratch_addr(struct intel_ring_buffer *ring)
+{
+	if (ring->scratch.obj == NULL)
+		return 0;
+
+	return ring->scratch.gtt_offset;
+}
+
 static int init_render_ring(struct intel_ring_buffer *ring)
 {
 	struct drm_device *dev = ring->dev;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index f11ceb2..640c56d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -294,6 +294,7 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev);
 
 u32 intel_ring_get_active_head(struct intel_ring_buffer *ring);
 void intel_ring_setup_status_page(struct intel_ring_buffer *ring);
+u32 get_pipe_control_scratch_addr(struct intel_ring_buffer *ring);
 
 static inline u32 intel_ring_get_tail(struct intel_ring_buffer *ring)
 {
-- 
1.8.5.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 3/6] drm/i915: Enabling the TLB invalidate bit in GFX Mode register
  2014-03-24  6:49 [PATCH 0/6] Rendering Specific HW Workarounds for VLV sourab.gupta
  2014-03-24  6:49 ` [PATCH 1/6] drm/i915/vlv: Added a rendering specific Hw WA 'WaTlbInvalidateStoreDataBefore' sourab.gupta
  2014-03-24  6:49 ` [PATCH 2/6] drm/i915/vlv: Added a rendering specific Hw WA 'WaSendDummy3dPrimitveAfterSetContext' sourab.gupta
@ 2014-03-24  6:49 ` sourab.gupta
  2014-03-24  9:35   ` Chris Wilson
  2014-03-24  6:49 ` [PATCH 4/6] drm/i915/vlv: Remove the enabling of VS_TIMER_DISPATCH bit in MI MODE reg sourab.gupta
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 28+ messages in thread
From: sourab.gupta @ 2014-03-24  6:49 UTC (permalink / raw)
  To: intel-gfx; +Cc: Akash Goel, Sourab Gupta

From: Akash Goel <akash.goel@intel.com>

This patch Enables the bit for TLB invalidate in GFX Mode register
for Gen7.

According to bspec,  When enabled this bit limits the invalidation
of the TLB only to batch buffer boundaries, to pipe_control
commands which have the TLB invalidation bit set and sync flushes.
If disabled, the TLB caches are flushed for every full flush of
the pipeline.

Signed-off-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 98ddfec..6d4e8f5 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -594,7 +594,7 @@ static int init_render_ring(struct intel_ring_buffer *ring)
 
 	if (IS_GEN7(dev))
 		I915_WRITE(GFX_MODE_GEN7,
-			   _MASKED_BIT_DISABLE(GFX_TLB_INVALIDATE_EXPLICIT) |
+			   _MASKED_BIT_ENABLE(GFX_TLB_INVALIDATE_EXPLICIT) |
 			   _MASKED_BIT_ENABLE(GFX_REPLAY_MODE));
 
 	if (INTEL_INFO(dev)->gen >= 5) {
-- 
1.8.5.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 4/6] drm/i915/vlv: Remove the enabling of VS_TIMER_DISPATCH bit in MI MODE reg
  2014-03-24  6:49 [PATCH 0/6] Rendering Specific HW Workarounds for VLV sourab.gupta
                   ` (2 preceding siblings ...)
  2014-03-24  6:49 ` [PATCH 3/6] drm/i915: Enabling the TLB invalidate bit in GFX Mode register sourab.gupta
@ 2014-03-24  6:49 ` sourab.gupta
  2014-03-24  6:49 ` [PATCH 5/6] drm/i915/vlv:Implement WaDisable_RenderCache_OperationalFlush sourab.gupta
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 28+ messages in thread
From: sourab.gupta @ 2014-03-24  6:49 UTC (permalink / raw)
  To: intel-gfx; +Cc: Akash Goel

From: Akash Goel <akash.goel@intel.com>

Removing the VS_TIMER_DISPATCH bit enable for MI MODE reg for
VLV platform as it is not required.

Signed-off-by: Akash Goel <akash.goel@intel.com>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 6d4e8f5..4b4ca58 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -576,7 +576,10 @@ static int init_render_ring(struct intel_ring_buffer *ring)
 	int ret = init_ring_common(ring);
 
 	if (INTEL_INFO(dev)->gen > 3)
-		I915_WRITE(MI_MODE, _MASKED_BIT_ENABLE(VS_TIMER_DISPATCH));
+		/* FIXME, should also apply to ivb */
+		if (!IS_VALLEYVIEW(dev))
+			I915_WRITE(MI_MODE,
+					_MASKED_BIT_ENABLE(VS_TIMER_DISPATCH));
 
 	/* We need to disable the AsyncFlip performance optimisations in order
 	 * to use MI_WAIT_FOR_EVENT within the CS. It should already be
-- 
1.8.5.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 5/6] drm/i915/vlv:Implement WaDisable_RenderCache_OperationalFlush
  2014-03-24  6:49 [PATCH 0/6] Rendering Specific HW Workarounds for VLV sourab.gupta
                   ` (3 preceding siblings ...)
  2014-03-24  6:49 ` [PATCH 4/6] drm/i915/vlv: Remove the enabling of VS_TIMER_DISPATCH bit in MI MODE reg sourab.gupta
@ 2014-03-24  6:49 ` sourab.gupta
  2014-03-24  9:32   ` Chris Wilson
  2014-03-24  6:49 ` [PATCH 6/6] drm/i915/vlv: Modified Implementation of WaDisableL3Bank2xClockGate sourab.gupta
  2014-03-24  9:35 ` [PATCH 0/6] Rendering Specific HW Workarounds for VLV Daniel Vetter
  6 siblings, 1 reply; 28+ messages in thread
From: sourab.gupta @ 2014-03-24  6:49 UTC (permalink / raw)
  To: intel-gfx; +Cc: Akash Goel, Sourab Gupta

From: Akash Goel <akash.goel@intel.com>

In Valleyview, Operational flush cannot be enabled on
BWG A0 [Errata BWT006]

Signed-off-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
---
 drivers/gpu/drm/i915/i915_reg.h | 3 +++
 drivers/gpu/drm/i915/intel_pm.c | 6 ++++++
 2 files changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index fd25e8e4..803f392 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -974,6 +974,9 @@ enum punit_power_well {
 #define   ECO_GATING_CX_ONLY	(1<<3)
 #define   ECO_FLIP_DONE		(1<<0)
 
+#define GEN7_CACHE_MODE_0	0x07000 /* IVB+ only */
+#define GEN7_RC_OP_FLUSH_ENABLE (1<<0)
+
 #define CACHE_MODE_0_GEN7	0x7000 /* IVB+ */
 #define   HIZ_RAW_STALL_OPT_DISABLE (1<<2)
 #define CACHE_MODE_1		0x7004 /* IVB+ */
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index fd68f93..6c04b79 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -5068,6 +5068,12 @@ static void valleyview_init_clock_gating(struct drm_device *dev)
 		   _MASKED_BIT_ENABLE(GEN7_MAX_PS_THREAD_DEP |
 				      GEN7_PSD_SINGLE_PORT_DISPATCH_ENABLE));
 
+	/* WaDisable_RenderCache_OperationalFlush:vlv
+	 * Clear bit 0, so we do a AND with the mask
+	 * to keep other bits the same */
+	I915_WRITE(GEN7_CACHE_MODE_0,  (I915_READ(GEN7_CACHE_MODE_0) |
+			  _MASKED_BIT_DISABLE(GEN7_RC_OP_FLUSH_ENABLE)));
+
 	/* WaForceL3Serialization:vlv */
 	I915_WRITE(GEN7_L3SQCREG4, I915_READ(GEN7_L3SQCREG4) &
 		   ~L3SQ_URB_READ_CAM_MATCH_DISABLE);
-- 
1.8.5.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 6/6] drm/i915/vlv: Modified Implementation of WaDisableL3Bank2xClockGate
  2014-03-24  6:49 [PATCH 0/6] Rendering Specific HW Workarounds for VLV sourab.gupta
                   ` (4 preceding siblings ...)
  2014-03-24  6:49 ` [PATCH 5/6] drm/i915/vlv:Implement WaDisable_RenderCache_OperationalFlush sourab.gupta
@ 2014-03-24  6:49 ` sourab.gupta
  2014-03-24  9:41   ` Chris Wilson
  2014-03-24  9:35 ` [PATCH 0/6] Rendering Specific HW Workarounds for VLV Daniel Vetter
  6 siblings, 1 reply; 28+ messages in thread
From: sourab.gupta @ 2014-03-24  6:49 UTC (permalink / raw)
  To: intel-gfx; +Cc: Akash Goel, Sourab Gupta

From: Akash Goel <akash.goel@intel.com>

For VLV, disabling L3 clock gating- MMIO 940c[25] = 1

Signed-off-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
---
 drivers/gpu/drm/i915/intel_pm.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 6c04b79..fbfdca7 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -5096,8 +5096,11 @@ static void valleyview_init_clock_gating(struct drm_device *dev)
 	I915_WRITE(GEN6_UCGCTL2,
 		   GEN6_RCZUNIT_CLOCK_GATE_DISABLE);
 
-	/* WaDisableL3Bank2xClockGate:vlv */
-	I915_WRITE(GEN7_UCGCTL4, GEN7_L3BANK2X_CLOCK_GATE_DISABLE);
+	/* WaDisableL3Bank2xClockGate:vlv
+	 * Disabling L3 clock gating- MMIO 940c[25] = 1
+	 * Set bit 25, to disable L3_BANK_2x_CLK_GATING */
+	I915_WRITE(GEN7_UCGCTL4,
+			I915_READ(GEN7_UCGCTL4) | GEN7_L3BANK2X_CLOCK_GATE_DISABLE);
 
 	I915_WRITE(MI_ARB_VLV, MI_ARB_DISPLAY_TRICKLE_FEED_DISABLE);
 
-- 
1.8.5.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/6] drm/i915/vlv: Added a rendering specific Hw WA 'WaSendDummy3dPrimitveAfterSetContext'
  2014-03-24  6:49 ` [PATCH 2/6] drm/i915/vlv: Added a rendering specific Hw WA 'WaSendDummy3dPrimitveAfterSetContext' sourab.gupta
@ 2014-03-24  9:31   ` Daniel Vetter
  2014-03-24 11:11     ` Gupta, Sourab
  2014-03-24  9:39   ` Chris Wilson
  1 sibling, 1 reply; 28+ messages in thread
From: Daniel Vetter @ 2014-03-24  9:31 UTC (permalink / raw)
  To: sourab.gupta; +Cc: Akash Goel, intel-gfx

On Mon, Mar 24, 2014 at 12:19:20PM +0530, sourab.gupta@intel.com wrote:
> From: Akash Goel <akash.goel@intel.com>
> 
> This workaround is needed on VLV for the HW context feature.
> It is used after adding the mi_set_context command in ring buffer
> for Hw context switch. As per the spec
> "The software must send a pipe_control with a CS stall and a post sync
> operation and then a dummy DRAW after every MI_SET_CONTEXT and after any
> PIPELINE_SELECT that is enabling 3D mode".
> 
> Signed-off-by: Akash Goel <akash.goel@intel.com>
> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem_context.c | 65 ++++++++++++++++++++++++++++++++-
>  drivers/gpu/drm/i915/i915_reg.h         |  3 ++
>  drivers/gpu/drm/i915/intel_ringbuffer.c |  9 +++++
>  drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
>  4 files changed, 76 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 6043062..544fc2d 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -584,6 +584,58 @@ i915_gem_context_get(struct drm_i915_file_private *file_priv, u32 id)
>  	return ctx;
>  }
>  
> +static inline void
> +mi_set_context_dummy3d_prim_wa(struct intel_ring_buffer *ring)
> +{
> +	u32 scratch_addr;
> +	u32 flags = 0;
> +
> +	/*
> +	 * Check if we have the scratch page allocated needed
> +	 * for the Pipe Control command, otherwise don't apply
> +	 * the dummmy 3d primitive workaround & add NOOPs instead
> +	 */
> +	if (get_pipe_control_scratch_addr(ring)) {

This check here looks like it papers over a bug (or at least rather
serious init ordering confusion). Can you please rework this to avoid this
condition?

> +		/* Actual scratch location is at 128 bytes offset */
> +		scratch_addr = get_pipe_control_scratch_addr(ring) + 128;
> +
> +		/*
> +		 * WaSendDummy3dPrimitveAfterSetContext:vlv
> +		 * Software must send a pipe_control with a CS stall
> +		 * and a post sync operation and then a dummy DRAW after
> +		 * every MI_SET_CONTEXT and after any PIPELINE_SELECT that
> +		 * is enabling 3D mode. A dummy draw is a 3DPRIMITIVE command
> +		 * with Indirect Parameter Enable set to 0, UAV Coherency
> +		 * Required set to 0, Predicate Enable set to 0,
> +		 * End Offset Enable set to 0, and Vertex Count Per Instance
> +		 * set to 0, All other parameters are a don't care.
> +		 */
> +
> +		/*
> +		 * Add a pipe control with CS Stall and postsync op
> +		 * before dummy 3D_PRIMITIVE
> +		 */
> +		flags |= PIPE_CONTROL_QW_WRITE | PIPE_CONTROL_CS_STALL;
> +		intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(4));
> +		intel_ring_emit(ring, flags);
> +		intel_ring_emit(ring, scratch_addr | PIPE_CONTROL_GLOBAL_GTT);
> +		intel_ring_emit(ring, 0);
> +
> +		/* Add a dummy 3D_PRIMITVE */
> +		intel_ring_emit(ring, GFX_OP_3DPRIMITIVE());
> +		intel_ring_emit(ring, 4); /* PrimTopoType*/
> +		intel_ring_emit(ring, 0); /* VertexCountPerInstance */
> +		intel_ring_emit(ring, 0); /* StartVertexLocation */
> +		intel_ring_emit(ring, 0); /* InstanceCount */
> +		intel_ring_emit(ring, 0); /* StartInstanceLocation */
> +		intel_ring_emit(ring, 0); /* BaseVertexLocation  */

Is this save even in a pristine context when the gpu comes right out of
reset? E.g. on resume or after gpu reset. We've had tons of fun in this
area, see e.g. also Ben's latest bdw patch.
-Daniel

> +	} else {
> +		int i;
> +		for (i = 0; i < 11; i++)
> +			intel_ring_emit(ring, MI_NOOP);
> +	}
> +}
> +
>  static inline int
>  mi_set_context(struct intel_ring_buffer *ring,
>  	       struct i915_hw_context *new_context,
> @@ -602,7 +654,10 @@ mi_set_context(struct intel_ring_buffer *ring,
>  			return ret;
>  	}
>  
> -	ret = intel_ring_begin(ring, 6);
> +	if (IS_VALLEYVIEW(ring->dev))
> +		ret = intel_ring_begin(ring, 6+4+8);
> +	else
> +		ret = intel_ring_begin(ring, 6);
>  	if (ret)
>  		return ret;
>  
> @@ -626,7 +681,13 @@ mi_set_context(struct intel_ring_buffer *ring,
>  	intel_ring_emit(ring, MI_NOOP);
>  
>  	if (IS_GEN7(ring->dev))
> -		intel_ring_emit(ring, MI_ARB_ON_OFF | MI_ARB_ENABLE);
> +		if (IS_VALLEYVIEW(ring->dev)) {
> +			/* FIXME, should also apply to ivb */
> +			mi_set_context_dummy3d_prim_wa(ring);
> +			intel_ring_emit(ring, MI_ARB_ON_OFF | MI_ARB_ENABLE);
> +			intel_ring_emit(ring, MI_NOOP);
> +		} else
> +			intel_ring_emit(ring, MI_ARB_ON_OFF | MI_ARB_ENABLE);
>  	else
>  		intel_ring_emit(ring, MI_NOOP);
>  
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index adcb9c7..fd25e8e4 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -348,6 +348,9 @@
>  #define   PIPE_CONTROL_DEPTH_CACHE_FLUSH		(1<<0)
>  #define   PIPE_CONTROL_GLOBAL_GTT (1<<2) /* in addr dword */
>  
> +#define GFX_OP_3DPRIMITIVE()              \
> +	((0x3<<29)|(0x3<<27)|(0x3<<24)|       \
> +	 (0x0<<16)|(0x0<<10)|(0x0<<8)|(7-2))
>  
>  /*
>   * Reset registers
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 2812384..98ddfec 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -560,6 +560,15 @@ err:
>  	return ret;
>  }
>  
> +u32
> +get_pipe_control_scratch_addr(struct intel_ring_buffer *ring)
> +{
> +	if (ring->scratch.obj == NULL)
> +		return 0;
> +
> +	return ring->scratch.gtt_offset;
> +}
> +
>  static int init_render_ring(struct intel_ring_buffer *ring)
>  {
>  	struct drm_device *dev = ring->dev;
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index f11ceb2..640c56d 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -294,6 +294,7 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev);
>  
>  u32 intel_ring_get_active_head(struct intel_ring_buffer *ring);
>  void intel_ring_setup_status_page(struct intel_ring_buffer *ring);
> +u32 get_pipe_control_scratch_addr(struct intel_ring_buffer *ring);
>  
>  static inline u32 intel_ring_get_tail(struct intel_ring_buffer *ring)
>  {
> -- 
> 1.8.5.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/6] drm/i915/vlv: Added a rendering specific Hw WA 'WaTlbInvalidateStoreDataBefore'
  2014-03-24  6:49 ` [PATCH 1/6] drm/i915/vlv: Added a rendering specific Hw WA 'WaTlbInvalidateStoreDataBefore' sourab.gupta
@ 2014-03-24  9:32   ` Chris Wilson
  2014-03-24 11:20     ` Gupta, Sourab
  0 siblings, 1 reply; 28+ messages in thread
From: Chris Wilson @ 2014-03-24  9:32 UTC (permalink / raw)
  To: sourab.gupta; +Cc: Akash Goel, intel-gfx

On Mon, Mar 24, 2014 at 12:19:19PM +0530, sourab.gupta@intel.com wrote:
> From: Akash Goel <akash.goel@intel.com>
> 
> Added a new rendering specific Workaround 'WaTlbInvalidateStoreDataBefore'.
> In this WA, before pipecontrol with TLB invalidate set, need to add 2 MI
> Store data commands.
> 
> Signed-off-by: Akash Goel <akash.goel@intel.com>
> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 87d1a2d..2812384 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -2207,6 +2207,28 @@ intel_ring_invalidate_all_caches(struct intel_ring_buffer *ring)
>  	uint32_t flush_domains;
>  	int ret;
>  
> +	if (IS_VALLEYVIEW(ring->dev)) {
The ring flushes are vfuncs, so why is this here and not in a special
vlv ring flush?

> +		/*
> +		 * WaTlbInvalidateStoreDataBefore:vlv
> +		 * Before pipecontrol with TLB invalidate set, need 2 store
> +		 * data commands (such as MI_STORE_DATA_IMM or MI_STORE_DATA_INDEX)
> +		 * Without this, hardware cannot guarantee the command after the
> +		 * PIPE_CONTROL with TLB inv will not use the old TLB values.

Crumbs, it sounds like our i-g-t are not sensitive enough. This bug
crops up in many disguises over the years, do you have any suggestion on
how we can improve our tests?

> +		 */
> +		int i;
> +		ret = intel_ring_begin(ring, 4 * 2);

This can be we written to use 6 dwords.

> +		if (ret)
> +			return ret;
> +		for (i = 0; i < 2; i++) {
> +			intel_ring_emit(ring, MI_STORE_DWORD_INDEX);
> +			intel_ring_emit(ring, I915_GEM_HWS_SCRATCH_INDEX <<
> +						MI_STORE_DWORD_INDEX_SHIFT);

This is I915_GEM_HWS_SCRATCH_ADDR

> +			intel_ring_emit(ring, 0);
> +			intel_ring_emit(ring, MI_NOOP);
> +		}
> +		intel_ring_advance(ring);
> +	}
> +
>  	flush_domains = 0;
>  	if (ring->gpu_caches_dirty)
>  		flush_domains = I915_GEM_GPU_DOMAINS;

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/6] drm/i915/vlv:Implement WaDisable_RenderCache_OperationalFlush
  2014-03-24  6:49 ` [PATCH 5/6] drm/i915/vlv:Implement WaDisable_RenderCache_OperationalFlush sourab.gupta
@ 2014-03-24  9:32   ` Chris Wilson
  0 siblings, 0 replies; 28+ messages in thread
From: Chris Wilson @ 2014-03-24  9:32 UTC (permalink / raw)
  To: sourab.gupta; +Cc: Akash Goel, intel-gfx

On Mon, Mar 24, 2014 at 12:19:23PM +0530, sourab.gupta@intel.com wrote:
> From: Akash Goel <akash.goel@intel.com>
> 
> In Valleyview, Operational flush cannot be enabled on
> BWG A0 [Errata BWT006]
> 
> Signed-off-by: Akash Goel <akash.goel@intel.com>
> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_reg.h | 3 +++
>  drivers/gpu/drm/i915/intel_pm.c | 6 ++++++
>  2 files changed, 9 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index fd25e8e4..803f392 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -974,6 +974,9 @@ enum punit_power_well {
>  #define   ECO_GATING_CX_ONLY	(1<<3)
>  #define   ECO_FLIP_DONE		(1<<0)
>  
> +#define GEN7_CACHE_MODE_0	0x07000 /* IVB+ only */
> +#define GEN7_RC_OP_FLUSH_ENABLE (1<<0)
> +
>  #define CACHE_MODE_0_GEN7	0x7000 /* IVB+ */
>  #define   HIZ_RAW_STALL_OPT_DISABLE (1<<2)
>  #define CACHE_MODE_1		0x7004 /* IVB+ */
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index fd68f93..6c04b79 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -5068,6 +5068,12 @@ static void valleyview_init_clock_gating(struct drm_device *dev)
>  		   _MASKED_BIT_ENABLE(GEN7_MAX_PS_THREAD_DEP |
>  				      GEN7_PSD_SINGLE_PORT_DISPATCH_ENABLE));
>  
> +	/* WaDisable_RenderCache_OperationalFlush:vlv
> +	 * Clear bit 0, so we do a AND with the mask
> +	 * to keep other bits the same */
> +	I915_WRITE(GEN7_CACHE_MODE_0,  (I915_READ(GEN7_CACHE_MODE_0) |
> +			  _MASKED_BIT_DISABLE(GEN7_RC_OP_FLUSH_ENABLE)));

It's a masked register, so why the read...
Bonus points for a nonsense comment.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/6] Rendering Specific HW Workarounds for VLV
  2014-03-24  6:49 [PATCH 0/6] Rendering Specific HW Workarounds for VLV sourab.gupta
                   ` (5 preceding siblings ...)
  2014-03-24  6:49 ` [PATCH 6/6] drm/i915/vlv: Modified Implementation of WaDisableL3Bank2xClockGate sourab.gupta
@ 2014-03-24  9:35 ` Daniel Vetter
  2014-03-24 11:05   ` Gupta, Sourab
  6 siblings, 1 reply; 28+ messages in thread
From: Daniel Vetter @ 2014-03-24  9:35 UTC (permalink / raw)
  To: sourab.gupta; +Cc: intel-gfx

On Mon, Mar 24, 2014 at 12:19:18PM +0530, sourab.gupta@intel.com wrote:
> From: Sourab Gupta <sourab.gupta@intel.com>
> 
> This patch series adds rendering specific HW workarounds for VLV platform.
> These patches leads to stable behavior on VLV, especially
> when playing 3D Apps, benchmarks.
> 
> Though, the patch set was submitter earlier, this new patch set is initiating
> a clean thread. We have addressed the comments on earlier patches and given
> the latest version of these patches in a single thread.
> 
> Akash Goel (6):
>   drm/i915/vlv: Added a rendering specific Hw WA
>     'WaTlbInvalidateStoreDataBefore'
>   drm/i915/vlv: Added a rendering specific Hw WA
>     'WaSendDummy3dPrimitveAfterSetContext'
>   drm/i915: Enabling the TLB invalidate bit in GFX Mode register
>   drm/i915/vlv: Remove the enabling of VS_TIMER_DISPATCH bit in MI MODE
>     reg
>   drm/i915/vlv:Implement WaDisable_RenderCache_OperationalFlush
>   drm/i915/vlv: Modified Implementation of WaDisableL3Bank2xClockGate

Thanks for resending. But your patches are missing the in-patch changelogs
completely, which makes reviewing them hard - I have no idea what was
discussed already and why we've reached certain conclusions. Can you
please dig out the changelogs from older resends (I remember v2/v3/v4 of
some of your patches) and add this again?

Also you missed e.g. Chris' tested-by on the tlb invalidation
optimization. And the commit message fails to state that you've only
tested this on vlv, but Chris has done the required testing on ivb/hsw.

Such details are important in case the patch breaks something in the
future.

One comment on one patch before I've noticed this, otherwise I think it's
better to wait for the resend.

Thanks.
> 
>  drivers/gpu/drm/i915/i915_gem_context.c | 65 ++++++++++++++++++++++++++++++++-
>  drivers/gpu/drm/i915/i915_reg.h         |  6 +++
>  drivers/gpu/drm/i915/intel_pm.c         | 13 ++++++-
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 38 ++++++++++++++++++-
>  drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
>  5 files changed, 117 insertions(+), 6 deletions(-)
> 
> -- 
> 1.8.5.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/6] drm/i915: Enabling the TLB invalidate bit in GFX Mode register
  2014-03-24  6:49 ` [PATCH 3/6] drm/i915: Enabling the TLB invalidate bit in GFX Mode register sourab.gupta
@ 2014-03-24  9:35   ` Chris Wilson
  0 siblings, 0 replies; 28+ messages in thread
From: Chris Wilson @ 2014-03-24  9:35 UTC (permalink / raw)
  To: sourab.gupta; +Cc: Akash Goel, intel-gfx

On Mon, Mar 24, 2014 at 12:19:21PM +0530, sourab.gupta@intel.com wrote:
> From: Akash Goel <akash.goel@intel.com>
> 
> This patch Enables the bit for TLB invalidate in GFX Mode register
> for Gen7.
> 
> According to bspec,  When enabled this bit limits the invalidation
> of the TLB only to batch buffer boundaries, to pipe_control
> commands which have the TLB invalidation bit set and sync flushes.
> If disabled, the TLB caches are flushed for every full flush of
> the pipeline.
> 
> Signed-off-by: Akash Goel <akash.goel@intel.com>
> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk> # ivb, hsw
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/6] drm/i915/vlv: Added a rendering specific Hw WA 'WaSendDummy3dPrimitveAfterSetContext'
  2014-03-24  6:49 ` [PATCH 2/6] drm/i915/vlv: Added a rendering specific Hw WA 'WaSendDummy3dPrimitveAfterSetContext' sourab.gupta
  2014-03-24  9:31   ` Daniel Vetter
@ 2014-03-24  9:39   ` Chris Wilson
  1 sibling, 0 replies; 28+ messages in thread
From: Chris Wilson @ 2014-03-24  9:39 UTC (permalink / raw)
  To: sourab.gupta; +Cc: Akash Goel, intel-gfx

On Mon, Mar 24, 2014 at 12:19:20PM +0530, sourab.gupta@intel.com wrote:
> From: Akash Goel <akash.goel@intel.com>
> 
> This workaround is needed on VLV for the HW context feature.
> It is used after adding the mi_set_context command in ring buffer
> for Hw context switch. As per the spec
> "The software must send a pipe_control with a CS stall and a post sync
> operation and then a dummy DRAW after every MI_SET_CONTEXT and after any
> PIPELINE_SELECT that is enabling 3D mode".
> 
> Signed-off-by: Akash Goel <akash.goel@intel.com>
> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem_context.c | 65 ++++++++++++++++++++++++++++++++-
>  drivers/gpu/drm/i915/i915_reg.h         |  3 ++
>  drivers/gpu/drm/i915/intel_ringbuffer.c |  9 +++++
>  drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
>  4 files changed, 76 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 6043062..544fc2d 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -584,6 +584,58 @@ i915_gem_context_get(struct drm_i915_file_private *file_priv, u32 id)
>  	return ctx;
>  }
>  
> +static inline void
> +mi_set_context_dummy3d_prim_wa(struct intel_ring_buffer *ring)
> +{
> +	u32 scratch_addr;
> +	u32 flags = 0;
> +
> +	/*
> +	 * Check if we have the scratch page allocated needed
> +	 * for the Pipe Control command, otherwise don't apply
> +	 * the dummmy 3d primitive workaround & add NOOPs instead
> +	 */
> +	if (get_pipe_control_scratch_addr(ring)) {
0 is a valid address. Plus using an unprefixed global static inline is
frown upon. Also we cannot get here without allocating a scratch_obj, so
we can forgo the test completely (and then kill the
get_pipe_control_scratch_addr).

> +		/* Actual scratch location is at 128 bytes offset */
> +		scratch_addr = get_pipe_control_scratch_addr(ring) + 128;
> +
> +		/*
> +		 * WaSendDummy3dPrimitveAfterSetContext:vlv
> +		 * Software must send a pipe_control with a CS stall
> +		 * and a post sync operation and then a dummy DRAW after
> +		 * every MI_SET_CONTEXT and after any PIPELINE_SELECT that
> +		 * is enabling 3D mode. A dummy draw is a 3DPRIMITIVE command
> +		 * with Indirect Parameter Enable set to 0, UAV Coherency
> +		 * Required set to 0, Predicate Enable set to 0,
> +		 * End Offset Enable set to 0, and Vertex Count Per Instance
> +		 * set to 0, All other parameters are a don't care.
> +		 */
> +
> +		/*
> +		 * Add a pipe control with CS Stall and postsync op
> +		 * before dummy 3D_PRIMITIVE
> +		 */
> +		flags |= PIPE_CONTROL_QW_WRITE | PIPE_CONTROL_CS_STALL;
> +		intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(4));
> +		intel_ring_emit(ring, flags);
> +		intel_ring_emit(ring, scratch_addr | PIPE_CONTROL_GLOBAL_GTT);
> +		intel_ring_emit(ring, 0);
> +
> +		/* Add a dummy 3D_PRIMITVE */
> +		intel_ring_emit(ring, GFX_OP_3DPRIMITIVE());

GFX_OP_3DPRIMITIVE emulates a function, why? I thought it was a
constant.

> +		intel_ring_emit(ring, 4); /* PrimTopoType*/
> +		intel_ring_emit(ring, 0); /* VertexCountPerInstance */
> +		intel_ring_emit(ring, 0); /* StartVertexLocation */
> +		intel_ring_emit(ring, 0); /* InstanceCount */
> +		intel_ring_emit(ring, 0); /* StartInstanceLocation */
> +		intel_ring_emit(ring, 0); /* BaseVertexLocation  */
> +	} else {
> +		int i;
> +		for (i = 0; i < 11; i++)
> +			intel_ring_emit(ring, MI_NOOP);
> +	}
> +}
> +
>  static inline int
>  mi_set_context(struct intel_ring_buffer *ring,
>  	       struct i915_hw_context *new_context,
> @@ -602,7 +654,10 @@ mi_set_context(struct intel_ring_buffer *ring,
>  			return ret;
>  	}
>  
> -	ret = intel_ring_begin(ring, 6);
> +	if (IS_VALLEYVIEW(ring->dev))
> +		ret = intel_ring_begin(ring, 6+4+8);
> +	else
> +		ret = intel_ring_begin(ring, 6);
>  	if (ret)
>  		return ret;
>  
> @@ -626,7 +681,13 @@ mi_set_context(struct intel_ring_buffer *ring,
>  	intel_ring_emit(ring, MI_NOOP);
>  
>  	if (IS_GEN7(ring->dev))
> -		intel_ring_emit(ring, MI_ARB_ON_OFF | MI_ARB_ENABLE);
> +		if (IS_VALLEYVIEW(ring->dev)) {
> +			/* FIXME, should also apply to ivb */
> +			mi_set_context_dummy3d_prim_wa(ring);
> +			intel_ring_emit(ring, MI_ARB_ON_OFF | MI_ARB_ENABLE);
> +			intel_ring_emit(ring, MI_NOOP);
> +		} else
> +			intel_ring_emit(ring, MI_ARB_ON_OFF | MI_ARB_ENABLE);
>  	else
>  		intel_ring_emit(ring, MI_NOOP);
>  
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index adcb9c7..fd25e8e4 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -348,6 +348,9 @@
>  #define   PIPE_CONTROL_DEPTH_CACHE_FLUSH		(1<<0)
>  #define   PIPE_CONTROL_GLOBAL_GTT (1<<2) /* in addr dword */
>  
> +#define GFX_OP_3DPRIMITIVE()              \
> +	((0x3<<29)|(0x3<<27)|(0x3<<24)|       \
> +	 (0x0<<16)|(0x0<<10)|(0x0<<8)|(7-2))

Please emulate the style we use else where for splitting this up into
client, opcode, subop, len etc.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 6/6] drm/i915/vlv: Modified Implementation of WaDisableL3Bank2xClockGate
  2014-03-24  6:49 ` [PATCH 6/6] drm/i915/vlv: Modified Implementation of WaDisableL3Bank2xClockGate sourab.gupta
@ 2014-03-24  9:41   ` Chris Wilson
  0 siblings, 0 replies; 28+ messages in thread
From: Chris Wilson @ 2014-03-24  9:41 UTC (permalink / raw)
  To: sourab.gupta; +Cc: Akash Goel, intel-gfx

On Mon, Mar 24, 2014 at 12:19:24PM +0530, sourab.gupta@intel.com wrote:
> From: Akash Goel <akash.goel@intel.com>
> 
> For VLV, disabling L3 clock gating- MMIO 940c[25] = 1
> 
> Signed-off-by: Akash Goel <akash.goel@intel.com>
> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_pm.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 6c04b79..fbfdca7 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -5096,8 +5096,11 @@ static void valleyview_init_clock_gating(struct drm_device *dev)
>  	I915_WRITE(GEN6_UCGCTL2,
>  		   GEN6_RCZUNIT_CLOCK_GATE_DISABLE);
>  
> -	/* WaDisableL3Bank2xClockGate:vlv */
> -	I915_WRITE(GEN7_UCGCTL4, GEN7_L3BANK2X_CLOCK_GATE_DISABLE);
> +	/* WaDisableL3Bank2xClockGate:vlv
> +	 * Disabling L3 clock gating- MMIO 940c[25] = 1
> +	 * Set bit 25, to disable L3_BANK_2x_CLK_GATING */

Who's the intended audience for this comment? It doesn't provide the
reader with any more information than the following line. And this patch
only converts the write into a rmw, so the entire patch is misleading.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/6] Rendering Specific HW Workarounds for VLV
  2014-03-24  9:35 ` [PATCH 0/6] Rendering Specific HW Workarounds for VLV Daniel Vetter
@ 2014-03-24 11:05   ` Gupta, Sourab
  0 siblings, 0 replies; 28+ messages in thread
From: Gupta, Sourab @ 2014-03-24 11:05 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Mon, 2014-03-24 at 09:35 +0000, Daniel Vetter wrote:
> On Mon, Mar 24, 2014 at 12:19:18PM +0530, sourab.gupta@intel.com wrote:
> > From: Sourab Gupta <sourab.gupta@intel.com>
> > 
> > This patch series adds rendering specific HW workarounds for VLV platform.
> > These patches leads to stable behavior on VLV, especially
> > when playing 3D Apps, benchmarks.
> > 
> > Though, the patch set was submitter earlier, this new patch set is initiating
> > a clean thread. We have addressed the comments on earlier patches and given
> > the latest version of these patches in a single thread.
> > 
> > Akash Goel (6):
> >   drm/i915/vlv: Added a rendering specific Hw WA
> >     'WaTlbInvalidateStoreDataBefore'
> >   drm/i915/vlv: Added a rendering specific Hw WA
> >     'WaSendDummy3dPrimitveAfterSetContext'
> >   drm/i915: Enabling the TLB invalidate bit in GFX Mode register
> >   drm/i915/vlv: Remove the enabling of VS_TIMER_DISPATCH bit in MI MODE
> >     reg
> >   drm/i915/vlv:Implement WaDisable_RenderCache_OperationalFlush
> >   drm/i915/vlv: Modified Implementation of WaDisableL3Bank2xClockGate
> 
> Thanks for resending. But your patches are missing the in-patch changelogs
> completely, which makes reviewing them hard - I have no idea what was
> discussed already and why we've reached certain conclusions. Can you
> please dig out the changelogs from older resends (I remember v2/v3/v4 of
> some of your patches) and add this again?
> 
> Also you missed e.g. Chris' tested-by on the tlb invalidation
> optimization. And the commit message fails to state that you've only
> tested this on vlv, but Chris has done the required testing on ivb/hsw.
> 
> Such details are important in case the patch breaks something in the
> future.
> 
> One comment on one patch before I've noticed this, otherwise I think it's
> better to wait for the resend.
> 
> Thanks.

Hi Daniel,
Sorry for missing out on the in-patch changelogs. I'll add them to the
patches and resend the set again.

Regards,
Sourab

> > 
> >  drivers/gpu/drm/i915/i915_gem_context.c | 65 ++++++++++++++++++++++++++++++++-
> >  drivers/gpu/drm/i915/i915_reg.h         |  6 +++
> >  drivers/gpu/drm/i915/intel_pm.c         | 13 ++++++-
> >  drivers/gpu/drm/i915/intel_ringbuffer.c | 38 ++++++++++++++++++-
> >  drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
> >  5 files changed, 117 insertions(+), 6 deletions(-)
> > 
> > -- 
> > 1.8.5.1
> > 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/6] drm/i915/vlv: Added a rendering specific Hw WA 'WaSendDummy3dPrimitveAfterSetContext'
  2014-03-24  9:31   ` Daniel Vetter
@ 2014-03-24 11:11     ` Gupta, Sourab
  0 siblings, 0 replies; 28+ messages in thread
From: Gupta, Sourab @ 2014-03-24 11:11 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx, Goel, Akash

On Mon, 2014-03-24 at 09:31 +0000, Daniel Vetter wrote:
> On Mon, Mar 24, 2014 at 12:19:20PM +0530, sourab.gupta@intel.com wrote:
> > From: Akash Goel <akash.goel@intel.com>
> > 
> > This workaround is needed on VLV for the HW context feature.
> > It is used after adding the mi_set_context command in ring buffer
> > for Hw context switch. As per the spec
> > "The software must send a pipe_control with a CS stall and a post sync
> > operation and then a dummy DRAW after every MI_SET_CONTEXT and after any
> > PIPELINE_SELECT that is enabling 3D mode".
> > 
> > Signed-off-by: Akash Goel <akash.goel@intel.com>
> > Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_gem_context.c | 65 ++++++++++++++++++++++++++++++++-
> >  drivers/gpu/drm/i915/i915_reg.h         |  3 ++
> >  drivers/gpu/drm/i915/intel_ringbuffer.c |  9 +++++
> >  drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
> >  4 files changed, 76 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> > index 6043062..544fc2d 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> > @@ -584,6 +584,58 @@ i915_gem_context_get(struct drm_i915_file_private *file_priv, u32 id)
> >  	return ctx;
> >  }
> >  
> > +static inline void
> > +mi_set_context_dummy3d_prim_wa(struct intel_ring_buffer *ring)
> > +{
> > +	u32 scratch_addr;
> > +	u32 flags = 0;
> > +
> > +	/*
> > +	 * Check if we have the scratch page allocated needed
> > +	 * for the Pipe Control command, otherwise don't apply
> > +	 * the dummmy 3d primitive workaround & add NOOPs instead
> > +	 */
> > +	if (get_pipe_control_scratch_addr(ring)) {
> 
> This check here looks like it papers over a bug (or at least rather
> serious init ordering confusion). Can you please rework this to avoid this
> condition?

Thanks, I'll rework this in the new set.
> 
> > +		/* Actual scratch location is at 128 bytes offset */
> > +		scratch_addr = get_pipe_control_scratch_addr(ring) + 128;
> > +
> > +		/*
> > +		 * WaSendDummy3dPrimitveAfterSetContext:vlv
> > +		 * Software must send a pipe_control with a CS stall
> > +		 * and a post sync operation and then a dummy DRAW after
> > +		 * every MI_SET_CONTEXT and after any PIPELINE_SELECT that
> > +		 * is enabling 3D mode. A dummy draw is a 3DPRIMITIVE command
> > +		 * with Indirect Parameter Enable set to 0, UAV Coherency
> > +		 * Required set to 0, Predicate Enable set to 0,
> > +		 * End Offset Enable set to 0, and Vertex Count Per Instance
> > +		 * set to 0, All other parameters are a don't care.
> > +		 */
> > +
> > +		/*
> > +		 * Add a pipe control with CS Stall and postsync op
> > +		 * before dummy 3D_PRIMITIVE
> > +		 */
> > +		flags |= PIPE_CONTROL_QW_WRITE | PIPE_CONTROL_CS_STALL;
> > +		intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(4));
> > +		intel_ring_emit(ring, flags);
> > +		intel_ring_emit(ring, scratch_addr | PIPE_CONTROL_GLOBAL_GTT);
> > +		intel_ring_emit(ring, 0);
> > +
> > +		/* Add a dummy 3D_PRIMITVE */
> > +		intel_ring_emit(ring, GFX_OP_3DPRIMITIVE());
> > +		intel_ring_emit(ring, 4); /* PrimTopoType*/
> > +		intel_ring_emit(ring, 0); /* VertexCountPerInstance */
> > +		intel_ring_emit(ring, 0); /* StartVertexLocation */
> > +		intel_ring_emit(ring, 0); /* InstanceCount */
> > +		intel_ring_emit(ring, 0); /* StartInstanceLocation */
> > +		intel_ring_emit(ring, 0); /* BaseVertexLocation  */
> 
> Is this save even in a pristine context when the gpu comes right out of
> reset? E.g. on resume or after gpu reset. We've had tons of fun in this
> area, see e.g. also Ben's latest bdw patch.
> -Daniel

In vlv, this is employed even when the gpu comes out of reset and it is
working fine there. We'll refer to the Ben's bdw patch also.
Regards,
Sourab
> 
> > +	} else {
> > +		int i;
> > +		for (i = 0; i < 11; i++)
> > +			intel_ring_emit(ring, MI_NOOP);
> > +	}
> > +}
> > +
> >  static inline int
> >  mi_set_context(struct intel_ring_buffer *ring,
> >  	       struct i915_hw_context *new_context,
> > @@ -602,7 +654,10 @@ mi_set_context(struct intel_ring_buffer *ring,
> >  			return ret;
> >  	}
> >  
> > -	ret = intel_ring_begin(ring, 6);
> > +	if (IS_VALLEYVIEW(ring->dev))
> > +		ret = intel_ring_begin(ring, 6+4+8);
> > +	else
> > +		ret = intel_ring_begin(ring, 6);
> >  	if (ret)
> >  		return ret;
> >  
> > @@ -626,7 +681,13 @@ mi_set_context(struct intel_ring_buffer *ring,
> >  	intel_ring_emit(ring, MI_NOOP);
> >  
> >  	if (IS_GEN7(ring->dev))
> > -		intel_ring_emit(ring, MI_ARB_ON_OFF | MI_ARB_ENABLE);
> > +		if (IS_VALLEYVIEW(ring->dev)) {
> > +			/* FIXME, should also apply to ivb */
> > +			mi_set_context_dummy3d_prim_wa(ring);
> > +			intel_ring_emit(ring, MI_ARB_ON_OFF | MI_ARB_ENABLE);
> > +			intel_ring_emit(ring, MI_NOOP);
> > +		} else
> > +			intel_ring_emit(ring, MI_ARB_ON_OFF | MI_ARB_ENABLE);
> >  	else
> >  		intel_ring_emit(ring, MI_NOOP);
> >  
> > diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> > index adcb9c7..fd25e8e4 100644
> > --- a/drivers/gpu/drm/i915/i915_reg.h
> > +++ b/drivers/gpu/drm/i915/i915_reg.h
> > @@ -348,6 +348,9 @@
> >  #define   PIPE_CONTROL_DEPTH_CACHE_FLUSH		(1<<0)
> >  #define   PIPE_CONTROL_GLOBAL_GTT (1<<2) /* in addr dword */
> >  
> > +#define GFX_OP_3DPRIMITIVE()              \
> > +	((0x3<<29)|(0x3<<27)|(0x3<<24)|       \
> > +	 (0x0<<16)|(0x0<<10)|(0x0<<8)|(7-2))
> >  
> >  /*
> >   * Reset registers
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index 2812384..98ddfec 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -560,6 +560,15 @@ err:
> >  	return ret;
> >  }
> >  
> > +u32
> > +get_pipe_control_scratch_addr(struct intel_ring_buffer *ring)
> > +{
> > +	if (ring->scratch.obj == NULL)
> > +		return 0;
> > +
> > +	return ring->scratch.gtt_offset;
> > +}
> > +
> >  static int init_render_ring(struct intel_ring_buffer *ring)
> >  {
> >  	struct drm_device *dev = ring->dev;
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > index f11ceb2..640c56d 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > @@ -294,6 +294,7 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev);
> >  
> >  u32 intel_ring_get_active_head(struct intel_ring_buffer *ring);
> >  void intel_ring_setup_status_page(struct intel_ring_buffer *ring);
> > +u32 get_pipe_control_scratch_addr(struct intel_ring_buffer *ring);
> >  
> >  static inline u32 intel_ring_get_tail(struct intel_ring_buffer *ring)
> >  {
> > -- 
> > 1.8.5.1
> > 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/6] drm/i915/vlv: Added a rendering specific Hw WA 'WaTlbInvalidateStoreDataBefore'
  2014-03-24  9:32   ` Chris Wilson
@ 2014-03-24 11:20     ` Gupta, Sourab
  2014-03-24 18:32       ` Ville Syrjälä
  0 siblings, 1 reply; 28+ messages in thread
From: Gupta, Sourab @ 2014-03-24 11:20 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Goel, Akash, intel-gfx

On Mon, 2014-03-24 at 09:32 +0000, Chris Wilson wrote:
> On Mon, Mar 24, 2014 at 12:19:19PM +0530, sourab.gupta@intel.com wrote:
> > From: Akash Goel <akash.goel@intel.com>
> > 
> > Added a new rendering specific Workaround 'WaTlbInvalidateStoreDataBefore'.
> > In this WA, before pipecontrol with TLB invalidate set, need to add 2 MI
> > Store data commands.
> > 
> > Signed-off-by: Akash Goel <akash.goel@intel.com>
> > Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
> > ---
> >  drivers/gpu/drm/i915/intel_ringbuffer.c | 22 ++++++++++++++++++++++
> >  1 file changed, 22 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index 87d1a2d..2812384 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -2207,6 +2207,28 @@ intel_ring_invalidate_all_caches(struct intel_ring_buffer *ring)
> >  	uint32_t flush_domains;
> >  	int ret;
> >  
> > +	if (IS_VALLEYVIEW(ring->dev)) {
> The ring flushes are vfuncs, so why is this here and not in a special
> vlv ring flush?

Yes, we can as well put it in the platform specific vlv flush. Since we
apply this WA only for invalidate_all_caches function, we have to
differentiate in the vlv flush function regarding where the flush
originated from. For this we plan to check the 'invalidate_domains'
field of flush function. (This field will be non-zero in case the call
originated from invalidate_all_caches function). So, we'll have a
vlv_render_ring_flush something like this:
	if(invalidate_domains)
		apply_our_wa;
	gen7_render_ring_flush();

Does this look okay?

Regards,
Sourab

> 
> > +		/*
> > +		 * WaTlbInvalidateStoreDataBefore:vlv
> > +		 * Before pipecontrol with TLB invalidate set, need 2 store
> > +		 * data commands (such as MI_STORE_DATA_IMM or MI_STORE_DATA_INDEX)
> > +		 * Without this, hardware cannot guarantee the command after the
> > +		 * PIPE_CONTROL with TLB inv will not use the old TLB values.
> 
> Crumbs, it sounds like our i-g-t are not sensitive enough. This bug
> crops up in many disguises over the years, do you have any suggestion on
> how we can improve our tests?
> 
We'll think of how to capture the scenario in the i-g-t testcases and
come back with suggestions.

> > +		 */
> > +		int i;
> > +		ret = intel_ring_begin(ring, 4 * 2);
> 
> This can be we written to use 6 dwords.
> 
Agreed. We'll have this in our next version
> > +		if (ret)
> > +			return ret;
> > +		for (i = 0; i < 2; i++) {
> > +			intel_ring_emit(ring, MI_STORE_DWORD_INDEX);
> > +			intel_ring_emit(ring, I915_GEM_HWS_SCRATCH_INDEX <<
> > +						MI_STORE_DWORD_INDEX_SHIFT);
> 
> This is I915_GEM_HWS_SCRATCH_ADDR

Agreed. We'll have this in our next version

> 
> > +			intel_ring_emit(ring, 0);
> > +			intel_ring_emit(ring, MI_NOOP);
> > +		}
> > +		intel_ring_advance(ring);
> > +	}
> > +
> >  	flush_domains = 0;
> >  	if (ring->gpu_caches_dirty)
> >  		flush_domains = I915_GEM_GPU_DOMAINS;
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/6] drm/i915/vlv: Added a rendering specific Hw WA 'WaTlbInvalidateStoreDataBefore'
  2014-03-24 11:20     ` Gupta, Sourab
@ 2014-03-24 18:32       ` Ville Syrjälä
  2014-03-24 18:47         ` Chris Wilson
  0 siblings, 1 reply; 28+ messages in thread
From: Ville Syrjälä @ 2014-03-24 18:32 UTC (permalink / raw)
  To: Gupta, Sourab; +Cc: intel-gfx, Goel, Akash

On Mon, Mar 24, 2014 at 11:20:40AM +0000, Gupta, Sourab wrote:
> On Mon, 2014-03-24 at 09:32 +0000, Chris Wilson wrote:
> > On Mon, Mar 24, 2014 at 12:19:19PM +0530, sourab.gupta@intel.com wrote:
> > > From: Akash Goel <akash.goel@intel.com>
> > > 
> > > Added a new rendering specific Workaround 'WaTlbInvalidateStoreDataBefore'.
> > > In this WA, before pipecontrol with TLB invalidate set, need to add 2 MI
> > > Store data commands.
> > > 
> > > Signed-off-by: Akash Goel <akash.goel@intel.com>
> > > Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/intel_ringbuffer.c | 22 ++++++++++++++++++++++
> > >  1 file changed, 22 insertions(+)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > index 87d1a2d..2812384 100644
> > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > @@ -2207,6 +2207,28 @@ intel_ring_invalidate_all_caches(struct intel_ring_buffer *ring)
> > >  	uint32_t flush_domains;
> > >  	int ret;
> > >  
> > > +	if (IS_VALLEYVIEW(ring->dev)) {
> > The ring flushes are vfuncs, so why is this here and not in a special
> > vlv ring flush?
> 
> Yes, we can as well put it in the platform specific vlv flush. Since we
> apply this WA only for invalidate_all_caches function, we have to
> differentiate in the vlv flush function regarding where the flush
> originated from. For this we plan to check the 'invalidate_domains'
> field of flush function. (This field will be non-zero in case the call
> originated from invalidate_all_caches function). So, we'll have a
> vlv_render_ring_flush something like this:
> 	if(invalidate_domains)
> 		apply_our_wa;
> 	gen7_render_ring_flush();
> 
> Does this look okay?

Since we supposdely need this for all gen6/gen7, I'd just add a new func
(eg. gen6_tlb_invalidate_wa()) and call that from gen6_render_ring_flush(),
gen7_render_ring_flush(), gen6_bsd_ring_flush() and gen6_ring_flush().

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/6] drm/i915/vlv: Added a rendering specific Hw WA 'WaTlbInvalidateStoreDataBefore'
  2014-03-24 18:32       ` Ville Syrjälä
@ 2014-03-24 18:47         ` Chris Wilson
  2014-03-25  5:17           ` Gupta, Sourab
  2014-03-25  8:31           ` [PATCH v5] " sourab.gupta
  0 siblings, 2 replies; 28+ messages in thread
From: Chris Wilson @ 2014-03-24 18:47 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: Goel, Akash, Gupta, Sourab, intel-gfx

On Mon, Mar 24, 2014 at 08:32:30PM +0200, Ville Syrjälä wrote:
> On Mon, Mar 24, 2014 at 11:20:40AM +0000, Gupta, Sourab wrote:
> > On Mon, 2014-03-24 at 09:32 +0000, Chris Wilson wrote:
> > > On Mon, Mar 24, 2014 at 12:19:19PM +0530, sourab.gupta@intel.com wrote:
> > > > From: Akash Goel <akash.goel@intel.com>
> > > > 
> > > > Added a new rendering specific Workaround 'WaTlbInvalidateStoreDataBefore'.
> > > > In this WA, before pipecontrol with TLB invalidate set, need to add 2 MI
> > > > Store data commands.
> > > > 
> > > > Signed-off-by: Akash Goel <akash.goel@intel.com>
> > > > Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
> > > > ---
> > > >  drivers/gpu/drm/i915/intel_ringbuffer.c | 22 ++++++++++++++++++++++
> > > >  1 file changed, 22 insertions(+)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > index 87d1a2d..2812384 100644
> > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > @@ -2207,6 +2207,28 @@ intel_ring_invalidate_all_caches(struct intel_ring_buffer *ring)
> > > >  	uint32_t flush_domains;
> > > >  	int ret;
> > > >  
> > > > +	if (IS_VALLEYVIEW(ring->dev)) {
> > > The ring flushes are vfuncs, so why is this here and not in a special
> > > vlv ring flush?
> > 
> > Yes, we can as well put it in the platform specific vlv flush. Since we
> > apply this WA only for invalidate_all_caches function, we have to
> > differentiate in the vlv flush function regarding where the flush
> > originated from. For this we plan to check the 'invalidate_domains'
> > field of flush function. (This field will be non-zero in case the call
> > originated from invalidate_all_caches function). So, we'll have a
> > vlv_render_ring_flush something like this:
> > 	if(invalidate_domains)
> > 		apply_our_wa;
> > 	gen7_render_ring_flush();
> > 
> > Does this look okay?
> 
> Since we supposdely need this for all gen6/gen7, I'd just add a new func
> (eg. gen6_tlb_invalidate_wa()) and call that from gen6_render_ring_flush(),
> gen7_render_ring_flush(), gen6_bsd_ring_flush() and gen6_ring_flush().

Now, I am extremely curious as to what the exact bug symptoms are. We
seem to have an absence of bug reports since SNB regarding random
corruption.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/6] drm/i915/vlv: Added a rendering specific Hw WA 'WaTlbInvalidateStoreDataBefore'
  2014-03-24 18:47         ` Chris Wilson
@ 2014-03-25  5:17           ` Gupta, Sourab
  2014-03-25  8:31           ` [PATCH v5] " sourab.gupta
  1 sibling, 0 replies; 28+ messages in thread
From: Gupta, Sourab @ 2014-03-25  5:17 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Goel, Akash, intel-gfx

On Mon, 2014-03-24 at 18:47 +0000, Chris Wilson wrote:
> On Mon, Mar 24, 2014 at 08:32:30PM +0200, Ville Syrjälä wrote:
> > On Mon, Mar 24, 2014 at 11:20:40AM +0000, Gupta, Sourab wrote:
> > > On Mon, 2014-03-24 at 09:32 +0000, Chris Wilson wrote:
> > > > On Mon, Mar 24, 2014 at 12:19:19PM +0530, sourab.gupta@intel.com wrote:
> > > > > From: Akash Goel <akash.goel@intel.com>
> > > > > 
> > > > > Added a new rendering specific Workaround 'WaTlbInvalidateStoreDataBefore'.
> > > > > In this WA, before pipecontrol with TLB invalidate set, need to add 2 MI
> > > > > Store data commands.
> > > > > 
> > > > > Signed-off-by: Akash Goel <akash.goel@intel.com>
> > > > > Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
> > > > > ---
> > > > >  drivers/gpu/drm/i915/intel_ringbuffer.c | 22 ++++++++++++++++++++++
> > > > >  1 file changed, 22 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > index 87d1a2d..2812384 100644
> > > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > > > > @@ -2207,6 +2207,28 @@ intel_ring_invalidate_all_caches(struct intel_ring_buffer *ring)
> > > > >  	uint32_t flush_domains;
> > > > >  	int ret;
> > > > >  
> > > > > +	if (IS_VALLEYVIEW(ring->dev)) {
> > > > The ring flushes are vfuncs, so why is this here and not in a special
> > > > vlv ring flush?
> > > 
> > > Yes, we can as well put it in the platform specific vlv flush. Since we
> > > apply this WA only for invalidate_all_caches function, we have to
> > > differentiate in the vlv flush function regarding where the flush
> > > originated from. For this we plan to check the 'invalidate_domains'
> > > field of flush function. (This field will be non-zero in case the call
> > > originated from invalidate_all_caches function). So, we'll have a
> > > vlv_render_ring_flush something like this:
> > > 	if(invalidate_domains)
> > > 		apply_our_wa;
> > > 	gen7_render_ring_flush();
> > > 
> > > Does this look okay?
> > 
> > Since we supposdely need this for all gen6/gen7, I'd just add a new func
> > (eg. gen6_tlb_invalidate_wa()) and call that from gen6_render_ring_flush(),
> > gen7_render_ring_flush(), gen6_bsd_ring_flush() and gen6_ring_flush().
> 
> Now, I am extremely curious as to what the exact bug symptoms are. We
> seem to have an absence of bug reports since SNB regarding random
> corruption.
> -Chris
> 
Hi Chris,
We had applied this WA in a preemptive way, as this was amongst the
recommended list of WA's applicable. So, we don't have any specific bug
characteristics which is prevented by this particular WA.
Generally, we have been very cautious wrt WA's as we have seen hangs
(which may be very difficult to debug in wild), if we miss out on some
WA's recommended.
Regards,
Sourab

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v5] drm/i915/vlv: Added a rendering specific Hw WA 'WaTlbInvalidateStoreDataBefore'
  2014-03-24 18:47         ` Chris Wilson
  2014-03-25  5:17           ` Gupta, Sourab
@ 2014-03-25  8:31           ` sourab.gupta
  2014-03-25  9:15             ` Chris Wilson
  1 sibling, 1 reply; 28+ messages in thread
From: sourab.gupta @ 2014-03-25  8:31 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter, Akash Goel, Sourab Gupta

From: Akash Goel <akash.goel@intel.com>

Added a new rendering specific Workaround 'WaTlbInvalidateStoreDataBefore'.
This workaround has to be applied before doing TLB Invalidation.
In this WA, before pipecontrol with TLB invalidate set, need to add 2 MI
Store data commands.
Without this, hardware cannot guarantee the command after the PIPE_CONTROL
with TLB inv will not use the old TLB values.

v2: Modified the WA comment (Ville)

v3: Added the vlv identifier with WA name (Damien)

v4: Reworked based on Chris' comments (WA moved to gen7 ring flush func,
sending 6 dwords instead of 8) (Chris)

v5: Enhancing the scope of WA to gen6, gen7. Having a common WA func being
called from gen6, gen7 flush functions. (Ville)

Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Akash Goel <akash.goel@intel.com>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 52 +++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 87d1a2d..ef4ca3dd 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -208,6 +208,30 @@ intel_emit_post_sync_nonzero_flush(struct intel_ring_buffer *ring)
 }
 
 static int
+gen6_tlb_invalidate_wa(struct intel_ring_buffer *ring)
+{
+	/*
+	 * WaTlbInvalidateStoreDataBefore:gen6,gen7
+	 * This workaround has to be applied before doing TLB invalidation.
+	 * Before pipecontrol with TLB invalidate set, need 2 store
+	 * data commands (such as MI_STORE_DATA_IMM or MI_STORE_DATA_INDEX)
+	 * Without this, hardware cannot guarantee the command after the
+	 * PIPE_CONTROL with TLB inv will not use the old TLB values.
+	 */
+	int i, ret;
+	ret = intel_ring_begin(ring, 3 * 2);
+	if (ret)
+		return ret;
+	for (i = 0; i < 2; i++) {
+		intel_ring_emit(ring, MI_STORE_DWORD_INDEX);
+		intel_ring_emit(ring, I915_GEM_HWS_SCRATCH_ADDR);
+		intel_ring_emit(ring, 0);
+	}
+	intel_ring_advance(ring);
+	return 0;
+}
+
+static int
 gen6_render_ring_flush(struct intel_ring_buffer *ring,
                          u32 invalidate_domains, u32 flush_domains)
 {
@@ -215,6 +239,13 @@ gen6_render_ring_flush(struct intel_ring_buffer *ring,
 	u32 scratch_addr = ring->scratch.gtt_offset + 128;
 	int ret;
 
+	/* Apply WaTlbInvalidateStoreDataBefore workaround */
+	if (invalidate_domains) {
+		ret = gen6_tlb_invalidate_wa(ring);
+		if (ret)
+			return ret;
+	}
+
 	/* Force SNB workarounds for PIPE_CONTROL flushes */
 	ret = intel_emit_post_sync_nonzero_flush(ring);
 	if (ret)
@@ -309,6 +340,13 @@ gen7_render_ring_flush(struct intel_ring_buffer *ring,
 	u32 scratch_addr = ring->scratch.gtt_offset + 128;
 	int ret;
 
+	/* Apply WaTlbInvalidateStoreDataBefore workaround */
+	if (invalidate_domains) {
+		ret = gen6_tlb_invalidate_wa(ring);
+		if (ret)
+			return ret;
+	}
+
 	/*
 	 * Ensure that any following seqno writes only happen when the render
 	 * cache is indeed flushed.
@@ -1733,6 +1771,13 @@ static int gen6_bsd_ring_flush(struct intel_ring_buffer *ring,
 	uint32_t cmd;
 	int ret;
 
+	/* Apply WaTlbInvalidateStoreDataBefore workaround */
+	if (invalidate) {
+		ret = gen6_tlb_invalidate_wa(ring);
+		if (ret)
+			return ret;
+	}
+
 	ret = intel_ring_begin(ring, 4);
 	if (ret)
 		return ret;
@@ -1837,6 +1882,13 @@ static int gen6_ring_flush(struct intel_ring_buffer *ring,
 	uint32_t cmd;
 	int ret;
 
+	/* Apply WaTlbInvalidateStoreDataBefore workaround */
+	if (invalidate) {
+		ret = gen6_tlb_invalidate_wa(ring);
+		if (ret)
+			return ret;
+	}
+
 	ret = intel_ring_begin(ring, 4);
 	if (ret)
 		return ret;
-- 
1.8.5.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v5] drm/i915/vlv: Added a rendering specific Hw WA 'WaTlbInvalidateStoreDataBefore'
  2014-03-25  8:31           ` [PATCH v5] " sourab.gupta
@ 2014-03-25  9:15             ` Chris Wilson
  2014-03-25  9:39               ` Gupta, Sourab
  2014-03-25  9:53               ` [PATCH v6] " sourab.gupta
  0 siblings, 2 replies; 28+ messages in thread
From: Chris Wilson @ 2014-03-25  9:15 UTC (permalink / raw)
  To: sourab.gupta; +Cc: Daniel Vetter, intel-gfx, Akash Goel

On Tue, Mar 25, 2014 at 02:01:05PM +0530, sourab.gupta@intel.com wrote:
> From: Akash Goel <akash.goel@intel.com>
> 
> Added a new rendering specific Workaround 'WaTlbInvalidateStoreDataBefore'.
> This workaround has to be applied before doing TLB Invalidation.
> In this WA, before pipecontrol with TLB invalidate set, need to add 2 MI
> Store data commands.
> Without this, hardware cannot guarantee the command after the PIPE_CONTROL
> with TLB inv will not use the old TLB values.
> 
> v2: Modified the WA comment (Ville)
> 
> v3: Added the vlv identifier with WA name (Damien)
> 
> v4: Reworked based on Chris' comments (WA moved to gen7 ring flush func,
> sending 6 dwords instead of 8) (Chris)
> 
> v5: Enhancing the scope of WA to gen6, gen7. Having a common WA func being
> called from gen6, gen7 flush functions. (Ville)
> 
> @@ -1733,6 +1771,13 @@ static int gen6_bsd_ring_flush(struct intel_ring_buffer *ring,
>  	uint32_t cmd;
>  	int ret;
>  
> +	/* Apply WaTlbInvalidateStoreDataBefore workaround */
> +	if (invalidate) {
> +		ret = gen6_tlb_invalidate_wa(ring);
> +		if (ret)
> +			return ret;
> +	}

BSD uses MI_FLUSH_DW. Does this w/a still apply? Do we need it for BLT
as well? VEBOX?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v5] drm/i915/vlv: Added a rendering specific Hw WA 'WaTlbInvalidateStoreDataBefore'
  2014-03-25  9:15             ` Chris Wilson
@ 2014-03-25  9:39               ` Gupta, Sourab
  2014-03-25  9:53               ` [PATCH v6] " sourab.gupta
  1 sibling, 0 replies; 28+ messages in thread
From: Gupta, Sourab @ 2014-03-25  9:39 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Daniel Vetter, intel-gfx, Goel, Akash

On Tue, 2014-03-25 at 09:15 +0000, Chris Wilson wrote:
> On Tue, Mar 25, 2014 at 02:01:05PM +0530, sourab.gupta@intel.com wrote:
> > From: Akash Goel <akash.goel@intel.com>
> > 
> > Added a new rendering specific Workaround 'WaTlbInvalidateStoreDataBefore'.
> > This workaround has to be applied before doing TLB Invalidation.
> > In this WA, before pipecontrol with TLB invalidate set, need to add 2 MI
> > Store data commands.
> > Without this, hardware cannot guarantee the command after the PIPE_CONTROL
> > with TLB inv will not use the old TLB values.
> > 
> > v2: Modified the WA comment (Ville)
> > 
> > v3: Added the vlv identifier with WA name (Damien)
> > 
> > v4: Reworked based on Chris' comments (WA moved to gen7 ring flush func,
> > sending 6 dwords instead of 8) (Chris)
> > 
> > v5: Enhancing the scope of WA to gen6, gen7. Having a common WA func being
> > called from gen6, gen7 flush functions. (Ville)
> > 
> > @@ -1733,6 +1771,13 @@ static int gen6_bsd_ring_flush(struct intel_ring_buffer *ring,
> >  	uint32_t cmd;
> >  	int ret;
> >  
> > +	/* Apply WaTlbInvalidateStoreDataBefore workaround */
> > +	if (invalidate) {
> > +		ret = gen6_tlb_invalidate_wa(ring);
> > +		if (ret)
> > +			return ret;
> > +	}
> 
> BSD uses MI_FLUSH_DW. Does this w/a still apply? Do we need it for BLT
> as well? VEBOX?
> -Chris
> 
This should be applicable only to the render ring funcs, not the bsd,
blt, vebox ones, as you called out. PIPE_CONTROL is used only in render
ring, others use MI_FLUSH. My bad for missing this out.
Thanks,
Sourab

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v6] drm/i915/vlv: Added a rendering specific Hw WA 'WaTlbInvalidateStoreDataBefore'
  2014-03-25  9:15             ` Chris Wilson
  2014-03-25  9:39               ` Gupta, Sourab
@ 2014-03-25  9:53               ` sourab.gupta
  2014-03-25 10:59                 ` Chris Wilson
  1 sibling, 1 reply; 28+ messages in thread
From: sourab.gupta @ 2014-03-25  9:53 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter, Akash Goel, Sourab Gupta

From: Akash Goel <akash.goel@intel.com>

Added a new rendering specific Workaround 'WaTlbInvalidateStoreDataBefore'.
This workaround has to be applied before doing TLB Invalidation on render ring.
In this WA, before pipecontrol with TLB invalidate set, need to add 2 MI
Store data commands.
Without this, hardware cannot guarantee the command after the PIPE_CONTROL
with TLB inv will not use the old TLB values.

v2: Modified the WA comment (Ville)

v3: Added the vlv identifier with WA name (Damien)

v4: Reworked based on Chris' comments (WA moved to gen7 ring flush func,
sending 6 dwords instead of 8) (Chris)

v5: Enhancing the scope of WA to gen6, gen7. Having a common WA func being
called from gen6, gen7 flush functions. (Ville)

v6: WA is applicable only to render ring, earlier put for all rings in v5.
(Chris)

Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Akash Goel <akash.goel@intel.com>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 39 +++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 87d1a2d..816137f 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -208,6 +208,31 @@ intel_emit_post_sync_nonzero_flush(struct intel_ring_buffer *ring)
 }
 
 static int
+gen6_tlb_invalidate_wa(struct intel_ring_buffer *ring)
+{
+	/*
+	 * WaTlbInvalidateStoreDataBefore:gen6,gen7
+	 * This workaround has to be applied before doing TLB invalidation
+	 * on the render ring. Before pipecontrol with TLB invalidate set,
+	 * need 2 store data commands (such as MI_STORE_DATA_IMM or
+	 * MI_STORE_DATA_INDEX). Without this, hardware cannot guarantee
+	 * the command after the PIPE_CONTROL with TLB inv will not use
+	 * the old TLB values.
+	 */
+	int i, ret;
+	ret = intel_ring_begin(ring, 3 * 2);
+	if (ret)
+		return ret;
+	for (i = 0; i < 2; i++) {
+		intel_ring_emit(ring, MI_STORE_DWORD_INDEX);
+		intel_ring_emit(ring, I915_GEM_HWS_SCRATCH_ADDR);
+		intel_ring_emit(ring, 0);
+	}
+	intel_ring_advance(ring);
+	return 0;
+}
+
+static int
 gen6_render_ring_flush(struct intel_ring_buffer *ring,
                          u32 invalidate_domains, u32 flush_domains)
 {
@@ -215,6 +240,13 @@ gen6_render_ring_flush(struct intel_ring_buffer *ring,
 	u32 scratch_addr = ring->scratch.gtt_offset + 128;
 	int ret;
 
+	/* Apply WaTlbInvalidateStoreDataBefore workaround */
+	if (invalidate_domains) {
+		ret = gen6_tlb_invalidate_wa(ring);
+		if (ret)
+			return ret;
+	}
+
 	/* Force SNB workarounds for PIPE_CONTROL flushes */
 	ret = intel_emit_post_sync_nonzero_flush(ring);
 	if (ret)
@@ -309,6 +341,13 @@ gen7_render_ring_flush(struct intel_ring_buffer *ring,
 	u32 scratch_addr = ring->scratch.gtt_offset + 128;
 	int ret;
 
+	/* Apply WaTlbInvalidateStoreDataBefore workaround */
+	if (invalidate_domains) {
+		ret = gen6_tlb_invalidate_wa(ring);
+		if (ret)
+			return ret;
+	}
+
 	/*
 	 * Ensure that any following seqno writes only happen when the render
 	 * cache is indeed flushed.
-- 
1.8.5.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v6] drm/i915/vlv: Added a rendering specific Hw WA 'WaTlbInvalidateStoreDataBefore'
  2014-03-25  9:53               ` [PATCH v6] " sourab.gupta
@ 2014-03-25 10:59                 ` Chris Wilson
  2014-03-26  5:14                   ` Gupta, Sourab
  0 siblings, 1 reply; 28+ messages in thread
From: Chris Wilson @ 2014-03-25 10:59 UTC (permalink / raw)
  To: sourab.gupta; +Cc: Daniel Vetter, intel-gfx, Akash Goel

On Tue, Mar 25, 2014 at 03:23:34PM +0530, sourab.gupta@intel.com wrote:
> From: Akash Goel <akash.goel@intel.com>
> 
> Added a new rendering specific Workaround 'WaTlbInvalidateStoreDataBefore'.
> This workaround has to be applied before doing TLB Invalidation on render ring.
> In this WA, before pipecontrol with TLB invalidate set, need to add 2 MI
> Store data commands.
> Without this, hardware cannot guarantee the command after the PIPE_CONTROL
> with TLB inv will not use the old TLB values.

Note, that our command programming sequence already has multiple dword
writes between the flush of the last batch and the invalidation of the
next.

Is this w/a still required? Why?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v6] drm/i915/vlv: Added a rendering specific Hw WA 'WaTlbInvalidateStoreDataBefore'
  2014-03-25 10:59                 ` Chris Wilson
@ 2014-03-26  5:14                   ` Gupta, Sourab
  2014-03-26  7:54                     ` Chris Wilson
  0 siblings, 1 reply; 28+ messages in thread
From: Gupta, Sourab @ 2014-03-26  5:14 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Daniel Vetter, intel-gfx, Goel, Akash

On Tue, 2014-03-25 at 10:59 +0000, Chris Wilson wrote:
> On Tue, Mar 25, 2014 at 03:23:34PM +0530, sourab.gupta@intel.com wrote:
> > From: Akash Goel <akash.goel@intel.com>
> > 
> > Added a new rendering specific Workaround 'WaTlbInvalidateStoreDataBefore'.
> > This workaround has to be applied before doing TLB Invalidation on render ring.
> > In this WA, before pipecontrol with TLB invalidate set, need to add 2 MI
> > Store data commands.
> > Without this, hardware cannot guarantee the command after the PIPE_CONTROL
> > with TLB inv will not use the old TLB values.
> 
> Note, that our command programming sequence already has multiple dword
> writes between the flush of the last batch and the invalidation of the
> next.
> 
> Is this w/a still required? Why?
> -Chris
> 
Hi Chris,
Are you referring to the MI_STORE_DWORD_INDEX commands emitted as a part
of add_request functions?
We couldn't find any other place where we are storing dwords between
flush of last batch and invalidation of next.
In that case, we agree that as a part of command programming sequence,
we'll have one set of store dwords emitted.

But, as per spec, it is required to emit 2 sets of store dwords before
the tlb invalidation.
Also, our motive for having this w/a is just being paranoid, and not
assuming that dwords would already have been emitted before doing tlb
invalidation. So, we try to explicitly ensure the same through our w/a.

Regards,
Sourab

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v6] drm/i915/vlv: Added a rendering specific Hw WA 'WaTlbInvalidateStoreDataBefore'
  2014-03-26  5:14                   ` Gupta, Sourab
@ 2014-03-26  7:54                     ` Chris Wilson
  2014-03-26 14:13                       ` Gupta, Sourab
  0 siblings, 1 reply; 28+ messages in thread
From: Chris Wilson @ 2014-03-26  7:54 UTC (permalink / raw)
  To: Gupta, Sourab; +Cc: Daniel Vetter, intel-gfx, Goel, Akash

On Wed, Mar 26, 2014 at 05:14:05AM +0000, Gupta, Sourab wrote:
> On Tue, 2014-03-25 at 10:59 +0000, Chris Wilson wrote:
> > On Tue, Mar 25, 2014 at 03:23:34PM +0530, sourab.gupta@intel.com wrote:
> > > From: Akash Goel <akash.goel@intel.com>
> > > 
> > > Added a new rendering specific Workaround 'WaTlbInvalidateStoreDataBefore'.
> > > This workaround has to be applied before doing TLB Invalidation on render ring.
> > > In this WA, before pipecontrol with TLB invalidate set, need to add 2 MI
> > > Store data commands.
> > > Without this, hardware cannot guarantee the command after the PIPE_CONTROL
> > > with TLB inv will not use the old TLB values.
> > 
> > Note, that our command programming sequence already has multiple dword
> > writes between the flush of the last batch and the invalidation of the
> > next.
> > 
> > Is this w/a still required? Why?
> > -Chris
> > 
> Hi Chris,
> Are you referring to the MI_STORE_DWORD_INDEX commands emitted as a part
> of add_request functions?
> We couldn't find any other place where we are storing dwords between
> flush of last batch and invalidation of next.
> In that case, we agree that as a part of command programming sequence,
> we'll have one set of store dwords emitted.
> 
> But, as per spec, it is required to emit 2 sets of store dwords before
> the tlb invalidation.

At this moment, I am upset how vague this recommendation is. Why don't
the LRI count? Is there a timing requirement? Do the stores have to
be different pages to flush a TLB etc?

> Also, our motive for having this w/a is just being paranoid, and not
> assuming that dwords would already have been emitted before doing tlb
> invalidation. So, we try to explicitly ensure the same through our w/a.

Which would be as easy as doubling up the STORE_DW(seqno).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v6] drm/i915/vlv: Added a rendering specific Hw WA 'WaTlbInvalidateStoreDataBefore'
  2014-03-26  7:54                     ` Chris Wilson
@ 2014-03-26 14:13                       ` Gupta, Sourab
  0 siblings, 0 replies; 28+ messages in thread
From: Gupta, Sourab @ 2014-03-26 14:13 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Daniel Vetter, intel-gfx, Goel, Akash

On Wed, 2014-03-26 at 07:54 +0000, Chris Wilson wrote:
> On Wed, Mar 26, 2014 at 05:14:05AM +0000, Gupta, Sourab wrote:
> > On Tue, 2014-03-25 at 10:59 +0000, Chris Wilson wrote:
> > > On Tue, Mar 25, 2014 at 03:23:34PM +0530, sourab.gupta@intel.com wrote:
> > > > From: Akash Goel <akash.goel@intel.com>
> > > > 
> > > > Added a new rendering specific Workaround 'WaTlbInvalidateStoreDataBefore'.
> > > > This workaround has to be applied before doing TLB Invalidation on render ring.
> > > > In this WA, before pipecontrol with TLB invalidate set, need to add 2 MI
> > > > Store data commands.
> > > > Without this, hardware cannot guarantee the command after the PIPE_CONTROL
> > > > with TLB inv will not use the old TLB values.
> > > 
> > > Note, that our command programming sequence already has multiple dword
> > > writes between the flush of the last batch and the invalidation of the
> > > next.
> > > 
> > > Is this w/a still required? Why?
> > > -Chris
> > > 
> > Hi Chris,
> > Are you referring to the MI_STORE_DWORD_INDEX commands emitted as a part
> > of add_request functions?
> > We couldn't find any other place where we are storing dwords between
> > flush of last batch and invalidation of next.
> > In that case, we agree that as a part of command programming sequence,
> > we'll have one set of store dwords emitted.
> > 
> > But, as per spec, it is required to emit 2 sets of store dwords before
> > the tlb invalidation.
> 
> At this moment, I am upset how vague this recommendation is. Why don't
> the LRI count? Is there a timing requirement? Do the stores have to
> be different pages to flush a TLB etc?
> 
Hi Chris,
I see your point. We do see the MI_LOAD_REGISTER_IMM calls, but we were
not sure whether they would serve the purpose (forgive our limited
visibility on the low level pipeline details). 
If the LRI + store dword combination does the job, then this w/a may not
be required and we can drop it.
Regards,
Sourab

> > Also, our motive for having this w/a is just being paranoid, and not
> > assuming that dwords would already have been emitted before doing tlb
> > invalidation. So, we try to explicitly ensure the same through our w/a.
> 
> Which would be as easy as doubling up the STORE_DW(seqno).
> -Chris
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2014-03-26 14:14 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-03-24  6:49 [PATCH 0/6] Rendering Specific HW Workarounds for VLV sourab.gupta
2014-03-24  6:49 ` [PATCH 1/6] drm/i915/vlv: Added a rendering specific Hw WA 'WaTlbInvalidateStoreDataBefore' sourab.gupta
2014-03-24  9:32   ` Chris Wilson
2014-03-24 11:20     ` Gupta, Sourab
2014-03-24 18:32       ` Ville Syrjälä
2014-03-24 18:47         ` Chris Wilson
2014-03-25  5:17           ` Gupta, Sourab
2014-03-25  8:31           ` [PATCH v5] " sourab.gupta
2014-03-25  9:15             ` Chris Wilson
2014-03-25  9:39               ` Gupta, Sourab
2014-03-25  9:53               ` [PATCH v6] " sourab.gupta
2014-03-25 10:59                 ` Chris Wilson
2014-03-26  5:14                   ` Gupta, Sourab
2014-03-26  7:54                     ` Chris Wilson
2014-03-26 14:13                       ` Gupta, Sourab
2014-03-24  6:49 ` [PATCH 2/6] drm/i915/vlv: Added a rendering specific Hw WA 'WaSendDummy3dPrimitveAfterSetContext' sourab.gupta
2014-03-24  9:31   ` Daniel Vetter
2014-03-24 11:11     ` Gupta, Sourab
2014-03-24  9:39   ` Chris Wilson
2014-03-24  6:49 ` [PATCH 3/6] drm/i915: Enabling the TLB invalidate bit in GFX Mode register sourab.gupta
2014-03-24  9:35   ` Chris Wilson
2014-03-24  6:49 ` [PATCH 4/6] drm/i915/vlv: Remove the enabling of VS_TIMER_DISPATCH bit in MI MODE reg sourab.gupta
2014-03-24  6:49 ` [PATCH 5/6] drm/i915/vlv:Implement WaDisable_RenderCache_OperationalFlush sourab.gupta
2014-03-24  9:32   ` Chris Wilson
2014-03-24  6:49 ` [PATCH 6/6] drm/i915/vlv: Modified Implementation of WaDisableL3Bank2xClockGate sourab.gupta
2014-03-24  9:41   ` Chris Wilson
2014-03-24  9:35 ` [PATCH 0/6] Rendering Specific HW Workarounds for VLV Daniel Vetter
2014-03-24 11:05   ` Gupta, Sourab

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.