All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] drm/i915: Fix incoherence with fence updates on Sandybridge+
@ 2013-07-10 12:36 Chris Wilson
  2013-07-10 12:36 ` [PATCH 2/2] Revert "drm/i915: Workaround incoherence between fences and LLC across multiple CPUs" Chris Wilson
  0 siblings, 1 reply; 5+ messages in thread
From: Chris Wilson @ 2013-07-10 12:36 UTC (permalink / raw)
  To: intel-gfx
  Cc: Chris Wilson, Jon Bloomfield, Daniel Vetter, Carsten Emde, stable

This hopefully fixes the root cause behind the workaround added in

commit 25ff1195f8a0b3724541ae7bbe331b4296de9c06
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Apr 4 21:31:03 2013 +0100

    drm/i915: Workaround incoherence between fences and LLC across multiple CPUs

Thanks to further investigation by Jon Bloomfield, he realised that
the 64-bit register might be broken up by the hardware into two 32-bit
writes (a problem we have encountered elsewhere). This non-atomicity
would then cause an issue where a second thread would see an
intermediate register state (new high dword, old low dword), and this
register would randomly be used in preference to its own thread register.
This would cause the second thread to read from and write into a fairly
random tiled location.  Breaking the operation into 3 explicit 32-bit
updates (first disable the fence, poke the upper bits, then poke the lower
bits and enable) ensures that, given proper serialisation between the
32-bit register write and the memory transfer, that the fence value is
always consistent.

Armed with this knowledge, we can explain how the previous workaround
work. The key to the corruption is that a second thread sees an
erroneous fence register that conflicts and overrides its own. By
serialising the fence update across all CPUs, we have a small window
where no GTT access is occurring and so hide the potential corruption.
This also leads to the conclusion that the earlier workaround was
incomplete.

v2: Be overly paranoid about the order in which fence updates become
visible to the GPU to make really sure that we turn the fence off before
doing the update, and then only switch the fence on afterwards.

Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Carsten Emde <C.Emde@osadl.org>
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/i915/i915_gem.c | 30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 3406c76..d9d664d 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2777,7 +2777,6 @@ static void i965_write_fence_reg(struct drm_device *dev, int reg,
 	drm_i915_private_t *dev_priv = dev->dev_private;
 	int fence_reg;
 	int fence_pitch_shift;
-	uint64_t val;
 
 	if (INTEL_INFO(dev)->gen >= 6) {
 		fence_reg = FENCE_REG_SANDYBRIDGE_0;
@@ -2787,8 +2786,23 @@ static void i965_write_fence_reg(struct drm_device *dev, int reg,
 		fence_pitch_shift = I965_FENCE_PITCH_SHIFT;
 	}
 
+	fence_reg += reg * 8;
+
+	/* To w/a incoherency with non-atomic 64-bit register updates,
+	 * we split the 64-bit update into two 32-bit writes. In order
+	 * for a partial fence not to be evaluated between writes, we
+	 * precede the update with write to turn off the fence register,
+	 * and only enable the fence as the last step.
+	 *
+	 * For extra levels of paranoia, we make sure each step lands
+	 * before applying the next step.
+	 */
+	I915_WRITE(fence_reg, 0);
+	POSTING_READ(fence_reg);
+
 	if (obj) {
 		u32 size = i915_gem_obj_ggtt_size(obj);
+		uint64_t val;
 
 		val = (uint64_t)((i915_gem_obj_ggtt_offset(obj) + size - 4096) &
 				 0xfffff000) << 32;
@@ -2797,12 +2811,16 @@ static void i965_write_fence_reg(struct drm_device *dev, int reg,
 		if (obj->tiling_mode == I915_TILING_Y)
 			val |= 1 << I965_FENCE_TILING_Y_SHIFT;
 		val |= I965_FENCE_REG_VALID;
-	} else
-		val = 0;
 
-	fence_reg += reg * 8;
-	I915_WRITE64(fence_reg, val);
-	POSTING_READ(fence_reg);
+		I915_WRITE(fence_reg + 4, val >> 32);
+		POSTING_READ(fence_reg + 4);
+
+		I915_WRITE(fence_reg + 0, val);
+		POSTING_READ(fence_reg);
+	} else {
+		I915_WRITE(fence_reg + 4, 0);
+		POSTING_READ(fence_reg + 4);
+	}
 }
 
 static void i915_write_fence_reg(struct drm_device *dev, int reg,
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 5+ messages in thread
* [PATCH 1/2] drm/i915: Fix incoherence with fence updates on Sandybridge+
@ 2013-07-09 16:54 Chris Wilson
  2013-07-09 16:54 ` [PATCH 2/2] Revert "drm/i915: Workaround incoherence between fences and LLC across multiple CPUs" Chris Wilson
  0 siblings, 1 reply; 5+ messages in thread
From: Chris Wilson @ 2013-07-09 16:54 UTC (permalink / raw)
  To: intel-gfx
  Cc: Chris Wilson, Jon Bloomfield, Daniel Vetter, Carsten Emde, stable

This hopefully fixes the root cause behind the workaround from

commit 25ff1195f8a0b3724541ae7bbe331b4296de9c06
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Apr 4 21:31:03 2013 +0100

    drm/i915: Workaround incoherence between fences and LLC across multiple CPUs

Thanks to further investigation by Jon Bloomfield, he realised that
the 64-bit register might be broken up by the hardware into two 32-bit
writes (a problem we have encountered elsewhere). This non-atomicity
would then cause an issue where a second thread would see a random
content of its thread register, and so read/write into a fairly random
tiled location. Breaking the operation into 3 explicit 32-bit updates
(first disable the fence, poke the upper bits, then poke the lower bits
and enable) ensures that, given proper serialisation between the
32-bit register write and the memory transfer, that the fence value is
always consistent.

Note to self: this implies that we have multiple threads hitting the
fence whilst it is being updated, but the sequence of steps we take to
quiesce the memory accesses from the old and new fenced objects across
the update should prevent anyone using the fence register during the
update.

However, Daniel points out that the intermediate fence value may be seen
by some other random thread as the intermediate value conflicts with its
own fence register.

Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Carsten Emde <C.Emde@osadl.org>
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/i915/i915_gem.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 3406c76..ce46e777 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2777,7 +2777,6 @@ static void i965_write_fence_reg(struct drm_device *dev, int reg,
 	drm_i915_private_t *dev_priv = dev->dev_private;
 	int fence_reg;
 	int fence_pitch_shift;
-	uint64_t val;
 
 	if (INTEL_INFO(dev)->gen >= 6) {
 		fence_reg = FENCE_REG_SANDYBRIDGE_0;
@@ -2787,8 +2786,11 @@ static void i965_write_fence_reg(struct drm_device *dev, int reg,
 		fence_pitch_shift = I965_FENCE_PITCH_SHIFT;
 	}
 
+	fence_reg += reg * 8;
+
 	if (obj) {
 		u32 size = i915_gem_obj_ggtt_size(obj);
+		uint64_t val;
 
 		val = (uint64_t)((i915_gem_obj_ggtt_offset(obj) + size - 4096) &
 				 0xfffff000) << 32;
@@ -2796,12 +2798,14 @@ static void i965_write_fence_reg(struct drm_device *dev, int reg,
 		val |= (uint64_t)((obj->stride / 128) - 1) << fence_pitch_shift;
 		if (obj->tiling_mode == I915_TILING_Y)
 			val |= 1 << I965_FENCE_TILING_Y_SHIFT;
-		val |= I965_FENCE_REG_VALID;
+
+		/* W/a incoherency with non-atomic 64-bit register updates */
+		I915_WRITE(fence_reg + 0, (uint32_t)val); /* first we disable */
+		I915_WRITE(fence_reg + 4, (uint32_t)(val >> 32));
+		I915_WRITE(fence_reg + 0, (uint32_t)(val | I965_FENCE_REG_VALID));
 	} else
-		val = 0;
+		I915_WRITE64(fence_reg, 0);
 
-	fence_reg += reg * 8;
-	I915_WRITE64(fence_reg, val);
 	POSTING_READ(fence_reg);
 }
 
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2013-07-10 12:44 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-10 12:36 [PATCH 1/2] drm/i915: Fix incoherence with fence updates on Sandybridge+ Chris Wilson
2013-07-10 12:36 ` [PATCH 2/2] Revert "drm/i915: Workaround incoherence between fences and LLC across multiple CPUs" Chris Wilson
2013-07-10 12:44   ` Daniel Vetter
  -- strict thread matches above, loose matches on Subject: below --
2013-07-09 16:54 [PATCH 1/2] drm/i915: Fix incoherence with fence updates on Sandybridge+ Chris Wilson
2013-07-09 16:54 ` [PATCH 2/2] Revert "drm/i915: Workaround incoherence between fences and LLC across multiple CPUs" Chris Wilson
2013-07-09 20:44   ` Daniel Vetter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.