All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] drm/i915: Fix incoherence with fence updates on Sandybridge+
@ 2013-07-09 16:54 Chris Wilson
  2013-07-09 16:54 ` [PATCH 2/2] Revert "drm/i915: Workaround incoherence between fences and LLC across multiple CPUs" Chris Wilson
  2013-07-09 20:43 ` [PATCH 1/2] drm/i915: Fix incoherence with fence updates on Sandybridge+ Daniel Vetter
  0 siblings, 2 replies; 6+ messages in thread
From: Chris Wilson @ 2013-07-09 16:54 UTC (permalink / raw)
  To: intel-gfx
  Cc: Chris Wilson, Jon Bloomfield, Daniel Vetter, Carsten Emde, stable

This hopefully fixes the root cause behind the workaround from

commit 25ff1195f8a0b3724541ae7bbe331b4296de9c06
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Apr 4 21:31:03 2013 +0100

    drm/i915: Workaround incoherence between fences and LLC across multiple CPUs

Thanks to further investigation by Jon Bloomfield, he realised that
the 64-bit register might be broken up by the hardware into two 32-bit
writes (a problem we have encountered elsewhere). This non-atomicity
would then cause an issue where a second thread would see a random
content of its thread register, and so read/write into a fairly random
tiled location. Breaking the operation into 3 explicit 32-bit updates
(first disable the fence, poke the upper bits, then poke the lower bits
and enable) ensures that, given proper serialisation between the
32-bit register write and the memory transfer, that the fence value is
always consistent.

Note to self: this implies that we have multiple threads hitting the
fence whilst it is being updated, but the sequence of steps we take to
quiesce the memory accesses from the old and new fenced objects across
the update should prevent anyone using the fence register during the
update.

However, Daniel points out that the intermediate fence value may be seen
by some other random thread as the intermediate value conflicts with its
own fence register.

Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Carsten Emde <C.Emde@osadl.org>
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/i915/i915_gem.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 3406c76..ce46e777 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2777,7 +2777,6 @@ static void i965_write_fence_reg(struct drm_device *dev, int reg,
 	drm_i915_private_t *dev_priv = dev->dev_private;
 	int fence_reg;
 	int fence_pitch_shift;
-	uint64_t val;
 
 	if (INTEL_INFO(dev)->gen >= 6) {
 		fence_reg = FENCE_REG_SANDYBRIDGE_0;
@@ -2787,8 +2786,11 @@ static void i965_write_fence_reg(struct drm_device *dev, int reg,
 		fence_pitch_shift = I965_FENCE_PITCH_SHIFT;
 	}
 
+	fence_reg += reg * 8;
+
 	if (obj) {
 		u32 size = i915_gem_obj_ggtt_size(obj);
+		uint64_t val;
 
 		val = (uint64_t)((i915_gem_obj_ggtt_offset(obj) + size - 4096) &
 				 0xfffff000) << 32;
@@ -2796,12 +2798,14 @@ static void i965_write_fence_reg(struct drm_device *dev, int reg,
 		val |= (uint64_t)((obj->stride / 128) - 1) << fence_pitch_shift;
 		if (obj->tiling_mode == I915_TILING_Y)
 			val |= 1 << I965_FENCE_TILING_Y_SHIFT;
-		val |= I965_FENCE_REG_VALID;
+
+		/* W/a incoherency with non-atomic 64-bit register updates */
+		I915_WRITE(fence_reg + 0, (uint32_t)val); /* first we disable */
+		I915_WRITE(fence_reg + 4, (uint32_t)(val >> 32));
+		I915_WRITE(fence_reg + 0, (uint32_t)(val | I965_FENCE_REG_VALID));
 	} else
-		val = 0;
+		I915_WRITE64(fence_reg, 0);
 
-	fence_reg += reg * 8;
-	I915_WRITE64(fence_reg, val);
 	POSTING_READ(fence_reg);
 }
 
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] Revert "drm/i915: Workaround incoherence between fences and LLC across multiple CPUs"
  2013-07-09 16:54 [PATCH 1/2] drm/i915: Fix incoherence with fence updates on Sandybridge+ Chris Wilson
@ 2013-07-09 16:54 ` Chris Wilson
  2013-07-09 20:44   ` Daniel Vetter
  2013-07-09 20:43 ` [PATCH 1/2] drm/i915: Fix incoherence with fence updates on Sandybridge+ Daniel Vetter
  1 sibling, 1 reply; 6+ messages in thread
From: Chris Wilson @ 2013-07-09 16:54 UTC (permalink / raw)
  To: intel-gfx
  Cc: Chris Wilson, Daniel Vetter, Jon Bloomfield, Carsten Emde, stable

This reverts commit 25ff119 and the follow on for Valleyview commit 2dc8aae.

commit 25ff1195f8a0b3724541ae7bbe331b4296de9c06
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Apr 4 21:31:03 2013 +0100

    drm/i915: Workaround incoherence between fences and LLC across multiple CPUs

commit 2dc8aae06d53458dd3624dc0accd4f81100ee631
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed May 22 17:08:06 2013 +0100

    drm/i915: Workaround incoherence with fence updates on Valleyview

Jon Bloomfield came up with a plausible explanation and cheap fix for the
race condition, so lets run with it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Carsten Emde<C.Emde@osadl.org>
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/i915/i915_gem.c | 47 ++++-------------------------------------
 1 file changed, 4 insertions(+), 43 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index ce46e777..96ede8c 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2924,56 +2924,17 @@ static inline int fence_number(struct drm_i915_private *dev_priv,
 	return fence - dev_priv->fence_regs;
 }
 
-struct write_fence {
-	struct drm_device *dev;
-	struct drm_i915_gem_object *obj;
-	int fence;
-};
-
-static void i915_gem_write_fence__ipi(void *data)
-{
-	struct write_fence *args = data;
-
-	/* Required for SNB+ with LLC */
-	wbinvd();
-
-	/* Required for VLV */
-	i915_gem_write_fence(args->dev, args->fence, args->obj);
-}
-
 static void i915_gem_object_update_fence(struct drm_i915_gem_object *obj,
 					 struct drm_i915_fence_reg *fence,
 					 bool enable)
 {
 	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
-	struct write_fence args = {
-		.dev = obj->base.dev,
-		.fence = fence_number(dev_priv, fence),
-		.obj = enable ? obj : NULL,
-	};
-
-	/* In order to fully serialize access to the fenced region and
-	 * the update to the fence register we need to take extreme
-	 * measures on SNB+. In theory, the write to the fence register
-	 * flushes all memory transactions before, and coupled with the
-	 * mb() placed around the register write we serialise all memory
-	 * operations with respect to the changes in the tiler. Yet, on
-	 * SNB+ we need to take a step further and emit an explicit wbinvd()
-	 * on each processor in order to manually flush all memory
-	 * transactions before updating the fence register.
-	 *
-	 * However, Valleyview complicates matter. There the wbinvd is
-	 * insufficient and unlike SNB/IVB requires the serialising
-	 * register write. (Note that that register write by itself is
-	 * conversely not sufficient for SNB+.) To compromise, we do both.
-	 */
-	if (INTEL_INFO(args.dev)->gen >= 6)
-		on_each_cpu(i915_gem_write_fence__ipi, &args, 1);
-	else
-		i915_gem_write_fence(args.dev, args.fence, args.obj);
+	int reg = fence_number(dev_priv, fence);
+
+	i915_gem_write_fence(obj->base.dev, reg, enable ? obj : NULL);
 
 	if (enable) {
-		obj->fence_reg = args.fence;
+		obj->fence_reg = reg;
 		fence->obj = obj;
 		list_move_tail(&fence->lru_list, &dev_priv->mm.fence_list);
 	} else {
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] drm/i915: Fix incoherence with fence updates on Sandybridge+
  2013-07-09 16:54 [PATCH 1/2] drm/i915: Fix incoherence with fence updates on Sandybridge+ Chris Wilson
  2013-07-09 16:54 ` [PATCH 2/2] Revert "drm/i915: Workaround incoherence between fences and LLC across multiple CPUs" Chris Wilson
@ 2013-07-09 20:43 ` Daniel Vetter
  2013-07-10  7:30   ` Daniel Vetter
  1 sibling, 1 reply; 6+ messages in thread
From: Daniel Vetter @ 2013-07-09 20:43 UTC (permalink / raw)
  To: Chris Wilson
  Cc: intel-gfx, Jon Bloomfield, Daniel Vetter, Carsten Emde, stable

On Tue, Jul 09, 2013 at 05:54:39PM +0100, Chris Wilson wrote:
> This hopefully fixes the root cause behind the workaround from
> 
> commit 25ff1195f8a0b3724541ae7bbe331b4296de9c06
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Thu Apr 4 21:31:03 2013 +0100
> 
>     drm/i915: Workaround incoherence between fences and LLC across multiple CPUs
> 
> Thanks to further investigation by Jon Bloomfield, he realised that
> the 64-bit register might be broken up by the hardware into two 32-bit
> writes (a problem we have encountered elsewhere). This non-atomicity
> would then cause an issue where a second thread would see a random
> content of its thread register, and so read/write into a fairly random

I think this needs a bit clarification (or I misunderstand how stuff blows
up):

"This non-atomicity can result in fences where area affected by the
fence can be any combinition of the old (start, end) and the new (start,
end) pair, depending upon how the hw executes the two 32bit writes (which
we don't know). For example for a short amount of time we can have a fence
spanning (new start, old end), with the new tiling parameters (since those
are in the first dword). If that gtt area is access right in that small window
by a 2nd cpu core can corrupt whatever's stored there (this depends upon
how the hw handles overlapping fences). Note that to actually get a bogus
fenced range in the gtt we need to steal fences, since otherwise the old
and new (start, end) pair will be the same if we only update the tiling
mode.

"We avoid this race by first disabling the fence (resetting bit 31 in the
first dword) and then (serialized by a posting read) writing the 2nd
dword. Then (again after ensuring serialization with a postin read) we can
update the first dword with the correct new value. This way we never have
a fence enabled spanning a bogus intervall."

I think the above is a theory that fits with all the observed evidence:
- fence thrashing is required
- running on multipler cpus is required

> tiled location. Breaking the operation into 3 explicit 32-bit updates
> (first disable the fence, poke the upper bits, then poke the lower bits
> and enable) ensures that, given proper serialisation between the
> 32-bit register write and the memory transfer, that the fence value is
> always consistent.
> 
> Note to self: this implies that we have multiple threads hitting the
> fence whilst it is being updated, but the sequence of steps we take to
> quiesce the memory accesses from the old and new fenced objects across
> the update should prevent anyone using the fence register during the
> update.
> 
> However, Daniel points out that the intermediate fence value may be seen
> by some other random thread as the intermediate value conflicts with its
> own fence register.
> 
> Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Carsten Emde <C.Emde@osadl.org>
> Cc: stable@vger.kernel.org
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 3406c76..ce46e777 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2777,7 +2777,6 @@ static void i965_write_fence_reg(struct drm_device *dev, int reg,
>  	drm_i915_private_t *dev_priv = dev->dev_private;
>  	int fence_reg;
>  	int fence_pitch_shift;
> -	uint64_t val;
>  
>  	if (INTEL_INFO(dev)->gen >= 6) {
>  		fence_reg = FENCE_REG_SANDYBRIDGE_0;
> @@ -2787,8 +2786,11 @@ static void i965_write_fence_reg(struct drm_device *dev, int reg,
>  		fence_pitch_shift = I965_FENCE_PITCH_SHIFT;
>  	}
>  
> +	fence_reg += reg * 8;
> +
>  	if (obj) {
>  		u32 size = i915_gem_obj_ggtt_size(obj);
> +		uint64_t val;
>  
>  		val = (uint64_t)((i915_gem_obj_ggtt_offset(obj) + size - 4096) &
>  				 0xfffff000) << 32;
> @@ -2796,12 +2798,14 @@ static void i965_write_fence_reg(struct drm_device *dev, int reg,
>  		val |= (uint64_t)((obj->stride / 128) - 1) << fence_pitch_shift;
>  		if (obj->tiling_mode == I915_TILING_Y)
>  			val |= 1 << I965_FENCE_TILING_Y_SHIFT;
> -		val |= I965_FENCE_REG_VALID;
> +
> +		/* W/a incoherency with non-atomic 64-bit register updates */
> +		I915_WRITE(fence_reg + 0, (uint32_t)val); /* first we disable */

I vote for a posting read here ...

> +		I915_WRITE(fence_reg + 4, (uint32_t)(val >> 32));

... and here.

Cheers, Daniel

> +		I915_WRITE(fence_reg + 0, (uint32_t)(val | I965_FENCE_REG_VALID));
>  	} else
> -		val = 0;
> +		I915_WRITE64(fence_reg, 0);
>  
> -	fence_reg += reg * 8;
> -	I915_WRITE64(fence_reg, val);
>  	POSTING_READ(fence_reg);
>  }
>  
> -- 
> 1.8.3.2
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] Revert "drm/i915: Workaround incoherence between fences and LLC across multiple CPUs"
  2013-07-09 16:54 ` [PATCH 2/2] Revert "drm/i915: Workaround incoherence between fences and LLC across multiple CPUs" Chris Wilson
@ 2013-07-09 20:44   ` Daniel Vetter
  0 siblings, 0 replies; 6+ messages in thread
From: Daniel Vetter @ 2013-07-09 20:44 UTC (permalink / raw)
  To: Chris Wilson
  Cc: intel-gfx, Daniel Vetter, Jon Bloomfield, Carsten Emde, stable

On Tue, Jul 09, 2013 at 05:54:40PM +0100, Chris Wilson wrote:
> This reverts commit 25ff119 and the follow on for Valleyview commit 2dc8aae.
> 
> commit 25ff1195f8a0b3724541ae7bbe331b4296de9c06
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Thu Apr 4 21:31:03 2013 +0100
> 
>     drm/i915: Workaround incoherence between fences and LLC across multiple CPUs
> 
> commit 2dc8aae06d53458dd3624dc0accd4f81100ee631
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Wed May 22 17:08:06 2013 +0100
> 
>     drm/i915: Workaround incoherence with fence updates on Valleyview
> 
> Jon Bloomfield came up with a plausible explanation and cheap fix for the
> race condition, so lets run with it.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Jon Bloomfield <jon.bloomfield@intel.com>
> Cc: Carsten Emde<C.Emde@osadl.org>
> Cc: stable@vger.kernel.org

Can you please add  reference for the various bug reports we've gotten
from the realtime linux folks? Also we have some bug report from QA iirc
about a perf regression on UXA due to this.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_gem.c | 47 ++++-------------------------------------
>  1 file changed, 4 insertions(+), 43 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index ce46e777..96ede8c 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2924,56 +2924,17 @@ static inline int fence_number(struct drm_i915_private *dev_priv,
>  	return fence - dev_priv->fence_regs;
>  }
>  
> -struct write_fence {
> -	struct drm_device *dev;
> -	struct drm_i915_gem_object *obj;
> -	int fence;
> -};
> -
> -static void i915_gem_write_fence__ipi(void *data)
> -{
> -	struct write_fence *args = data;
> -
> -	/* Required for SNB+ with LLC */
> -	wbinvd();
> -
> -	/* Required for VLV */
> -	i915_gem_write_fence(args->dev, args->fence, args->obj);
> -}
> -
>  static void i915_gem_object_update_fence(struct drm_i915_gem_object *obj,
>  					 struct drm_i915_fence_reg *fence,
>  					 bool enable)
>  {
>  	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
> -	struct write_fence args = {
> -		.dev = obj->base.dev,
> -		.fence = fence_number(dev_priv, fence),
> -		.obj = enable ? obj : NULL,
> -	};
> -
> -	/* In order to fully serialize access to the fenced region and
> -	 * the update to the fence register we need to take extreme
> -	 * measures on SNB+. In theory, the write to the fence register
> -	 * flushes all memory transactions before, and coupled with the
> -	 * mb() placed around the register write we serialise all memory
> -	 * operations with respect to the changes in the tiler. Yet, on
> -	 * SNB+ we need to take a step further and emit an explicit wbinvd()
> -	 * on each processor in order to manually flush all memory
> -	 * transactions before updating the fence register.
> -	 *
> -	 * However, Valleyview complicates matter. There the wbinvd is
> -	 * insufficient and unlike SNB/IVB requires the serialising
> -	 * register write. (Note that that register write by itself is
> -	 * conversely not sufficient for SNB+.) To compromise, we do both.
> -	 */
> -	if (INTEL_INFO(args.dev)->gen >= 6)
> -		on_each_cpu(i915_gem_write_fence__ipi, &args, 1);
> -	else
> -		i915_gem_write_fence(args.dev, args.fence, args.obj);
> +	int reg = fence_number(dev_priv, fence);
> +
> +	i915_gem_write_fence(obj->base.dev, reg, enable ? obj : NULL);
>  
>  	if (enable) {
> -		obj->fence_reg = args.fence;
> +		obj->fence_reg = reg;
>  		fence->obj = obj;
>  		list_move_tail(&fence->lru_list, &dev_priv->mm.fence_list);
>  	} else {
> -- 
> 1.8.3.2
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] drm/i915: Fix incoherence with fence updates on Sandybridge+
  2013-07-09 20:43 ` [PATCH 1/2] drm/i915: Fix incoherence with fence updates on Sandybridge+ Daniel Vetter
@ 2013-07-10  7:30   ` Daniel Vetter
  0 siblings, 0 replies; 6+ messages in thread
From: Daniel Vetter @ 2013-07-10  7:30 UTC (permalink / raw)
  To: Chris Wilson
  Cc: intel-gfx, Jon Bloomfield, Daniel Vetter, Carsten Emde, stable

On Tue, Jul 09, 2013 at 10:43:11PM +0200, Daniel Vetter wrote:
> On Tue, Jul 09, 2013 at 05:54:39PM +0100, Chris Wilson wrote:
> > This hopefully fixes the root cause behind the workaround from
> > 
> > commit 25ff1195f8a0b3724541ae7bbe331b4296de9c06
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date:   Thu Apr 4 21:31:03 2013 +0100
> > 
> >     drm/i915: Workaround incoherence between fences and LLC across multiple CPUs
> > 
> > Thanks to further investigation by Jon Bloomfield, he realised that
> > the 64-bit register might be broken up by the hardware into two 32-bit
> > writes (a problem we have encountered elsewhere). This non-atomicity
> > would then cause an issue where a second thread would see a random
> > content of its thread register, and so read/write into a fairly random
> 
> I think this needs a bit clarification (or I misunderstand how stuff blows
> up):
> 
> "This non-atomicity can result in fences where area affected by the
> fence can be any combinition of the old (start, end) and the new (start,
> end) pair, depending upon how the hw executes the two 32bit writes (which
> we don't know). For example for a short amount of time we can have a fence
> spanning (new start, old end), with the new tiling parameters (since those
> are in the first dword). If that gtt area is access right in that small window
> by a 2nd cpu core can corrupt whatever's stored there (this depends upon
> how the hw handles overlapping fences). Note that to actually get a bogus
> fenced range in the gtt we need to steal fences, since otherwise the old
> and new (start, end) pair will be the same if we only update the tiling
> mode.
> 
> "We avoid this race by first disabling the fence (resetting bit 31 in the
> first dword) and then (serialized by a posting read) writing the 2nd
> dword. Then (again after ensuring serialization with a postin read) we can
> update the first dword with the correct new value. This way we never have
> a fence enabled spanning a bogus intervall."
> 
> I think the above is a theory that fits with all the observed evidence:
> - fence thrashing is required
> - running on multipler cpus is required
> 
> > tiled location. Breaking the operation into 3 explicit 32-bit updates
> > (first disable the fence, poke the upper bits, then poke the lower bits
> > and enable) ensures that, given proper serialisation between the
> > 32-bit register write and the memory transfer, that the fence value is
> > always consistent.
> > 
> > Note to self: this implies that we have multiple threads hitting the
> > fence whilst it is being updated, but the sequence of steps we take to
> > quiesce the memory accesses from the old and new fenced objects across
> > the update should prevent anyone using the fence register during the
> > update.
> > 
> > However, Daniel points out that the intermediate fence value may be seen
> > by some other random thread as the intermediate value conflicts with its
> > own fence register.
> > 
> > Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com>
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Cc: Carsten Emde <C.Emde@osadl.org>
> > Cc: stable@vger.kernel.org
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c | 14 +++++++++-----
> >  1 file changed, 9 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 3406c76..ce46e777 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -2777,7 +2777,6 @@ static void i965_write_fence_reg(struct drm_device *dev, int reg,
> >  	drm_i915_private_t *dev_priv = dev->dev_private;
> >  	int fence_reg;
> >  	int fence_pitch_shift;
> > -	uint64_t val;
> >  
> >  	if (INTEL_INFO(dev)->gen >= 6) {
> >  		fence_reg = FENCE_REG_SANDYBRIDGE_0;
> > @@ -2787,8 +2786,11 @@ static void i965_write_fence_reg(struct drm_device *dev, int reg,
> >  		fence_pitch_shift = I965_FENCE_PITCH_SHIFT;
> >  	}
> >  
> > +	fence_reg += reg * 8;
> > +
> >  	if (obj) {
> >  		u32 size = i915_gem_obj_ggtt_size(obj);
> > +		uint64_t val;
> >  
> >  		val = (uint64_t)((i915_gem_obj_ggtt_offset(obj) + size - 4096) &
> >  				 0xfffff000) << 32;
> > @@ -2796,12 +2798,14 @@ static void i965_write_fence_reg(struct drm_device *dev, int reg,
> >  		val |= (uint64_t)((obj->stride / 128) - 1) << fence_pitch_shift;
> >  		if (obj->tiling_mode == I915_TILING_Y)
> >  			val |= 1 << I965_FENCE_TILING_Y_SHIFT;
> > -		val |= I965_FENCE_REG_VALID;
> > +
> > +		/* W/a incoherency with non-atomic 64-bit register updates */
> > +		I915_WRITE(fence_reg + 0, (uint32_t)val); /* first we disable */
> 
> I vote for a posting read here ...
> 
> > +		I915_WRITE(fence_reg + 4, (uint32_t)(val >> 32));
> 
> ... and here.
> 
> Cheers, Daniel
> 
> > +		I915_WRITE(fence_reg + 0, (uint32_t)(val | I965_FENCE_REG_VALID));
> >  	} else
> > -		val = 0;
> > +		I915_WRITE64(fence_reg, 0);

Hm, for extra paranoia points shouldn't we give the fence disabling the
same treatment, i.e. write first dword with 0, posting read, write 2nd
dword with 0. The only case where this could blow up is if the hw does
something funny with a fence spanning (old start, 0). That's not really a
legal fence, but given the track record I think we should prevent it, too.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/2] drm/i915: Fix incoherence with fence updates on Sandybridge+
@ 2013-07-10 12:36 Chris Wilson
  0 siblings, 0 replies; 6+ messages in thread
From: Chris Wilson @ 2013-07-10 12:36 UTC (permalink / raw)
  To: intel-gfx
  Cc: Chris Wilson, Jon Bloomfield, Daniel Vetter, Carsten Emde, stable

This hopefully fixes the root cause behind the workaround added in

commit 25ff1195f8a0b3724541ae7bbe331b4296de9c06
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Apr 4 21:31:03 2013 +0100

    drm/i915: Workaround incoherence between fences and LLC across multiple CPUs

Thanks to further investigation by Jon Bloomfield, he realised that
the 64-bit register might be broken up by the hardware into two 32-bit
writes (a problem we have encountered elsewhere). This non-atomicity
would then cause an issue where a second thread would see an
intermediate register state (new high dword, old low dword), and this
register would randomly be used in preference to its own thread register.
This would cause the second thread to read from and write into a fairly
random tiled location.  Breaking the operation into 3 explicit 32-bit
updates (first disable the fence, poke the upper bits, then poke the lower
bits and enable) ensures that, given proper serialisation between the
32-bit register write and the memory transfer, that the fence value is
always consistent.

Armed with this knowledge, we can explain how the previous workaround
work. The key to the corruption is that a second thread sees an
erroneous fence register that conflicts and overrides its own. By
serialising the fence update across all CPUs, we have a small window
where no GTT access is occurring and so hide the potential corruption.
This also leads to the conclusion that the earlier workaround was
incomplete.

v2: Be overly paranoid about the order in which fence updates become
visible to the GPU to make really sure that we turn the fence off before
doing the update, and then only switch the fence on afterwards.

Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Carsten Emde <C.Emde@osadl.org>
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/i915/i915_gem.c | 30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 3406c76..d9d664d 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2777,7 +2777,6 @@ static void i965_write_fence_reg(struct drm_device *dev, int reg,
 	drm_i915_private_t *dev_priv = dev->dev_private;
 	int fence_reg;
 	int fence_pitch_shift;
-	uint64_t val;
 
 	if (INTEL_INFO(dev)->gen >= 6) {
 		fence_reg = FENCE_REG_SANDYBRIDGE_0;
@@ -2787,8 +2786,23 @@ static void i965_write_fence_reg(struct drm_device *dev, int reg,
 		fence_pitch_shift = I965_FENCE_PITCH_SHIFT;
 	}
 
+	fence_reg += reg * 8;
+
+	/* To w/a incoherency with non-atomic 64-bit register updates,
+	 * we split the 64-bit update into two 32-bit writes. In order
+	 * for a partial fence not to be evaluated between writes, we
+	 * precede the update with write to turn off the fence register,
+	 * and only enable the fence as the last step.
+	 *
+	 * For extra levels of paranoia, we make sure each step lands
+	 * before applying the next step.
+	 */
+	I915_WRITE(fence_reg, 0);
+	POSTING_READ(fence_reg);
+
 	if (obj) {
 		u32 size = i915_gem_obj_ggtt_size(obj);
+		uint64_t val;
 
 		val = (uint64_t)((i915_gem_obj_ggtt_offset(obj) + size - 4096) &
 				 0xfffff000) << 32;
@@ -2797,12 +2811,16 @@ static void i965_write_fence_reg(struct drm_device *dev, int reg,
 		if (obj->tiling_mode == I915_TILING_Y)
 			val |= 1 << I965_FENCE_TILING_Y_SHIFT;
 		val |= I965_FENCE_REG_VALID;
-	} else
-		val = 0;
 
-	fence_reg += reg * 8;
-	I915_WRITE64(fence_reg, val);
-	POSTING_READ(fence_reg);
+		I915_WRITE(fence_reg + 4, val >> 32);
+		POSTING_READ(fence_reg + 4);
+
+		I915_WRITE(fence_reg + 0, val);
+		POSTING_READ(fence_reg);
+	} else {
+		I915_WRITE(fence_reg + 4, 0);
+		POSTING_READ(fence_reg + 4);
+	}
 }
 
 static void i915_write_fence_reg(struct drm_device *dev, int reg,
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2013-07-10 12:36 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-09 16:54 [PATCH 1/2] drm/i915: Fix incoherence with fence updates on Sandybridge+ Chris Wilson
2013-07-09 16:54 ` [PATCH 2/2] Revert "drm/i915: Workaround incoherence between fences and LLC across multiple CPUs" Chris Wilson
2013-07-09 20:44   ` Daniel Vetter
2013-07-09 20:43 ` [PATCH 1/2] drm/i915: Fix incoherence with fence updates on Sandybridge+ Daniel Vetter
2013-07-10  7:30   ` Daniel Vetter
2013-07-10 12:36 Chris Wilson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.