All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] drm/i915/bxt: prevent allocating context object from HIGHMEM
@ 2015-09-17 16:17 Imre Deak
  2015-09-17 16:17 ` [PATCH v2 2/2] drm/i915/bxt: work around HW context corruption due to coherency problem Imre Deak
  0 siblings, 1 reply; 12+ messages in thread
From: Imre Deak @ 2015-09-17 16:17 UTC (permalink / raw)
  To: intel-gfx

At least on BXT A stepping we need to map part of the context object as
uncached. For this we first set the corresponding page to uncached and
then use kmap/kunmap to get a kernel mapping whenever we want to update
the context. Since kmap for a HIGHMEM page always returns a write-back
mapping we need to prevent allocating the context object from HIGHMEM.

Needed by the next patch implementing the actual workaround for BXT A.

Signed-off-by: Imre Deak <imre.deak@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h  |  2 ++
 drivers/gpu/drm/i915/i915_gem.c  | 21 ++++++++++++++++++---
 drivers/gpu/drm/i915/intel_lrc.c |  5 ++++-
 3 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4ea3e7b..6e7f91e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2806,6 +2806,8 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
 			 const struct drm_i915_gem_object_ops *ops);
 struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
 						  size_t size);
+struct drm_i915_gem_object *
+i915_gem_alloc_object_no_highmem(struct drm_device *dev, size_t size);
 struct drm_i915_gem_object *i915_gem_object_create_from_data(
 		struct drm_device *dev, const void *data, size_t size);
 void i915_init_vm(struct drm_i915_private *dev_priv,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index cb0df7e..11c9191 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4243,8 +4243,8 @@ static const struct drm_i915_gem_object_ops i915_gem_object_ops = {
 	.put_pages = i915_gem_object_put_pages_gtt,
 };
 
-struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
-						  size_t size)
+static struct drm_i915_gem_object *
+__i915_gem_alloc_object(struct drm_device *dev, size_t size, bool highmem)
 {
 	struct drm_i915_gem_object *obj;
 	struct address_space *mapping;
@@ -4262,10 +4262,13 @@ struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
 	mask = GFP_HIGHUSER | __GFP_RECLAIMABLE;
 	if (IS_CRESTLINE(dev) || IS_BROADWATER(dev)) {
 		/* 965gm cannot relocate objects above 4GiB. */
-		mask &= ~__GFP_HIGHMEM;
+		highmem = false;
 		mask |= __GFP_DMA32;
 	}
 
+	if (!highmem)
+		mask &= ~__GFP_HIGHMEM;
+
 	mapping = file_inode(obj->base.filp)->i_mapping;
 	mapping_set_gfp_mask(mapping, mask);
 
@@ -4296,6 +4299,18 @@ struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
 	return obj;
 }
 
+struct drm_i915_gem_object *
+i915_gem_alloc_object(struct drm_device *dev, size_t size)
+{
+	return __i915_gem_alloc_object(dev, size, true);
+}
+
+struct drm_i915_gem_object *
+i915_gem_alloc_object_no_highmem(struct drm_device *dev, size_t size)
+{
+	return __i915_gem_alloc_object(dev, size, false);
+}
+
 static bool discard_backing_storage(struct drm_i915_gem_object *obj)
 {
 	/* If we are the last user of the backing storage (be it shmemfs
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index fe06accb0..942069f 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2459,7 +2459,10 @@ int intel_lr_context_deferred_alloc(struct intel_context *ctx,
 	/* One extra page as the sharing data between driver and GuC */
 	context_size += PAGE_SIZE * LRC_PPHWSP_PN;
 
-	ctx_obj = i915_gem_alloc_object(dev, context_size);
+	if (IS_BROXTON(dev) && INTEL_REVID(dev) < BXT_REVID_B0)
+		ctx_obj = i915_gem_alloc_object_no_highmem(dev, context_size);
+	else
+		ctx_obj = i915_gem_alloc_object(dev, context_size);
 	if (!ctx_obj) {
 		DRM_DEBUG_DRIVER("Alloc LRC backing obj failed.\n");
 		return -ENOMEM;
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 2/2] drm/i915/bxt: work around HW context corruption due to coherency problem
  2015-09-17 16:17 [PATCH 1/2] drm/i915/bxt: prevent allocating context object from HIGHMEM Imre Deak
@ 2015-09-17 16:17 ` Imre Deak
  2015-09-18  9:02   ` Chris Wilson
  0 siblings, 1 reply; 12+ messages in thread
From: Imre Deak @ 2015-09-17 16:17 UTC (permalink / raw)
  To: intel-gfx

The execlist context object is mapped with a CPU/GPU coherent mapping
everywhere, but on BXT A stepping due to a HW issue the coherency is not
guaranteed. To work around this flush the context object after pinning
it (to flush cache lines left by the context initialization/read-back
from backing storage) and mark it as uncached so later updates during
context switching will be coherent.

I noticed this problem via a GPU hang, where IPEHR pointed to an invalid
opcode value. I couldn't find this value on the ring but looking at the
contents of the active context object it turned out to be a parameter
dword of a bigger command there. The original command opcode itself
was zeroed out, based on the above I assume due to a CPU writeback of
the corresponding cacheline. When restoring the context the GPU would
jump over the zeroed out opcode and hang when trying to execute the
above parameter dword.

I could easily reproduce this by running igt/gem_render_copy_redux and
gem_tiled_blits/basic in parallel, but I guess it could be triggered by
anything involving frequent switches between two separate contexts. With
this workaround I couldn't reproduce the problem.

v2:
- instead of clflushing after updating the tail and PDP values during
  context switching, map the corresponding page as uncached to avoid a
  race between CPU and GPU, both updating the same cacheline at the same
  time (Ville)

Signed-off-by: Imre Deak <imre.deak@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 942069f..f6873a0 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1036,6 +1036,17 @@ static int intel_lr_context_do_pin(struct intel_engine_cs *ring,
 	if (ret)
 		return ret;
 
+	if (IS_BROXTON(dev_priv) && INTEL_REVID(dev_priv) < BXT_REVID_B0) {
+		struct page *page;
+
+		drm_clflush_sg(ctx_obj->pages);
+		page = i915_gem_object_get_page(ctx_obj, LRC_STATE_PN);
+		ret = set_pages_uc(page, 1);
+		if (ret)
+			goto unpin_ctx_obj;
+	}
+
+
 	ret = intel_pin_and_map_ringbuffer_obj(ring->dev, ringbuf);
 	if (ret)
 		goto unpin_ctx_obj;
@@ -1076,12 +1087,21 @@ reset_pin_count:
 void intel_lr_context_unpin(struct drm_i915_gem_request *rq)
 {
 	struct intel_engine_cs *ring = rq->ring;
+	struct drm_i915_private *dev_priv = rq->i915;
 	struct drm_i915_gem_object *ctx_obj = rq->ctx->engine[ring->id].state;
 	struct intel_ringbuffer *ringbuf = rq->ringbuf;
 
 	if (ctx_obj) {
 		WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
 		if (--rq->ctx->engine[ring->id].pin_count == 0) {
+			if (IS_BROXTON(dev_priv) &&
+			    INTEL_REVID(dev_priv) < BXT_REVID_B0) {
+				struct page *page;
+
+				page = i915_gem_object_get_page(ctx_obj,
+								LRC_STATE_PN);
+				WARN_ON_ONCE(set_pages_wb(page, 1));
+			}
 			intel_unpin_ringbuffer_obj(ringbuf);
 			i915_gem_object_ggtt_unpin(ctx_obj);
 		}
-- 
2.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] drm/i915/bxt: work around HW context corruption due to coherency problem
  2015-09-17 16:17 ` [PATCH v2 2/2] drm/i915/bxt: work around HW context corruption due to coherency problem Imre Deak
@ 2015-09-18  9:02   ` Chris Wilson
  2015-09-18 12:24     ` Imre Deak
  2015-09-23 13:35     ` Daniel Vetter
  0 siblings, 2 replies; 12+ messages in thread
From: Chris Wilson @ 2015-09-18  9:02 UTC (permalink / raw)
  To: Imre Deak; +Cc: intel-gfx

On Thu, Sep 17, 2015 at 07:17:44PM +0300, Imre Deak wrote:
> The execlist context object is mapped with a CPU/GPU coherent mapping
> everywhere, but on BXT A stepping due to a HW issue the coherency is not
> guaranteed. To work around this flush the context object after pinning
> it (to flush cache lines left by the context initialization/read-back
> from backing storage) and mark it as uncached so later updates during
> context switching will be coherent.
> 
> I noticed this problem via a GPU hang, where IPEHR pointed to an invalid
> opcode value. I couldn't find this value on the ring but looking at the
> contents of the active context object it turned out to be a parameter
> dword of a bigger command there. The original command opcode itself
> was zeroed out, based on the above I assume due to a CPU writeback of
> the corresponding cacheline. When restoring the context the GPU would
> jump over the zeroed out opcode and hang when trying to execute the
> above parameter dword.
> 
> I could easily reproduce this by running igt/gem_render_copy_redux and
> gem_tiled_blits/basic in parallel, but I guess it could be triggered by
> anything involving frequent switches between two separate contexts. With
> this workaround I couldn't reproduce the problem.
> 
> v2:
> - instead of clflushing after updating the tail and PDP values during
>   context switching, map the corresponding page as uncached to avoid a
>   race between CPU and GPU, both updating the same cacheline at the same
>   time (Ville)

No. Changing PAT involves a stop_machine() and is severely detrimental
to performance (context creation overhead does impact userspace).
Mapping it as uncached doesn't remove the race anyway.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] drm/i915/bxt: work around HW context corruption due to coherency problem
  2015-09-18  9:02   ` Chris Wilson
@ 2015-09-18 12:24     ` Imre Deak
  2015-09-23 13:35     ` Daniel Vetter
  1 sibling, 0 replies; 12+ messages in thread
From: Imre Deak @ 2015-09-18 12:24 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Fri, 2015-09-18 at 10:02 +0100, Chris Wilson wrote:
> On Thu, Sep 17, 2015 at 07:17:44PM +0300, Imre Deak wrote:
> > The execlist context object is mapped with a CPU/GPU coherent mapping
> > everywhere, but on BXT A stepping due to a HW issue the coherency is not
> > guaranteed. To work around this flush the context object after pinning
> > it (to flush cache lines left by the context initialization/read-back
> > from backing storage) and mark it as uncached so later updates during
> > context switching will be coherent.
> > 
> > I noticed this problem via a GPU hang, where IPEHR pointed to an invalid
> > opcode value. I couldn't find this value on the ring but looking at the
> > contents of the active context object it turned out to be a parameter
> > dword of a bigger command there. The original command opcode itself
> > was zeroed out, based on the above I assume due to a CPU writeback of
> > the corresponding cacheline. When restoring the context the GPU would
> > jump over the zeroed out opcode and hang when trying to execute the
> > above parameter dword.
> > 
> > I could easily reproduce this by running igt/gem_render_copy_redux and
> > gem_tiled_blits/basic in parallel, but I guess it could be triggered by
> > anything involving frequent switches between two separate contexts. With
> > this workaround I couldn't reproduce the problem.
> > 
> > v2:
> > - instead of clflushing after updating the tail and PDP values during
> >   context switching, map the corresponding page as uncached to avoid a
> >   race between CPU and GPU, both updating the same cacheline at the same
> >   time (Ville)
> 
> No. Changing PAT involves a stop_machine() and is severely detrimental
> to performance (context creation overhead does impact userspace).

Hm, where on that path is stop_machine(), I haven't found it. I guess
there is an overhead because of the TLB flush on each CPU, I haven't
benchmarked it. If that's a real issue (atm this is only a w/a for
stepping A) we could have a cache of uncached pages.

> Mapping it as uncached doesn't remove the race anyway.

Please explain, I do believe it does.

--Imre

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] drm/i915/bxt: work around HW context corruption due to coherency problem
  2015-09-18  9:02   ` Chris Wilson
  2015-09-18 12:24     ` Imre Deak
@ 2015-09-23 13:35     ` Daniel Vetter
  2015-09-23 13:39       ` Chris Wilson
  1 sibling, 1 reply; 12+ messages in thread
From: Daniel Vetter @ 2015-09-23 13:35 UTC (permalink / raw)
  To: Chris Wilson, Imre Deak, intel-gfx, Ville Syrjälä

On Fri, Sep 18, 2015 at 10:02:24AM +0100, Chris Wilson wrote:
> On Thu, Sep 17, 2015 at 07:17:44PM +0300, Imre Deak wrote:
> > The execlist context object is mapped with a CPU/GPU coherent mapping
> > everywhere, but on BXT A stepping due to a HW issue the coherency is not
> > guaranteed. To work around this flush the context object after pinning
> > it (to flush cache lines left by the context initialization/read-back
> > from backing storage) and mark it as uncached so later updates during
> > context switching will be coherent.
> > 
> > I noticed this problem via a GPU hang, where IPEHR pointed to an invalid
> > opcode value. I couldn't find this value on the ring but looking at the
> > contents of the active context object it turned out to be a parameter
> > dword of a bigger command there. The original command opcode itself
> > was zeroed out, based on the above I assume due to a CPU writeback of
> > the corresponding cacheline. When restoring the context the GPU would
> > jump over the zeroed out opcode and hang when trying to execute the
> > above parameter dword.
> > 
> > I could easily reproduce this by running igt/gem_render_copy_redux and
> > gem_tiled_blits/basic in parallel, but I guess it could be triggered by
> > anything involving frequent switches between two separate contexts. With
> > this workaround I couldn't reproduce the problem.
> > 
> > v2:
> > - instead of clflushing after updating the tail and PDP values during
> >   context switching, map the corresponding page as uncached to avoid a
> >   race between CPU and GPU, both updating the same cacheline at the same
> >   time (Ville)
> 
> No. Changing PAT involves a stop_machine() and is severely detrimental
> to performance (context creation overhead does impact userspace).
> Mapping it as uncached doesn't remove the race anyway.

Yeah it's not pretty, but otoh it's for A stepping and we'll kill it again
once bxt is shipping. I think with a big "IXME: dont ever dare to copy
this" comment this is acceptable. It's not really the worst "my gpu
crawls" workaround we've seend for early hw ...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] drm/i915/bxt: work around HW context corruption due to coherency problem
  2015-09-23 13:35     ` Daniel Vetter
@ 2015-09-23 13:39       ` Chris Wilson
  2015-09-23 13:57         ` Imre Deak
  2015-09-23 13:58         ` Daniel Vetter
  0 siblings, 2 replies; 12+ messages in thread
From: Chris Wilson @ 2015-09-23 13:39 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Wed, Sep 23, 2015 at 03:35:59PM +0200, Daniel Vetter wrote:
> On Fri, Sep 18, 2015 at 10:02:24AM +0100, Chris Wilson wrote:
> > On Thu, Sep 17, 2015 at 07:17:44PM +0300, Imre Deak wrote:
> > > The execlist context object is mapped with a CPU/GPU coherent mapping
> > > everywhere, but on BXT A stepping due to a HW issue the coherency is not
> > > guaranteed. To work around this flush the context object after pinning
> > > it (to flush cache lines left by the context initialization/read-back
> > > from backing storage) and mark it as uncached so later updates during
> > > context switching will be coherent.
> > > 
> > > I noticed this problem via a GPU hang, where IPEHR pointed to an invalid
> > > opcode value. I couldn't find this value on the ring but looking at the
> > > contents of the active context object it turned out to be a parameter
> > > dword of a bigger command there. The original command opcode itself
> > > was zeroed out, based on the above I assume due to a CPU writeback of
> > > the corresponding cacheline. When restoring the context the GPU would
> > > jump over the zeroed out opcode and hang when trying to execute the
> > > above parameter dword.
> > > 
> > > I could easily reproduce this by running igt/gem_render_copy_redux and
> > > gem_tiled_blits/basic in parallel, but I guess it could be triggered by
> > > anything involving frequent switches between two separate contexts. With
> > > this workaround I couldn't reproduce the problem.
> > > 
> > > v2:
> > > - instead of clflushing after updating the tail and PDP values during
> > >   context switching, map the corresponding page as uncached to avoid a
> > >   race between CPU and GPU, both updating the same cacheline at the same
> > >   time (Ville)
> > 
> > No. Changing PAT involves a stop_machine() and is severely detrimental
> > to performance (context creation overhead does impact userspace).
> > Mapping it as uncached doesn't remove the race anyway.
> 
> Yeah it's not pretty, but otoh it's for A stepping and we'll kill it again
> once bxt is shipping. I think with a big "IXME: dont ever dare to copy
> this" comment this is acceptable. It's not really the worst "my gpu
> crawls" workaround we've seend for early hw ...

Thinking about this, an incoherent TAIL write cannot cause IPEHR !=
*ACTHD. The flush is just papering over the absence of a flush elsewhere
and the root cause remains unfixed.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] drm/i915/bxt: work around HW context corruption due to coherency problem
  2015-09-23 13:39       ` Chris Wilson
@ 2015-09-23 13:57         ` Imre Deak
  2015-09-23 14:17           ` Chris Wilson
  2015-09-23 13:58         ` Daniel Vetter
  1 sibling, 1 reply; 12+ messages in thread
From: Imre Deak @ 2015-09-23 13:57 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On ke, 2015-09-23 at 14:39 +0100, Chris Wilson wrote:
> On Wed, Sep 23, 2015 at 03:35:59PM +0200, Daniel Vetter wrote:
> > On Fri, Sep 18, 2015 at 10:02:24AM +0100, Chris Wilson wrote:
> > > On Thu, Sep 17, 2015 at 07:17:44PM +0300, Imre Deak wrote:
> > > > The execlist context object is mapped with a CPU/GPU coherent mapping
> > > > everywhere, but on BXT A stepping due to a HW issue the coherency is not
> > > > guaranteed. To work around this flush the context object after pinning
> > > > it (to flush cache lines left by the context initialization/read-back
> > > > from backing storage) and mark it as uncached so later updates during
> > > > context switching will be coherent.
> > > > 
> > > > I noticed this problem via a GPU hang, where IPEHR pointed to an invalid
> > > > opcode value. I couldn't find this value on the ring but looking at the
> > > > contents of the active context object it turned out to be a parameter
> > > > dword of a bigger command there. The original command opcode itself
> > > > was zeroed out, based on the above I assume due to a CPU writeback of
> > > > the corresponding cacheline. When restoring the context the GPU would
> > > > jump over the zeroed out opcode and hang when trying to execute the
> > > > above parameter dword.
> > > > 
> > > > I could easily reproduce this by running igt/gem_render_copy_redux and
> > > > gem_tiled_blits/basic in parallel, but I guess it could be triggered by
> > > > anything involving frequent switches between two separate contexts. With
> > > > this workaround I couldn't reproduce the problem.
> > > > 
> > > > v2:
> > > > - instead of clflushing after updating the tail and PDP values during
> > > >   context switching, map the corresponding page as uncached to avoid a
> > > >   race between CPU and GPU, both updating the same cacheline at the same
> > > >   time (Ville)
> > > 
> > > No. Changing PAT involves a stop_machine() and is severely detrimental
> > > to performance (context creation overhead does impact userspace).
> > > Mapping it as uncached doesn't remove the race anyway.
> > 
> > Yeah it's not pretty, but otoh it's for A stepping and we'll kill it again
> > once bxt is shipping. I think with a big "IXME: dont ever dare to copy
> > this" comment this is acceptable. It's not really the worst "my gpu
> > crawls" workaround we've seend for early hw ...
> 
> Thinking about this, an incoherent TAIL write cannot cause IPEHR !=
> *ACTHD. The flush is just papering over the absence of a flush elsewhere
> and the root cause remains unfixed.

It's not the TAIL write that causing IPEHR != *ACTHD. I tried to explain
this in the commit message, I guess it was not clear. The lack of flush
atm means that after initializing the context by CPU those initialized
values will be not necessarily visible by the GPU (due to the coherency
problem). These initialized values also include the zeroed out values
returned by shmem/GUP. Eventually this context will get scheduled in/out
resulting in a context save by the GPU. After this context save we still
have the contents of the context in the CPU cache that are possibly not
coherent with the GPU. If now one of these cache lines is evicted (due
to some unrelated cache thrashing) the values stored by the GPU context
save will get overwritten by the stale CPU cache values. This is exactly
what I saw, zeroed out values in the context where there should've been
valid command dwords.

--Imre

> -Chris
> 


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] drm/i915/bxt: work around HW context corruption due to coherency problem
  2015-09-23 13:39       ` Chris Wilson
  2015-09-23 13:57         ` Imre Deak
@ 2015-09-23 13:58         ` Daniel Vetter
  2015-09-23 14:39           ` Chris Wilson
  1 sibling, 1 reply; 12+ messages in thread
From: Daniel Vetter @ 2015-09-23 13:58 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, Imre Deak, intel-gfx,
	Ville Syrjälä

On Wed, Sep 23, 2015 at 02:39:13PM +0100, Chris Wilson wrote:
> On Wed, Sep 23, 2015 at 03:35:59PM +0200, Daniel Vetter wrote:
> > On Fri, Sep 18, 2015 at 10:02:24AM +0100, Chris Wilson wrote:
> > > On Thu, Sep 17, 2015 at 07:17:44PM +0300, Imre Deak wrote:
> > > > The execlist context object is mapped with a CPU/GPU coherent mapping
> > > > everywhere, but on BXT A stepping due to a HW issue the coherency is not
> > > > guaranteed. To work around this flush the context object after pinning
> > > > it (to flush cache lines left by the context initialization/read-back
> > > > from backing storage) and mark it as uncached so later updates during
> > > > context switching will be coherent.
> > > > 
> > > > I noticed this problem via a GPU hang, where IPEHR pointed to an invalid
> > > > opcode value. I couldn't find this value on the ring but looking at the
> > > > contents of the active context object it turned out to be a parameter
> > > > dword of a bigger command there. The original command opcode itself
> > > > was zeroed out, based on the above I assume due to a CPU writeback of
> > > > the corresponding cacheline. When restoring the context the GPU would
> > > > jump over the zeroed out opcode and hang when trying to execute the
> > > > above parameter dword.
> > > > 
> > > > I could easily reproduce this by running igt/gem_render_copy_redux and
> > > > gem_tiled_blits/basic in parallel, but I guess it could be triggered by
> > > > anything involving frequent switches between two separate contexts. With
> > > > this workaround I couldn't reproduce the problem.
> > > > 
> > > > v2:
> > > > - instead of clflushing after updating the tail and PDP values during
> > > >   context switching, map the corresponding page as uncached to avoid a
> > > >   race between CPU and GPU, both updating the same cacheline at the same
> > > >   time (Ville)
> > > 
> > > No. Changing PAT involves a stop_machine() and is severely detrimental
> > > to performance (context creation overhead does impact userspace).
> > > Mapping it as uncached doesn't remove the race anyway.
> > 
> > Yeah it's not pretty, but otoh it's for A stepping and we'll kill it again
> > once bxt is shipping. I think with a big "IXME: dont ever dare to copy
> > this" comment this is acceptable. It's not really the worst "my gpu
> > crawls" workaround we've seend for early hw ...
> 
> Thinking about this, an incoherent TAIL write cannot cause IPEHR !=
> *ACTHD. The flush is just papering over the absence of a flush elsewhere
> and the root cause remains unfixed.

I thought the incoherence happens in the ring itself, not the TAIL update.
That one might just cause additional fun.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] drm/i915/bxt: work around HW context corruption due to coherency problem
  2015-09-23 13:57         ` Imre Deak
@ 2015-09-23 14:17           ` Chris Wilson
  2015-09-23 15:40             ` Imre Deak
  0 siblings, 1 reply; 12+ messages in thread
From: Chris Wilson @ 2015-09-23 14:17 UTC (permalink / raw)
  To: Imre Deak; +Cc: intel-gfx

On Wed, Sep 23, 2015 at 04:57:48PM +0300, Imre Deak wrote:
> On ke, 2015-09-23 at 14:39 +0100, Chris Wilson wrote:
> > On Wed, Sep 23, 2015 at 03:35:59PM +0200, Daniel Vetter wrote:
> > > On Fri, Sep 18, 2015 at 10:02:24AM +0100, Chris Wilson wrote:
> > > > On Thu, Sep 17, 2015 at 07:17:44PM +0300, Imre Deak wrote:
> > > > > The execlist context object is mapped with a CPU/GPU coherent mapping
> > > > > everywhere, but on BXT A stepping due to a HW issue the coherency is not
> > > > > guaranteed. To work around this flush the context object after pinning
> > > > > it (to flush cache lines left by the context initialization/read-back
> > > > > from backing storage) and mark it as uncached so later updates during
> > > > > context switching will be coherent.
> > > > > 
> > > > > I noticed this problem via a GPU hang, where IPEHR pointed to an invalid
> > > > > opcode value. I couldn't find this value on the ring but looking at the
> > > > > contents of the active context object it turned out to be a parameter
> > > > > dword of a bigger command there. The original command opcode itself
> > > > > was zeroed out, based on the above I assume due to a CPU writeback of
> > > > > the corresponding cacheline. When restoring the context the GPU would
> > > > > jump over the zeroed out opcode and hang when trying to execute the
> > > > > above parameter dword.
> > > > > 
> > > > > I could easily reproduce this by running igt/gem_render_copy_redux and
> > > > > gem_tiled_blits/basic in parallel, but I guess it could be triggered by
> > > > > anything involving frequent switches between two separate contexts. With
> > > > > this workaround I couldn't reproduce the problem.
> > > > > 
> > > > > v2:
> > > > > - instead of clflushing after updating the tail and PDP values during
> > > > >   context switching, map the corresponding page as uncached to avoid a
> > > > >   race between CPU and GPU, both updating the same cacheline at the same
> > > > >   time (Ville)
> > > > 
> > > > No. Changing PAT involves a stop_machine() and is severely detrimental
> > > > to performance (context creation overhead does impact userspace).
> > > > Mapping it as uncached doesn't remove the race anyway.
> > > 
> > > Yeah it's not pretty, but otoh it's for A stepping and we'll kill it again
> > > once bxt is shipping. I think with a big "IXME: dont ever dare to copy
> > > this" comment this is acceptable. It's not really the worst "my gpu
> > > crawls" workaround we've seend for early hw ...
> > 
> > Thinking about this, an incoherent TAIL write cannot cause IPEHR !=
> > *ACTHD. The flush is just papering over the absence of a flush elsewhere
> > and the root cause remains unfixed.
> 
> It's not the TAIL write that causing IPEHR != *ACTHD. I tried to explain
> this in the commit message, I guess it was not clear. The lack of flush
> atm means that after initializing the context by CPU those initialized
> values will be not necessarily visible by the GPU (due to the coherency
> problem). These initialized values also include the zeroed out values
> returned by shmem/GUP. Eventually this context will get scheduled in/out
> resulting in a context save by the GPU. After this context save we still
> have the contents of the context in the CPU cache that are possibly not
> coherent with the GPU. If now one of these cache lines is evicted (due
> to some unrelated cache thrashing) the values stored by the GPU context
> save will get overwritten by the stale CPU cache values. This is exactly
> what I saw, zeroed out values in the context where there should've been
> valid command dwords.

(Sorry read invalid as unexpected, since you said you couldn't find the
valud in the ring whereas what you meant to say was you couldn't find
the *address*. GPU hang pointing to invalid is nothing unusual.)

That is exactly the reason why we use set-to-domain on legacy context
objects. The standard practice for doing coherent writes is either
mapping the page into the GTT and so doing WC writes, or by explicit
clflush. I do not see why we want to do anything unusual here.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] drm/i915/bxt: work around HW context corruption due to coherency problem
  2015-09-23 13:58         ` Daniel Vetter
@ 2015-09-23 14:39           ` Chris Wilson
  0 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2015-09-23 14:39 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Wed, Sep 23, 2015 at 03:58:37PM +0200, Daniel Vetter wrote:
> On Wed, Sep 23, 2015 at 02:39:13PM +0100, Chris Wilson wrote:
> > On Wed, Sep 23, 2015 at 03:35:59PM +0200, Daniel Vetter wrote:
> > > On Fri, Sep 18, 2015 at 10:02:24AM +0100, Chris Wilson wrote:
> > > > On Thu, Sep 17, 2015 at 07:17:44PM +0300, Imre Deak wrote:
> > > > > The execlist context object is mapped with a CPU/GPU coherent mapping
> > > > > everywhere, but on BXT A stepping due to a HW issue the coherency is not
> > > > > guaranteed. To work around this flush the context object after pinning
> > > > > it (to flush cache lines left by the context initialization/read-back
> > > > > from backing storage) and mark it as uncached so later updates during
> > > > > context switching will be coherent.
> > > > > 
> > > > > I noticed this problem via a GPU hang, where IPEHR pointed to an invalid
> > > > > opcode value. I couldn't find this value on the ring but looking at the
> > > > > contents of the active context object it turned out to be a parameter
> > > > > dword of a bigger command there. The original command opcode itself
> > > > > was zeroed out, based on the above I assume due to a CPU writeback of
> > > > > the corresponding cacheline. When restoring the context the GPU would
> > > > > jump over the zeroed out opcode and hang when trying to execute the
> > > > > above parameter dword.
> > > > > 
> > > > > I could easily reproduce this by running igt/gem_render_copy_redux and
> > > > > gem_tiled_blits/basic in parallel, but I guess it could be triggered by
> > > > > anything involving frequent switches between two separate contexts. With
> > > > > this workaround I couldn't reproduce the problem.
> > > > > 
> > > > > v2:
> > > > > - instead of clflushing after updating the tail and PDP values during
> > > > >   context switching, map the corresponding page as uncached to avoid a
> > > > >   race between CPU and GPU, both updating the same cacheline at the same
> > > > >   time (Ville)
> > > > 
> > > > No. Changing PAT involves a stop_machine() and is severely detrimental
> > > > to performance (context creation overhead does impact userspace).
> > > > Mapping it as uncached doesn't remove the race anyway.
> > > 
> > > Yeah it's not pretty, but otoh it's for A stepping and we'll kill it again
> > > once bxt is shipping. I think with a big "IXME: dont ever dare to copy
> > > this" comment this is acceptable. It's not really the worst "my gpu
> > > crawls" workaround we've seend for early hw ...
> > 
> > Thinking about this, an incoherent TAIL write cannot cause IPEHR !=
> > *ACTHD. The flush is just papering over the absence of a flush elsewhere
> > and the root cause remains unfixed.
> 
> I thought the incoherence happens in the ring itself, not the TAIL update.
> That one might just cause additional fun.

Mostly as a reminder to myself, ringbuffer writes are through the GTT
(even for llc execlists platforms currently, though there have been
patches to relax that...) The only way we could get incoherence directly
would be a missing wmb() between the ring writes and the TAIL/ELSP
commit, or missing wmb() between PTE updates.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] drm/i915/bxt: work around HW context corruption due to coherency problem
  2015-09-23 14:17           ` Chris Wilson
@ 2015-09-23 15:40             ` Imre Deak
  2015-09-23 17:07               ` Imre Deak
  0 siblings, 1 reply; 12+ messages in thread
From: Imre Deak @ 2015-09-23 15:40 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On ke, 2015-09-23 at 15:17 +0100, Chris Wilson wrote:
> On Wed, Sep 23, 2015 at 04:57:48PM +0300, Imre Deak wrote:
> > On ke, 2015-09-23 at 14:39 +0100, Chris Wilson wrote:
> > > On Wed, Sep 23, 2015 at 03:35:59PM +0200, Daniel Vetter wrote:
> > > > On Fri, Sep 18, 2015 at 10:02:24AM +0100, Chris Wilson wrote:
> > > > > On Thu, Sep 17, 2015 at 07:17:44PM +0300, Imre Deak wrote:
> > > > > > The execlist context object is mapped with a CPU/GPU coherent mapping
> > > > > > everywhere, but on BXT A stepping due to a HW issue the coherency is not
> > > > > > guaranteed. To work around this flush the context object after pinning
> > > > > > it (to flush cache lines left by the context initialization/read-back
> > > > > > from backing storage) and mark it as uncached so later updates during
> > > > > > context switching will be coherent.
> > > > > > 
> > > > > > I noticed this problem via a GPU hang, where IPEHR pointed to an invalid
> > > > > > opcode value. I couldn't find this value on the ring but looking at the
> > > > > > contents of the active context object it turned out to be a parameter
> > > > > > dword of a bigger command there. The original command opcode itself
> > > > > > was zeroed out, based on the above I assume due to a CPU writeback of
> > > > > > the corresponding cacheline. When restoring the context the GPU would
> > > > > > jump over the zeroed out opcode and hang when trying to execute the
> > > > > > above parameter dword.
> > > > > > 
> > > > > > I could easily reproduce this by running igt/gem_render_copy_redux and
> > > > > > gem_tiled_blits/basic in parallel, but I guess it could be triggered by
> > > > > > anything involving frequent switches between two separate contexts. With
> > > > > > this workaround I couldn't reproduce the problem.
> > > > > > 
> > > > > > v2:
> > > > > > - instead of clflushing after updating the tail and PDP values during
> > > > > >   context switching, map the corresponding page as uncached to avoid a
> > > > > >   race between CPU and GPU, both updating the same cacheline at the same
> > > > > >   time (Ville)
> > > > > 
> > > > > No. Changing PAT involves a stop_machine() and is severely detrimental
> > > > > to performance (context creation overhead does impact userspace).
> > > > > Mapping it as uncached doesn't remove the race anyway.
> > > > 
> > > > Yeah it's not pretty, but otoh it's for A stepping and we'll kill it again
> > > > once bxt is shipping. I think with a big "IXME: dont ever dare to copy
> > > > this" comment this is acceptable. It's not really the worst "my gpu
> > > > crawls" workaround we've seend for early hw ...
> > > 
> > > Thinking about this, an incoherent TAIL write cannot cause IPEHR !=
> > > *ACTHD. The flush is just papering over the absence of a flush elsewhere
> > > and the root cause remains unfixed.
> > 
> > It's not the TAIL write that causing IPEHR != *ACTHD. I tried to explain
> > this in the commit message, I guess it was not clear. The lack of flush
> > atm means that after initializing the context by CPU those initialized
> > values will be not necessarily visible by the GPU (due to the coherency
> > problem). These initialized values also include the zeroed out values
> > returned by shmem/GUP. Eventually this context will get scheduled in/out
> > resulting in a context save by the GPU. After this context save we still
> > have the contents of the context in the CPU cache that are possibly not
> > coherent with the GPU. If now one of these cache lines is evicted (due
> > to some unrelated cache thrashing) the values stored by the GPU context
> > save will get overwritten by the stale CPU cache values. This is exactly
> > what I saw, zeroed out values in the context where there should've been
> > valid command dwords.
> 
> (Sorry read invalid as unexpected, since you said you couldn't find the
> valud in the ring whereas what you meant to say was you couldn't find
> the *address*. GPU hang pointing to invalid is nothing unusual.)

I was looking for the IPEHR value in the ring and not finding there
started to look at the context where I did find it. But basically yes,
this means the same as not finding the address of the invalid
instruction. 

> That is exactly the reason why we use set-to-domain on legacy context
> objects. The standard practice for doing coherent writes is either
> mapping the page into the GTT and so doing WC writes, or by explicit
> clflush. I do not see why we want to do anything unusual here.

Ok. I was thinking of mapping it to GTT after you noticed the overhead
problem. This would mean mapping it to a mappable area and recently
there was a change to map the context to a high offset in GTT
(GUC_WOPCM_TOP). I guess that prevents us to map it through GTT, unless
we create an aliased mapping. I don't know if creating aliases within
GGTT is allowed.  

Having an explicit flush leaves in the race Ville noticed: when we would
do this flush after writing the TAIL during a light restore context
switch, we would flush the CPU cacheline containing the HEAD value too.
In the same time the GPU may update the HEAD value concurrently doing a
context save. Writing TAIL via an uncached mapping gets rid of this
race, since then we don't have the side-effect of overwriting the HEAD
value. 

--Imre

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] drm/i915/bxt: work around HW context corruption due to coherency problem
  2015-09-23 15:40             ` Imre Deak
@ 2015-09-23 17:07               ` Imre Deak
  0 siblings, 0 replies; 12+ messages in thread
From: Imre Deak @ 2015-09-23 17:07 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On ke, 2015-09-23 at 18:40 +0300, Imre Deak wrote:
> On ke, 2015-09-23 at 15:17 +0100, Chris Wilson wrote:
> > On Wed, Sep 23, 2015 at 04:57:48PM +0300, Imre Deak wrote:
> > > On ke, 2015-09-23 at 14:39 +0100, Chris Wilson wrote:
> > > > On Wed, Sep 23, 2015 at 03:35:59PM +0200, Daniel Vetter wrote:
> > > > > On Fri, Sep 18, 2015 at 10:02:24AM +0100, Chris Wilson wrote:
> > > > > > On Thu, Sep 17, 2015 at 07:17:44PM +0300, Imre Deak wrote:
> > > > > > > The execlist context object is mapped with a CPU/GPU coherent mapping
> > > > > > > everywhere, but on BXT A stepping due to a HW issue the coherency is not
> > > > > > > guaranteed. To work around this flush the context object after pinning
> > > > > > > it (to flush cache lines left by the context initialization/read-back
> > > > > > > from backing storage) and mark it as uncached so later updates during
> > > > > > > context switching will be coherent.
> > > > > > > 
> > > > > > > I noticed this problem via a GPU hang, where IPEHR pointed to an invalid
> > > > > > > opcode value. I couldn't find this value on the ring but looking at the
> > > > > > > contents of the active context object it turned out to be a parameter
> > > > > > > dword of a bigger command there. The original command opcode itself
> > > > > > > was zeroed out, based on the above I assume due to a CPU writeback of
> > > > > > > the corresponding cacheline. When restoring the context the GPU would
> > > > > > > jump over the zeroed out opcode and hang when trying to execute the
> > > > > > > above parameter dword.
> > > > > > > 
> > > > > > > I could easily reproduce this by running igt/gem_render_copy_redux and
> > > > > > > gem_tiled_blits/basic in parallel, but I guess it could be triggered by
> > > > > > > anything involving frequent switches between two separate contexts. With
> > > > > > > this workaround I couldn't reproduce the problem.
> > > > > > > 
> > > > > > > v2:
> > > > > > > - instead of clflushing after updating the tail and PDP values during
> > > > > > >   context switching, map the corresponding page as uncached to avoid a
> > > > > > >   race between CPU and GPU, both updating the same cacheline at the same
> > > > > > >   time (Ville)
> > > > > > 
> > > > > > No. Changing PAT involves a stop_machine() and is severely detrimental
> > > > > > to performance (context creation overhead does impact userspace).
> > > > > > Mapping it as uncached doesn't remove the race anyway.
> > > > > 
> > > > > Yeah it's not pretty, but otoh it's for A stepping and we'll kill it again
> > > > > once bxt is shipping. I think with a big "IXME: dont ever dare to copy
> > > > > this" comment this is acceptable. It's not really the worst "my gpu
> > > > > crawls" workaround we've seend for early hw ...
> > > > 
> > > > Thinking about this, an incoherent TAIL write cannot cause IPEHR !=
> > > > *ACTHD. The flush is just papering over the absence of a flush elsewhere
> > > > and the root cause remains unfixed.
> > > 
> > > It's not the TAIL write that causing IPEHR != *ACTHD. I tried to explain
> > > this in the commit message, I guess it was not clear. The lack of flush
> > > atm means that after initializing the context by CPU those initialized
> > > values will be not necessarily visible by the GPU (due to the coherency
> > > problem). These initialized values also include the zeroed out values
> > > returned by shmem/GUP. Eventually this context will get scheduled in/out
> > > resulting in a context save by the GPU. After this context save we still
> > > have the contents of the context in the CPU cache that are possibly not
> > > coherent with the GPU. If now one of these cache lines is evicted (due
> > > to some unrelated cache thrashing) the values stored by the GPU context
> > > save will get overwritten by the stale CPU cache values. This is exactly
> > > what I saw, zeroed out values in the context where there should've been
> > > valid command dwords.
> > 
> > (Sorry read invalid as unexpected, since you said you couldn't find the
> > valud in the ring whereas what you meant to say was you couldn't find
> > the *address*. GPU hang pointing to invalid is nothing unusual.)
> 
> I was looking for the IPEHR value in the ring and not finding there
> started to look at the context where I did find it. But basically yes,
> this means the same as not finding the address of the invalid
> instruction. 
> 
> > That is exactly the reason why we use set-to-domain on legacy context
> > objects. The standard practice for doing coherent writes is either
> > mapping the page into the GTT and so doing WC writes, or by explicit
> > clflush. I do not see why we want to do anything unusual here.
> 
> Ok. I was thinking of mapping it to GTT after you noticed the overhead
> problem. This would mean mapping it to a mappable area and recently
> there was a change to map the context to a high offset in GTT
> (GUC_WOPCM_TOP). I guess that prevents us to map it through GTT, unless
> we create an aliased mapping. I don't know if creating aliases within
> GGTT is allowed.

Ah sorry I misread the code, the above high offset is only 512k. So that
still allows mapping the context to the mappable area. I think that
solution would work too.

--Imre

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2015-09-23 17:07 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-17 16:17 [PATCH 1/2] drm/i915/bxt: prevent allocating context object from HIGHMEM Imre Deak
2015-09-17 16:17 ` [PATCH v2 2/2] drm/i915/bxt: work around HW context corruption due to coherency problem Imre Deak
2015-09-18  9:02   ` Chris Wilson
2015-09-18 12:24     ` Imre Deak
2015-09-23 13:35     ` Daniel Vetter
2015-09-23 13:39       ` Chris Wilson
2015-09-23 13:57         ` Imre Deak
2015-09-23 14:17           ` Chris Wilson
2015-09-23 15:40             ` Imre Deak
2015-09-23 17:07               ` Imre Deak
2015-09-23 13:58         ` Daniel Vetter
2015-09-23 14:39           ` Chris Wilson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.