All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/i915: Insert a command barrier on BLT/BSD cache flushes
@ 2015-01-22 13:13 Chris Wilson
  2015-01-22 13:24 ` [Intel-gfx] " Daniel Vetter
  2015-01-22 22:41 ` [PATCH] " shuang.he
  0 siblings, 2 replies; 9+ messages in thread
From: Chris Wilson @ 2015-01-22 13:13 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson, stable

This looked like an odd regression from

commit ec5cc0f9b019af95e4571a9fa162d94294c8d90b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Jun 12 10:28:55 2014 +0100

    drm/i915: Restrict GPU boost to the RCS engine

but in reality it undercovered a much older coherency bug. The issue that
boosting the GPU frequency on the BCS ring was masking was that we could
wake the CPU up after completion of a BCS batch and inspect memory prior
to the write cache being fully evicted. In order to serialise the
breadcrumb interrupt (and so ensure that the CPU's view of memory is
coherent) we need to perform a post-sync operation in the MI_FLUSH_DW.

Testcase: gpuX-rcs-gpu-read-after-write
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index e18ed05dc0d5..53615eba2a97 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2019,6 +2019,14 @@ static int gen6_ring_flush(struct intel_engine_cs *ring,
 	cmd = MI_FLUSH_DW;
 	if (INTEL_INFO(ring->dev)->gen >= 8)
 		cmd += 1;
+
+	/* We always require a command barrier so that subsequent
+	 * commands, such as breadcrumb interrupts, are strictly ordered
+	 * wrt the contents of the write cache being flushed to memory
+	 * (and thus being coherent from the CPU).
+	 */
+	cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
+
 	/*
 	 * Bspec vol 1c.3 - blitter engine command streamer:
 	 * "If ENABLED, all TLBs will be invalidated once the flush
@@ -2026,8 +2034,7 @@ static int gen6_ring_flush(struct intel_engine_cs *ring,
 	 * Post-Sync Operation field is a value of 1h or 3h."
 	 */
 	if (invalidate & I915_GEM_DOMAIN_RENDER)
-		cmd |= MI_INVALIDATE_TLB | MI_FLUSH_DW_STORE_INDEX |
-			MI_FLUSH_DW_OP_STOREDW;
+		cmd |= MI_INVALIDATE_TLB;
 	intel_ring_emit(ring, cmd);
 	intel_ring_emit(ring, I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT);
 	if (INTEL_INFO(ring->dev)->gen >= 8) {
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Insert a command barrier on BLT/BSD cache flushes
  2015-01-22 13:13 [PATCH] drm/i915: Insert a command barrier on BLT/BSD cache flushes Chris Wilson
@ 2015-01-22 13:24 ` Daniel Vetter
  2015-01-22 13:32   ` Chris Wilson
  2015-01-22 13:42   ` [PATCH v2] " Chris Wilson
  2015-01-22 22:41 ` [PATCH] " shuang.he
  1 sibling, 2 replies; 9+ messages in thread
From: Daniel Vetter @ 2015-01-22 13:24 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx, stable

On Thu, Jan 22, 2015 at 2:13 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> This looked like an odd regression from
>
> commit ec5cc0f9b019af95e4571a9fa162d94294c8d90b
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Thu Jun 12 10:28:55 2014 +0100
>
>     drm/i915: Restrict GPU boost to the RCS engine
>
> but in reality it undercovered a much older coherency bug. The issue that
> boosting the GPU frequency on the BCS ring was masking was that we could
> wake the CPU up after completion of a BCS batch and inspect memory prior
> to the write cache being fully evicted. In order to serialise the
> breadcrumb interrupt (and so ensure that the CPU's view of memory is
> coherent) we need to perform a post-sync operation in the MI_FLUSH_DW.
>
> Testcase: gpuX-rcs-gpu-read-after-write
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: stable@vger.kernel.org

We also need this in gen8_emit_flush and gen6_bsd_ring_flush I think.

And interesting that the subsequent seqno write can apparently be
reordered with cache flushing. Or do we just need lots more of those
(wasn't the magic number once 32 or so)?

Anyway can't argue with hw, so Acked (with the other 2 functions updated).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] drm/i915: Insert a command barrier on BLT/BSD cache flushes
  2015-01-22 13:24 ` [Intel-gfx] " Daniel Vetter
@ 2015-01-22 13:32   ` Chris Wilson
  2015-01-22 13:42   ` [PATCH v2] " Chris Wilson
  1 sibling, 0 replies; 9+ messages in thread
From: Chris Wilson @ 2015-01-22 13:32 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx, stable

On Thu, Jan 22, 2015 at 02:24:18PM +0100, Daniel Vetter wrote:
> On Thu, Jan 22, 2015 at 2:13 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > This looked like an odd regression from
> >
> > commit ec5cc0f9b019af95e4571a9fa162d94294c8d90b
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date:   Thu Jun 12 10:28:55 2014 +0100
> >
> >     drm/i915: Restrict GPU boost to the RCS engine
> >
> > but in reality it undercovered a much older coherency bug. The issue that
> > boosting the GPU frequency on the BCS ring was masking was that we could
> > wake the CPU up after completion of a BCS batch and inspect memory prior
> > to the write cache being fully evicted. In order to serialise the
> > breadcrumb interrupt (and so ensure that the CPU's view of memory is
> > coherent) we need to perform a post-sync operation in the MI_FLUSH_DW.
> >
> > Testcase: gpuX-rcs-gpu-read-after-write
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: stable@vger.kernel.org
> 
> We also need this in gen8_emit_flush and gen6_bsd_ring_flush I think.
> 
> And interesting that the subsequent seqno write can apparently be
> reordered with cache flushing. Or do we just need lots more of those
> (wasn't the magic number once 32 or so)?

That magic number was for the reordering of the seqno write with
interrupt generation, iirc. But yes, it does sound oddly reminiscent. At
least they did give us post-sync operations, even if we always end up
having to use then.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v2] drm/i915: Insert a command barrier on BLT/BSD cache flushes
  2015-01-22 13:24 ` [Intel-gfx] " Daniel Vetter
  2015-01-22 13:32   ` Chris Wilson
@ 2015-01-22 13:42   ` Chris Wilson
  2015-01-22 22:45     ` shuang.he
  2015-02-09 16:00     ` [Intel-gfx] " Jani Nikula
  1 sibling, 2 replies; 9+ messages in thread
From: Chris Wilson @ 2015-01-22 13:42 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson, stable

This looked like an odd regression from

commit ec5cc0f9b019af95e4571a9fa162d94294c8d90b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Jun 12 10:28:55 2014 +0100

    drm/i915: Restrict GPU boost to the RCS engine

but in reality it undercovered a much older coherency bug. The issue that
boosting the GPU frequency on the BCS ring was masking was that we could
wake the CPU up after completion of a BCS batch and inspect memory prior
to the write cache being fully evicted. In order to serialise the
breadcrumb interrupt (and so ensure that the CPU's view of memory is
coherent) we need to perform a post-sync operation in the MI_FLUSH_DW.

v2: Fix all the MI_FLUSH_DW (bsd plus the duplication in execlists).

Testcase: gpuX-rcs-gpu-read-after-write
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Acked-by: Daniel Vetter <daniel@ffwll.ch>
---
 drivers/gpu/drm/i915/intel_lrc.c        | 20 +++++++++++---------
 drivers/gpu/drm/i915/intel_ringbuffer.c | 23 +++++++++++++++++++----
 2 files changed, 30 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index e405b61cdac5..8e71d8851c9a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1237,15 +1237,17 @@ static int gen8_emit_flush(struct intel_ringbuffer *ringbuf,
 
 	cmd = MI_FLUSH_DW + 1;
 
-	if (ring == &dev_priv->ring[VCS]) {
-		if (invalidate_domains & I915_GEM_GPU_DOMAINS)
-			cmd |= MI_INVALIDATE_TLB | MI_INVALIDATE_BSD |
-				MI_FLUSH_DW_STORE_INDEX |
-				MI_FLUSH_DW_OP_STOREDW;
-	} else {
-		if (invalidate_domains & I915_GEM_DOMAIN_RENDER)
-			cmd |= MI_INVALIDATE_TLB | MI_FLUSH_DW_STORE_INDEX |
-				MI_FLUSH_DW_OP_STOREDW;
+	/* We always require a command barrier so that subsequent
+	 * commands, such as breadcrumb interrupts, are strictly ordered
+	 * wrt the contents of the write cache being flushed to memory
+	 * (and thus being coherent from the CPU).
+	 */
+	cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
+
+	if (invalidate_domains & I915_GEM_GPU_DOMAINS) {
+		cmd |= MI_INVALIDATE_TLB;
+		if (ring == &dev_priv->ring[VCS])
+			cmd |= MI_INVALIDATE_BSD;
 	}
 
 	intel_logical_ring_emit(ringbuf, cmd);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 23020d67329b..718530fd6c6b 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2224,6 +2224,14 @@ static int gen6_bsd_ring_flush(struct intel_engine_cs *ring,
 	cmd = MI_FLUSH_DW;
 	if (INTEL_INFO(ring->dev)->gen >= 8)
 		cmd += 1;
+
+	/* We always require a command barrier so that subsequent
+	 * commands, such as breadcrumb interrupts, are strictly ordered
+	 * wrt the contents of the write cache being flushed to memory
+	 * (and thus being coherent from the CPU).
+	 */
+	cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
+
 	/*
 	 * Bspec vol 1c.5 - video engine command streamer:
 	 * "If ENABLED, all TLBs will be invalidated once the flush
@@ -2231,8 +2239,8 @@ static int gen6_bsd_ring_flush(struct intel_engine_cs *ring,
 	 * Post-Sync Operation field is a value of 1h or 3h."
 	 */
 	if (invalidate & I915_GEM_GPU_DOMAINS)
-		cmd |= MI_INVALIDATE_TLB | MI_INVALIDATE_BSD |
-			MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
+		cmd |= MI_INVALIDATE_TLB | MI_INVALIDATE_BSD;
+
 	intel_ring_emit(ring, cmd);
 	intel_ring_emit(ring, I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT);
 	if (INTEL_INFO(ring->dev)->gen >= 8) {
@@ -2328,6 +2336,14 @@ static int gen6_ring_flush(struct intel_engine_cs *ring,
 	cmd = MI_FLUSH_DW;
 	if (INTEL_INFO(ring->dev)->gen >= 8)
 		cmd += 1;
+
+	/* We always require a command barrier so that subsequent
+	 * commands, such as breadcrumb interrupts, are strictly ordered
+	 * wrt the contents of the write cache being flushed to memory
+	 * (and thus being coherent from the CPU).
+	 */
+	cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
+
 	/*
 	 * Bspec vol 1c.3 - blitter engine command streamer:
 	 * "If ENABLED, all TLBs will be invalidated once the flush
@@ -2335,8 +2351,7 @@ static int gen6_ring_flush(struct intel_engine_cs *ring,
 	 * Post-Sync Operation field is a value of 1h or 3h."
 	 */
 	if (invalidate & I915_GEM_DOMAIN_RENDER)
-		cmd |= MI_INVALIDATE_TLB | MI_FLUSH_DW_STORE_INDEX |
-			MI_FLUSH_DW_OP_STOREDW;
+		cmd |= MI_INVALIDATE_TLB;
 	intel_ring_emit(ring, cmd);
 	intel_ring_emit(ring, I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT);
 	if (INTEL_INFO(ring->dev)->gen >= 8) {
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] drm/i915: Insert a command barrier on BLT/BSD cache flushes
  2015-01-22 13:13 [PATCH] drm/i915: Insert a command barrier on BLT/BSD cache flushes Chris Wilson
  2015-01-22 13:24 ` [Intel-gfx] " Daniel Vetter
@ 2015-01-22 22:41 ` shuang.he
  1 sibling, 0 replies; 9+ messages in thread
From: shuang.he @ 2015-01-22 22:41 UTC (permalink / raw)
  To: shuang.he, ethan.gao, intel-gfx, chris

Tested-By: PRC QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
Task id: 5625
-------------------------------------Summary-------------------------------------
Platform          Delta          drm-intel-nightly          Series Applied
PNV                 -2              353/353              351/353
ILK                                  200/200              200/200
SNB                                  400/422              400/422
IVB                                  487/487              487/487
BYT                                  296/296              296/296
HSW                                  508/508              508/508
BDW                                  401/402              401/402
-------------------------------------Detailed-------------------------------------
Platform  Test                                drm-intel-nightly          Series Applied
*PNV  igt_gen3_render_linear_blits      PASS(4, M25M23)      NRUN(1, M23)
*PNV  igt_gen3_render_mixed_blits      PASS(3, M25M23)      CRASH(1, M23)
Note: You need to pay more attention to line start with '*'
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] drm/i915: Insert a command barrier on BLT/BSD cache flushes
  2015-01-22 13:42   ` [PATCH v2] " Chris Wilson
@ 2015-01-22 22:45     ` shuang.he
  2015-02-09 16:00     ` [Intel-gfx] " Jani Nikula
  1 sibling, 0 replies; 9+ messages in thread
From: shuang.he @ 2015-01-22 22:45 UTC (permalink / raw)
  To: shuang.he, ethan.gao, intel-gfx, chris

Tested-By: PRC QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
Task id: 5626
-------------------------------------Summary-------------------------------------
Platform          Delta          drm-intel-nightly          Series Applied
PNV                                  353/353              353/353
ILK                                  353/353              353/353
SNB              +1                 399/422              400/422
IVB              +1-1              486/487              486/487
BYT                                  296/296              296/296
HSW              +1                 507/508              508/508
BDW              +2                 399/402              401/402
-------------------------------------Detailed-------------------------------------
Platform  Test                                drm-intel-nightly          Series Applied
 SNB  igt_kms_flip_nonexisting-fb      NSPT(1, M35)PASS(2, M35)      PASS(1, M35)
*IVB  igt_gem_pwrite_pread_display-copy-performance      PASS(2, M21M4)      DMESG_WARN(1, M4)
 IVB  igt_gem_pwrite_pread_snooped-pwrite-blt-cpu_mmap-performance      DMESG_WARN(1, M21)PASS(1, M4)      PASS(1, M4)
 HSW  igt_gem_storedw_loop_blt      DMESG_WARN(2, M19M20)PASS(1, M20)      PASS(1, M20)
 BDW  igt_gem_pwrite_pread_display-pwrite-blt-gtt_mmap-performance      DMESG_WARN(2, M28)PASS(1, M30)      PASS(1, M28)
 BDW  igt_gem_pwrite_pread_uncached-pwrite-blt-gtt_mmap-performance      DMESG_WARN(1, M28)PASS(1, M28)      PASS(1, M28)
Note: You need to pay more attention to line start with '*'
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Intel-gfx] [PATCH v2] drm/i915: Insert a command barrier on BLT/BSD cache flushes
  2015-01-22 13:42   ` [PATCH v2] " Chris Wilson
  2015-01-22 22:45     ` shuang.he
@ 2015-02-09 16:00     ` Jani Nikula
  2015-02-09 16:07       ` Chris Wilson
  1 sibling, 1 reply; 9+ messages in thread
From: Jani Nikula @ 2015-02-09 16:00 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: stable, Daniel Vetter

On Thu, 22 Jan 2015, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> This looked like an odd regression from
>
> commit ec5cc0f9b019af95e4571a9fa162d94294c8d90b
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Thu Jun 12 10:28:55 2014 +0100
>
>     drm/i915: Restrict GPU boost to the RCS engine
>
> but in reality it undercovered a much older coherency bug. The issue that
> boosting the GPU frequency on the BCS ring was masking was that we could
> wake the CPU up after completion of a BCS batch and inspect memory prior
> to the write cache being fully evicted. In order to serialise the
> breadcrumb interrupt (and so ensure that the CPU's view of memory is
> coherent) we need to perform a post-sync operation in the MI_FLUSH_DW.
>
> v2: Fix all the MI_FLUSH_DW (bsd plus the duplication in execlists).
>
> Testcase: gpuX-rcs-gpu-read-after-write
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: stable@vger.kernel.org
> Acked-by: Daniel Vetter <daniel@ffwll.ch>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c        | 20 +++++++++++---------
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 23 +++++++++++++++++++----
>  2 files changed, 30 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index e405b61cdac5..8e71d8851c9a 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1237,15 +1237,17 @@ static int gen8_emit_flush(struct intel_ringbuffer *ringbuf,
>  
>  	cmd = MI_FLUSH_DW + 1;
>  
> -	if (ring == &dev_priv->ring[VCS]) {
> -		if (invalidate_domains & I915_GEM_GPU_DOMAINS)
> -			cmd |= MI_INVALIDATE_TLB | MI_INVALIDATE_BSD |
> -				MI_FLUSH_DW_STORE_INDEX |
> -				MI_FLUSH_DW_OP_STOREDW;
> -	} else {
> -		if (invalidate_domains & I915_GEM_DOMAIN_RENDER)
> -			cmd |= MI_INVALIDATE_TLB | MI_FLUSH_DW_STORE_INDEX |
> -				MI_FLUSH_DW_OP_STOREDW;
> +	/* We always require a command barrier so that subsequent
> +	 * commands, such as breadcrumb interrupts, are strictly ordered
> +	 * wrt the contents of the write cache being flushed to memory
> +	 * (and thus being coherent from the CPU).
> +	 */
> +	cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
> +
> +	if (invalidate_domains & I915_GEM_GPU_DOMAINS) {

Why do you change the mask from I915_GEM_DOMAIN_RENDER to
I915_GEM_GPU_DOMAINS for ring != VCS?

BR,
Jani.

> +		cmd |= MI_INVALIDATE_TLB;
> +		if (ring == &dev_priv->ring[VCS])
> +			cmd |= MI_INVALIDATE_BSD;
>  	}
>  
>  	intel_logical_ring_emit(ringbuf, cmd);
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 23020d67329b..718530fd6c6b 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -2224,6 +2224,14 @@ static int gen6_bsd_ring_flush(struct intel_engine_cs *ring,
>  	cmd = MI_FLUSH_DW;
>  	if (INTEL_INFO(ring->dev)->gen >= 8)
>  		cmd += 1;
> +
> +	/* We always require a command barrier so that subsequent
> +	 * commands, such as breadcrumb interrupts, are strictly ordered
> +	 * wrt the contents of the write cache being flushed to memory
> +	 * (and thus being coherent from the CPU).
> +	 */
> +	cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
> +
>  	/*
>  	 * Bspec vol 1c.5 - video engine command streamer:
>  	 * "If ENABLED, all TLBs will be invalidated once the flush
> @@ -2231,8 +2239,8 @@ static int gen6_bsd_ring_flush(struct intel_engine_cs *ring,
>  	 * Post-Sync Operation field is a value of 1h or 3h."
>  	 */
>  	if (invalidate & I915_GEM_GPU_DOMAINS)
> -		cmd |= MI_INVALIDATE_TLB | MI_INVALIDATE_BSD |
> -			MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
> +		cmd |= MI_INVALIDATE_TLB | MI_INVALIDATE_BSD;
> +
>  	intel_ring_emit(ring, cmd);
>  	intel_ring_emit(ring, I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT);
>  	if (INTEL_INFO(ring->dev)->gen >= 8) {
> @@ -2328,6 +2336,14 @@ static int gen6_ring_flush(struct intel_engine_cs *ring,
>  	cmd = MI_FLUSH_DW;
>  	if (INTEL_INFO(ring->dev)->gen >= 8)
>  		cmd += 1;
> +
> +	/* We always require a command barrier so that subsequent
> +	 * commands, such as breadcrumb interrupts, are strictly ordered
> +	 * wrt the contents of the write cache being flushed to memory
> +	 * (and thus being coherent from the CPU).
> +	 */
> +	cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
> +
>  	/*
>  	 * Bspec vol 1c.3 - blitter engine command streamer:
>  	 * "If ENABLED, all TLBs will be invalidated once the flush
> @@ -2335,8 +2351,7 @@ static int gen6_ring_flush(struct intel_engine_cs *ring,
>  	 * Post-Sync Operation field is a value of 1h or 3h."
>  	 */
>  	if (invalidate & I915_GEM_DOMAIN_RENDER)
> -		cmd |= MI_INVALIDATE_TLB | MI_FLUSH_DW_STORE_INDEX |
> -			MI_FLUSH_DW_OP_STOREDW;
> +		cmd |= MI_INVALIDATE_TLB;
>  	intel_ring_emit(ring, cmd);
>  	intel_ring_emit(ring, I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT);
>  	if (INTEL_INFO(ring->dev)->gen >= 8) {
> -- 
> 2.1.4
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Jani Nikula, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Intel-gfx] [PATCH v2] drm/i915: Insert a command barrier on BLT/BSD cache flushes
  2015-02-09 16:00     ` [Intel-gfx] " Jani Nikula
@ 2015-02-09 16:07       ` Chris Wilson
  2015-02-09 18:15         ` Jani Nikula
  0 siblings, 1 reply; 9+ messages in thread
From: Chris Wilson @ 2015-02-09 16:07 UTC (permalink / raw)
  To: Jani Nikula; +Cc: intel-gfx, stable, Daniel Vetter

On Mon, Feb 09, 2015 at 06:00:25PM +0200, Jani Nikula wrote:
> On Thu, 22 Jan 2015, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > This looked like an odd regression from
> >
> > commit ec5cc0f9b019af95e4571a9fa162d94294c8d90b
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date:   Thu Jun 12 10:28:55 2014 +0100
> >
> >     drm/i915: Restrict GPU boost to the RCS engine
> >
> > but in reality it undercovered a much older coherency bug. The issue that
> > boosting the GPU frequency on the BCS ring was masking was that we could
> > wake the CPU up after completion of a BCS batch and inspect memory prior
> > to the write cache being fully evicted. In order to serialise the
> > breadcrumb interrupt (and so ensure that the CPU's view of memory is
> > coherent) we need to perform a post-sync operation in the MI_FLUSH_DW.
> >
> > v2: Fix all the MI_FLUSH_DW (bsd plus the duplication in execlists).
> >
> > Testcase: gpuX-rcs-gpu-read-after-write
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: stable@vger.kernel.org
> > Acked-by: Daniel Vetter <daniel@ffwll.ch>
> > ---
> >  drivers/gpu/drm/i915/intel_lrc.c        | 20 +++++++++++---------
> >  drivers/gpu/drm/i915/intel_ringbuffer.c | 23 +++++++++++++++++++----
> >  2 files changed, 30 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index e405b61cdac5..8e71d8851c9a 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -1237,15 +1237,17 @@ static int gen8_emit_flush(struct intel_ringbuffer *ringbuf,
> >  
> >  	cmd = MI_FLUSH_DW + 1;
> >  
> > -	if (ring == &dev_priv->ring[VCS]) {
> > -		if (invalidate_domains & I915_GEM_GPU_DOMAINS)
> > -			cmd |= MI_INVALIDATE_TLB | MI_INVALIDATE_BSD |
> > -				MI_FLUSH_DW_STORE_INDEX |
> > -				MI_FLUSH_DW_OP_STOREDW;
> > -	} else {
> > -		if (invalidate_domains & I915_GEM_DOMAIN_RENDER)
> > -			cmd |= MI_INVALIDATE_TLB | MI_FLUSH_DW_STORE_INDEX |
> > -				MI_FLUSH_DW_OP_STOREDW;
> > +	/* We always require a command barrier so that subsequent
> > +	 * commands, such as breadcrumb interrupts, are strictly ordered
> > +	 * wrt the contents of the write cache being flushed to memory
> > +	 * (and thus being coherent from the CPU).
> > +	 */
> > +	cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
> > +
> > +	if (invalidate_domains & I915_GEM_GPU_DOMAINS) {
> 
> Why do you change the mask from I915_GEM_DOMAIN_RENDER to
> I915_GEM_GPU_DOMAINS for ring != VCS?

My bad, I didn't notice that execlists was originally broken. The patch
is correct.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] drm/i915: Insert a command barrier on BLT/BSD cache flushes
  2015-02-09 16:07       ` Chris Wilson
@ 2015-02-09 18:15         ` Jani Nikula
  0 siblings, 0 replies; 9+ messages in thread
From: Jani Nikula @ 2015-02-09 18:15 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx, stable

On Mon, 09 Feb 2015, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Mon, Feb 09, 2015 at 06:00:25PM +0200, Jani Nikula wrote:
>> On Thu, 22 Jan 2015, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>> > This looked like an odd regression from
>> >
>> > commit ec5cc0f9b019af95e4571a9fa162d94294c8d90b
>> > Author: Chris Wilson <chris@chris-wilson.co.uk>
>> > Date:   Thu Jun 12 10:28:55 2014 +0100
>> >
>> >     drm/i915: Restrict GPU boost to the RCS engine
>> >
>> > but in reality it undercovered a much older coherency bug. The issue that
>> > boosting the GPU frequency on the BCS ring was masking was that we could
>> > wake the CPU up after completion of a BCS batch and inspect memory prior
>> > to the write cache being fully evicted. In order to serialise the
>> > breadcrumb interrupt (and so ensure that the CPU's view of memory is
>> > coherent) we need to perform a post-sync operation in the MI_FLUSH_DW.
>> >
>> > v2: Fix all the MI_FLUSH_DW (bsd plus the duplication in execlists).
>> >
>> > Testcase: gpuX-rcs-gpu-read-after-write
>> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> > Cc: stable@vger.kernel.org
>> > Acked-by: Daniel Vetter <daniel@ffwll.ch>
>> > ---
>> >  drivers/gpu/drm/i915/intel_lrc.c        | 20 +++++++++++---------
>> >  drivers/gpu/drm/i915/intel_ringbuffer.c | 23 +++++++++++++++++++----
>> >  2 files changed, 30 insertions(+), 13 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> > index e405b61cdac5..8e71d8851c9a 100644
>> > --- a/drivers/gpu/drm/i915/intel_lrc.c
>> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> > @@ -1237,15 +1237,17 @@ static int gen8_emit_flush(struct intel_ringbuffer *ringbuf,
>> >  
>> >  	cmd = MI_FLUSH_DW + 1;
>> >  
>> > -	if (ring == &dev_priv->ring[VCS]) {
>> > -		if (invalidate_domains & I915_GEM_GPU_DOMAINS)
>> > -			cmd |= MI_INVALIDATE_TLB | MI_INVALIDATE_BSD |
>> > -				MI_FLUSH_DW_STORE_INDEX |
>> > -				MI_FLUSH_DW_OP_STOREDW;
>> > -	} else {
>> > -		if (invalidate_domains & I915_GEM_DOMAIN_RENDER)
>> > -			cmd |= MI_INVALIDATE_TLB | MI_FLUSH_DW_STORE_INDEX |
>> > -				MI_FLUSH_DW_OP_STOREDW;
>> > +	/* We always require a command barrier so that subsequent
>> > +	 * commands, such as breadcrumb interrupts, are strictly ordered
>> > +	 * wrt the contents of the write cache being flushed to memory
>> > +	 * (and thus being coherent from the CPU).
>> > +	 */
>> > +	cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
>> > +
>> > +	if (invalidate_domains & I915_GEM_GPU_DOMAINS) {
>> 
>> Why do you change the mask from I915_GEM_DOMAIN_RENDER to
>> I915_GEM_GPU_DOMAINS for ring != VCS?
>
> My bad, I didn't notice that execlists was originally broken. The patch
> is correct.

I'll take your and Daniel's word for it. I hope I won't have to regret
not asking you to split this into two patches...

Pushed to drm-intel-next-fixes, thanks for the patch and ack.

BR,
Jani.



> -Chris
>
> -- 
> Chris Wilson, Intel Open Source Technology Centre

-- 
Jani Nikula, Intel Open Source Technology Center
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2015-02-09 18:16 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-22 13:13 [PATCH] drm/i915: Insert a command barrier on BLT/BSD cache flushes Chris Wilson
2015-01-22 13:24 ` [Intel-gfx] " Daniel Vetter
2015-01-22 13:32   ` Chris Wilson
2015-01-22 13:42   ` [PATCH v2] " Chris Wilson
2015-01-22 22:45     ` shuang.he
2015-02-09 16:00     ` [Intel-gfx] " Jani Nikula
2015-02-09 16:07       ` Chris Wilson
2015-02-09 18:15         ` Jani Nikula
2015-01-22 22:41 ` [PATCH] " shuang.he

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.