* [PATCH] drm/i915: Insert a command barrier on BLT/BSD cache flushes
@ 2015-01-22 13:13 Chris Wilson
2015-01-22 13:24 ` [Intel-gfx] " Daniel Vetter
2015-01-22 22:41 ` [PATCH] " shuang.he
0 siblings, 2 replies; 9+ messages in thread
From: Chris Wilson @ 2015-01-22 13:13 UTC (permalink / raw)
To: intel-gfx; +Cc: Chris Wilson, stable
This looked like an odd regression from
commit ec5cc0f9b019af95e4571a9fa162d94294c8d90b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Thu Jun 12 10:28:55 2014 +0100
drm/i915: Restrict GPU boost to the RCS engine
but in reality it undercovered a much older coherency bug. The issue that
boosting the GPU frequency on the BCS ring was masking was that we could
wake the CPU up after completion of a BCS batch and inspect memory prior
to the write cache being fully evicted. In order to serialise the
breadcrumb interrupt (and so ensure that the CPU's view of memory is
coherent) we need to perform a post-sync operation in the MI_FLUSH_DW.
Testcase: gpuX-rcs-gpu-read-after-write
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
---
drivers/gpu/drm/i915/intel_ringbuffer.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index e18ed05dc0d5..53615eba2a97 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2019,6 +2019,14 @@ static int gen6_ring_flush(struct intel_engine_cs *ring,
cmd = MI_FLUSH_DW;
if (INTEL_INFO(ring->dev)->gen >= 8)
cmd += 1;
+
+ /* We always require a command barrier so that subsequent
+ * commands, such as breadcrumb interrupts, are strictly ordered
+ * wrt the contents of the write cache being flushed to memory
+ * (and thus being coherent from the CPU).
+ */
+ cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
+
/*
* Bspec vol 1c.3 - blitter engine command streamer:
* "If ENABLED, all TLBs will be invalidated once the flush
@@ -2026,8 +2034,7 @@ static int gen6_ring_flush(struct intel_engine_cs *ring,
* Post-Sync Operation field is a value of 1h or 3h."
*/
if (invalidate & I915_GEM_DOMAIN_RENDER)
- cmd |= MI_INVALIDATE_TLB | MI_FLUSH_DW_STORE_INDEX |
- MI_FLUSH_DW_OP_STOREDW;
+ cmd |= MI_INVALIDATE_TLB;
intel_ring_emit(ring, cmd);
intel_ring_emit(ring, I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT);
if (INTEL_INFO(ring->dev)->gen >= 8) {
--
2.1.4
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [Intel-gfx] [PATCH] drm/i915: Insert a command barrier on BLT/BSD cache flushes
2015-01-22 13:13 [PATCH] drm/i915: Insert a command barrier on BLT/BSD cache flushes Chris Wilson
@ 2015-01-22 13:24 ` Daniel Vetter
2015-01-22 13:32 ` Chris Wilson
2015-01-22 13:42 ` [PATCH v2] " Chris Wilson
2015-01-22 22:41 ` [PATCH] " shuang.he
1 sibling, 2 replies; 9+ messages in thread
From: Daniel Vetter @ 2015-01-22 13:24 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx, stable
On Thu, Jan 22, 2015 at 2:13 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> This looked like an odd regression from
>
> commit ec5cc0f9b019af95e4571a9fa162d94294c8d90b
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date: Thu Jun 12 10:28:55 2014 +0100
>
> drm/i915: Restrict GPU boost to the RCS engine
>
> but in reality it undercovered a much older coherency bug. The issue that
> boosting the GPU frequency on the BCS ring was masking was that we could
> wake the CPU up after completion of a BCS batch and inspect memory prior
> to the write cache being fully evicted. In order to serialise the
> breadcrumb interrupt (and so ensure that the CPU's view of memory is
> coherent) we need to perform a post-sync operation in the MI_FLUSH_DW.
>
> Testcase: gpuX-rcs-gpu-read-after-write
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: stable@vger.kernel.org
We also need this in gen8_emit_flush and gen6_bsd_ring_flush I think.
And interesting that the subsequent seqno write can apparently be
reordered with cache flushing. Or do we just need lots more of those
(wasn't the magic number once 32 or so)?
Anyway can't argue with hw, so Acked (with the other 2 functions updated).
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] drm/i915: Insert a command barrier on BLT/BSD cache flushes
2015-01-22 13:24 ` [Intel-gfx] " Daniel Vetter
@ 2015-01-22 13:32 ` Chris Wilson
2015-01-22 13:42 ` [PATCH v2] " Chris Wilson
1 sibling, 0 replies; 9+ messages in thread
From: Chris Wilson @ 2015-01-22 13:32 UTC (permalink / raw)
To: Daniel Vetter; +Cc: intel-gfx, stable
On Thu, Jan 22, 2015 at 02:24:18PM +0100, Daniel Vetter wrote:
> On Thu, Jan 22, 2015 at 2:13 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > This looked like an odd regression from
> >
> > commit ec5cc0f9b019af95e4571a9fa162d94294c8d90b
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date: Thu Jun 12 10:28:55 2014 +0100
> >
> > drm/i915: Restrict GPU boost to the RCS engine
> >
> > but in reality it undercovered a much older coherency bug. The issue that
> > boosting the GPU frequency on the BCS ring was masking was that we could
> > wake the CPU up after completion of a BCS batch and inspect memory prior
> > to the write cache being fully evicted. In order to serialise the
> > breadcrumb interrupt (and so ensure that the CPU's view of memory is
> > coherent) we need to perform a post-sync operation in the MI_FLUSH_DW.
> >
> > Testcase: gpuX-rcs-gpu-read-after-write
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: stable@vger.kernel.org
>
> We also need this in gen8_emit_flush and gen6_bsd_ring_flush I think.
>
> And interesting that the subsequent seqno write can apparently be
> reordered with cache flushing. Or do we just need lots more of those
> (wasn't the magic number once 32 or so)?
That magic number was for the reordering of the seqno write with
interrupt generation, iirc. But yes, it does sound oddly reminiscent. At
least they did give us post-sync operations, even if we always end up
having to use then.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v2] drm/i915: Insert a command barrier on BLT/BSD cache flushes
2015-01-22 13:24 ` [Intel-gfx] " Daniel Vetter
2015-01-22 13:32 ` Chris Wilson
@ 2015-01-22 13:42 ` Chris Wilson
2015-01-22 22:45 ` shuang.he
2015-02-09 16:00 ` [Intel-gfx] " Jani Nikula
1 sibling, 2 replies; 9+ messages in thread
From: Chris Wilson @ 2015-01-22 13:42 UTC (permalink / raw)
To: intel-gfx; +Cc: Chris Wilson, stable
This looked like an odd regression from
commit ec5cc0f9b019af95e4571a9fa162d94294c8d90b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Thu Jun 12 10:28:55 2014 +0100
drm/i915: Restrict GPU boost to the RCS engine
but in reality it undercovered a much older coherency bug. The issue that
boosting the GPU frequency on the BCS ring was masking was that we could
wake the CPU up after completion of a BCS batch and inspect memory prior
to the write cache being fully evicted. In order to serialise the
breadcrumb interrupt (and so ensure that the CPU's view of memory is
coherent) we need to perform a post-sync operation in the MI_FLUSH_DW.
v2: Fix all the MI_FLUSH_DW (bsd plus the duplication in execlists).
Testcase: gpuX-rcs-gpu-read-after-write
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Acked-by: Daniel Vetter <daniel@ffwll.ch>
---
drivers/gpu/drm/i915/intel_lrc.c | 20 +++++++++++---------
drivers/gpu/drm/i915/intel_ringbuffer.c | 23 +++++++++++++++++++----
2 files changed, 30 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index e405b61cdac5..8e71d8851c9a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1237,15 +1237,17 @@ static int gen8_emit_flush(struct intel_ringbuffer *ringbuf,
cmd = MI_FLUSH_DW + 1;
- if (ring == &dev_priv->ring[VCS]) {
- if (invalidate_domains & I915_GEM_GPU_DOMAINS)
- cmd |= MI_INVALIDATE_TLB | MI_INVALIDATE_BSD |
- MI_FLUSH_DW_STORE_INDEX |
- MI_FLUSH_DW_OP_STOREDW;
- } else {
- if (invalidate_domains & I915_GEM_DOMAIN_RENDER)
- cmd |= MI_INVALIDATE_TLB | MI_FLUSH_DW_STORE_INDEX |
- MI_FLUSH_DW_OP_STOREDW;
+ /* We always require a command barrier so that subsequent
+ * commands, such as breadcrumb interrupts, are strictly ordered
+ * wrt the contents of the write cache being flushed to memory
+ * (and thus being coherent from the CPU).
+ */
+ cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
+
+ if (invalidate_domains & I915_GEM_GPU_DOMAINS) {
+ cmd |= MI_INVALIDATE_TLB;
+ if (ring == &dev_priv->ring[VCS])
+ cmd |= MI_INVALIDATE_BSD;
}
intel_logical_ring_emit(ringbuf, cmd);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 23020d67329b..718530fd6c6b 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2224,6 +2224,14 @@ static int gen6_bsd_ring_flush(struct intel_engine_cs *ring,
cmd = MI_FLUSH_DW;
if (INTEL_INFO(ring->dev)->gen >= 8)
cmd += 1;
+
+ /* We always require a command barrier so that subsequent
+ * commands, such as breadcrumb interrupts, are strictly ordered
+ * wrt the contents of the write cache being flushed to memory
+ * (and thus being coherent from the CPU).
+ */
+ cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
+
/*
* Bspec vol 1c.5 - video engine command streamer:
* "If ENABLED, all TLBs will be invalidated once the flush
@@ -2231,8 +2239,8 @@ static int gen6_bsd_ring_flush(struct intel_engine_cs *ring,
* Post-Sync Operation field is a value of 1h or 3h."
*/
if (invalidate & I915_GEM_GPU_DOMAINS)
- cmd |= MI_INVALIDATE_TLB | MI_INVALIDATE_BSD |
- MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
+ cmd |= MI_INVALIDATE_TLB | MI_INVALIDATE_BSD;
+
intel_ring_emit(ring, cmd);
intel_ring_emit(ring, I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT);
if (INTEL_INFO(ring->dev)->gen >= 8) {
@@ -2328,6 +2336,14 @@ static int gen6_ring_flush(struct intel_engine_cs *ring,
cmd = MI_FLUSH_DW;
if (INTEL_INFO(ring->dev)->gen >= 8)
cmd += 1;
+
+ /* We always require a command barrier so that subsequent
+ * commands, such as breadcrumb interrupts, are strictly ordered
+ * wrt the contents of the write cache being flushed to memory
+ * (and thus being coherent from the CPU).
+ */
+ cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
+
/*
* Bspec vol 1c.3 - blitter engine command streamer:
* "If ENABLED, all TLBs will be invalidated once the flush
@@ -2335,8 +2351,7 @@ static int gen6_ring_flush(struct intel_engine_cs *ring,
* Post-Sync Operation field is a value of 1h or 3h."
*/
if (invalidate & I915_GEM_DOMAIN_RENDER)
- cmd |= MI_INVALIDATE_TLB | MI_FLUSH_DW_STORE_INDEX |
- MI_FLUSH_DW_OP_STOREDW;
+ cmd |= MI_INVALIDATE_TLB;
intel_ring_emit(ring, cmd);
intel_ring_emit(ring, I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT);
if (INTEL_INFO(ring->dev)->gen >= 8) {
--
2.1.4
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] drm/i915: Insert a command barrier on BLT/BSD cache flushes
2015-01-22 13:13 [PATCH] drm/i915: Insert a command barrier on BLT/BSD cache flushes Chris Wilson
2015-01-22 13:24 ` [Intel-gfx] " Daniel Vetter
@ 2015-01-22 22:41 ` shuang.he
1 sibling, 0 replies; 9+ messages in thread
From: shuang.he @ 2015-01-22 22:41 UTC (permalink / raw)
To: shuang.he, ethan.gao, intel-gfx, chris
Tested-By: PRC QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
Task id: 5625
-------------------------------------Summary-------------------------------------
Platform Delta drm-intel-nightly Series Applied
PNV -2 353/353 351/353
ILK 200/200 200/200
SNB 400/422 400/422
IVB 487/487 487/487
BYT 296/296 296/296
HSW 508/508 508/508
BDW 401/402 401/402
-------------------------------------Detailed-------------------------------------
Platform Test drm-intel-nightly Series Applied
*PNV igt_gen3_render_linear_blits PASS(4, M25M23) NRUN(1, M23)
*PNV igt_gen3_render_mixed_blits PASS(3, M25M23) CRASH(1, M23)
Note: You need to pay more attention to line start with '*'
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] drm/i915: Insert a command barrier on BLT/BSD cache flushes
2015-01-22 13:42 ` [PATCH v2] " Chris Wilson
@ 2015-01-22 22:45 ` shuang.he
2015-02-09 16:00 ` [Intel-gfx] " Jani Nikula
1 sibling, 0 replies; 9+ messages in thread
From: shuang.he @ 2015-01-22 22:45 UTC (permalink / raw)
To: shuang.he, ethan.gao, intel-gfx, chris
Tested-By: PRC QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
Task id: 5626
-------------------------------------Summary-------------------------------------
Platform Delta drm-intel-nightly Series Applied
PNV 353/353 353/353
ILK 353/353 353/353
SNB +1 399/422 400/422
IVB +1-1 486/487 486/487
BYT 296/296 296/296
HSW +1 507/508 508/508
BDW +2 399/402 401/402
-------------------------------------Detailed-------------------------------------
Platform Test drm-intel-nightly Series Applied
SNB igt_kms_flip_nonexisting-fb NSPT(1, M35)PASS(2, M35) PASS(1, M35)
*IVB igt_gem_pwrite_pread_display-copy-performance PASS(2, M21M4) DMESG_WARN(1, M4)
IVB igt_gem_pwrite_pread_snooped-pwrite-blt-cpu_mmap-performance DMESG_WARN(1, M21)PASS(1, M4) PASS(1, M4)
HSW igt_gem_storedw_loop_blt DMESG_WARN(2, M19M20)PASS(1, M20) PASS(1, M20)
BDW igt_gem_pwrite_pread_display-pwrite-blt-gtt_mmap-performance DMESG_WARN(2, M28)PASS(1, M30) PASS(1, M28)
BDW igt_gem_pwrite_pread_uncached-pwrite-blt-gtt_mmap-performance DMESG_WARN(1, M28)PASS(1, M28) PASS(1, M28)
Note: You need to pay more attention to line start with '*'
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Intel-gfx] [PATCH v2] drm/i915: Insert a command barrier on BLT/BSD cache flushes
2015-01-22 13:42 ` [PATCH v2] " Chris Wilson
2015-01-22 22:45 ` shuang.he
@ 2015-02-09 16:00 ` Jani Nikula
2015-02-09 16:07 ` Chris Wilson
1 sibling, 1 reply; 9+ messages in thread
From: Jani Nikula @ 2015-02-09 16:00 UTC (permalink / raw)
To: Chris Wilson, intel-gfx; +Cc: stable, Daniel Vetter
On Thu, 22 Jan 2015, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> This looked like an odd regression from
>
> commit ec5cc0f9b019af95e4571a9fa162d94294c8d90b
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date: Thu Jun 12 10:28:55 2014 +0100
>
> drm/i915: Restrict GPU boost to the RCS engine
>
> but in reality it undercovered a much older coherency bug. The issue that
> boosting the GPU frequency on the BCS ring was masking was that we could
> wake the CPU up after completion of a BCS batch and inspect memory prior
> to the write cache being fully evicted. In order to serialise the
> breadcrumb interrupt (and so ensure that the CPU's view of memory is
> coherent) we need to perform a post-sync operation in the MI_FLUSH_DW.
>
> v2: Fix all the MI_FLUSH_DW (bsd plus the duplication in execlists).
>
> Testcase: gpuX-rcs-gpu-read-after-write
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: stable@vger.kernel.org
> Acked-by: Daniel Vetter <daniel@ffwll.ch>
> ---
> drivers/gpu/drm/i915/intel_lrc.c | 20 +++++++++++---------
> drivers/gpu/drm/i915/intel_ringbuffer.c | 23 +++++++++++++++++++----
> 2 files changed, 30 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index e405b61cdac5..8e71d8851c9a 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1237,15 +1237,17 @@ static int gen8_emit_flush(struct intel_ringbuffer *ringbuf,
>
> cmd = MI_FLUSH_DW + 1;
>
> - if (ring == &dev_priv->ring[VCS]) {
> - if (invalidate_domains & I915_GEM_GPU_DOMAINS)
> - cmd |= MI_INVALIDATE_TLB | MI_INVALIDATE_BSD |
> - MI_FLUSH_DW_STORE_INDEX |
> - MI_FLUSH_DW_OP_STOREDW;
> - } else {
> - if (invalidate_domains & I915_GEM_DOMAIN_RENDER)
> - cmd |= MI_INVALIDATE_TLB | MI_FLUSH_DW_STORE_INDEX |
> - MI_FLUSH_DW_OP_STOREDW;
> + /* We always require a command barrier so that subsequent
> + * commands, such as breadcrumb interrupts, are strictly ordered
> + * wrt the contents of the write cache being flushed to memory
> + * (and thus being coherent from the CPU).
> + */
> + cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
> +
> + if (invalidate_domains & I915_GEM_GPU_DOMAINS) {
Why do you change the mask from I915_GEM_DOMAIN_RENDER to
I915_GEM_GPU_DOMAINS for ring != VCS?
BR,
Jani.
> + cmd |= MI_INVALIDATE_TLB;
> + if (ring == &dev_priv->ring[VCS])
> + cmd |= MI_INVALIDATE_BSD;
> }
>
> intel_logical_ring_emit(ringbuf, cmd);
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 23020d67329b..718530fd6c6b 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -2224,6 +2224,14 @@ static int gen6_bsd_ring_flush(struct intel_engine_cs *ring,
> cmd = MI_FLUSH_DW;
> if (INTEL_INFO(ring->dev)->gen >= 8)
> cmd += 1;
> +
> + /* We always require a command barrier so that subsequent
> + * commands, such as breadcrumb interrupts, are strictly ordered
> + * wrt the contents of the write cache being flushed to memory
> + * (and thus being coherent from the CPU).
> + */
> + cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
> +
> /*
> * Bspec vol 1c.5 - video engine command streamer:
> * "If ENABLED, all TLBs will be invalidated once the flush
> @@ -2231,8 +2239,8 @@ static int gen6_bsd_ring_flush(struct intel_engine_cs *ring,
> * Post-Sync Operation field is a value of 1h or 3h."
> */
> if (invalidate & I915_GEM_GPU_DOMAINS)
> - cmd |= MI_INVALIDATE_TLB | MI_INVALIDATE_BSD |
> - MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
> + cmd |= MI_INVALIDATE_TLB | MI_INVALIDATE_BSD;
> +
> intel_ring_emit(ring, cmd);
> intel_ring_emit(ring, I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT);
> if (INTEL_INFO(ring->dev)->gen >= 8) {
> @@ -2328,6 +2336,14 @@ static int gen6_ring_flush(struct intel_engine_cs *ring,
> cmd = MI_FLUSH_DW;
> if (INTEL_INFO(ring->dev)->gen >= 8)
> cmd += 1;
> +
> + /* We always require a command barrier so that subsequent
> + * commands, such as breadcrumb interrupts, are strictly ordered
> + * wrt the contents of the write cache being flushed to memory
> + * (and thus being coherent from the CPU).
> + */
> + cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
> +
> /*
> * Bspec vol 1c.3 - blitter engine command streamer:
> * "If ENABLED, all TLBs will be invalidated once the flush
> @@ -2335,8 +2351,7 @@ static int gen6_ring_flush(struct intel_engine_cs *ring,
> * Post-Sync Operation field is a value of 1h or 3h."
> */
> if (invalidate & I915_GEM_DOMAIN_RENDER)
> - cmd |= MI_INVALIDATE_TLB | MI_FLUSH_DW_STORE_INDEX |
> - MI_FLUSH_DW_OP_STOREDW;
> + cmd |= MI_INVALIDATE_TLB;
> intel_ring_emit(ring, cmd);
> intel_ring_emit(ring, I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT);
> if (INTEL_INFO(ring->dev)->gen >= 8) {
> --
> 2.1.4
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
--
Jani Nikula, Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Intel-gfx] [PATCH v2] drm/i915: Insert a command barrier on BLT/BSD cache flushes
2015-02-09 16:00 ` [Intel-gfx] " Jani Nikula
@ 2015-02-09 16:07 ` Chris Wilson
2015-02-09 18:15 ` Jani Nikula
0 siblings, 1 reply; 9+ messages in thread
From: Chris Wilson @ 2015-02-09 16:07 UTC (permalink / raw)
To: Jani Nikula; +Cc: intel-gfx, stable, Daniel Vetter
On Mon, Feb 09, 2015 at 06:00:25PM +0200, Jani Nikula wrote:
> On Thu, 22 Jan 2015, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > This looked like an odd regression from
> >
> > commit ec5cc0f9b019af95e4571a9fa162d94294c8d90b
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date: Thu Jun 12 10:28:55 2014 +0100
> >
> > drm/i915: Restrict GPU boost to the RCS engine
> >
> > but in reality it undercovered a much older coherency bug. The issue that
> > boosting the GPU frequency on the BCS ring was masking was that we could
> > wake the CPU up after completion of a BCS batch and inspect memory prior
> > to the write cache being fully evicted. In order to serialise the
> > breadcrumb interrupt (and so ensure that the CPU's view of memory is
> > coherent) we need to perform a post-sync operation in the MI_FLUSH_DW.
> >
> > v2: Fix all the MI_FLUSH_DW (bsd plus the duplication in execlists).
> >
> > Testcase: gpuX-rcs-gpu-read-after-write
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: stable@vger.kernel.org
> > Acked-by: Daniel Vetter <daniel@ffwll.ch>
> > ---
> > drivers/gpu/drm/i915/intel_lrc.c | 20 +++++++++++---------
> > drivers/gpu/drm/i915/intel_ringbuffer.c | 23 +++++++++++++++++++----
> > 2 files changed, 30 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index e405b61cdac5..8e71d8851c9a 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -1237,15 +1237,17 @@ static int gen8_emit_flush(struct intel_ringbuffer *ringbuf,
> >
> > cmd = MI_FLUSH_DW + 1;
> >
> > - if (ring == &dev_priv->ring[VCS]) {
> > - if (invalidate_domains & I915_GEM_GPU_DOMAINS)
> > - cmd |= MI_INVALIDATE_TLB | MI_INVALIDATE_BSD |
> > - MI_FLUSH_DW_STORE_INDEX |
> > - MI_FLUSH_DW_OP_STOREDW;
> > - } else {
> > - if (invalidate_domains & I915_GEM_DOMAIN_RENDER)
> > - cmd |= MI_INVALIDATE_TLB | MI_FLUSH_DW_STORE_INDEX |
> > - MI_FLUSH_DW_OP_STOREDW;
> > + /* We always require a command barrier so that subsequent
> > + * commands, such as breadcrumb interrupts, are strictly ordered
> > + * wrt the contents of the write cache being flushed to memory
> > + * (and thus being coherent from the CPU).
> > + */
> > + cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
> > +
> > + if (invalidate_domains & I915_GEM_GPU_DOMAINS) {
>
> Why do you change the mask from I915_GEM_DOMAIN_RENDER to
> I915_GEM_GPU_DOMAINS for ring != VCS?
My bad, I didn't notice that execlists was originally broken. The patch
is correct.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] drm/i915: Insert a command barrier on BLT/BSD cache flushes
2015-02-09 16:07 ` Chris Wilson
@ 2015-02-09 18:15 ` Jani Nikula
0 siblings, 0 replies; 9+ messages in thread
From: Jani Nikula @ 2015-02-09 18:15 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx, stable
On Mon, 09 Feb 2015, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Mon, Feb 09, 2015 at 06:00:25PM +0200, Jani Nikula wrote:
>> On Thu, 22 Jan 2015, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>> > This looked like an odd regression from
>> >
>> > commit ec5cc0f9b019af95e4571a9fa162d94294c8d90b
>> > Author: Chris Wilson <chris@chris-wilson.co.uk>
>> > Date: Thu Jun 12 10:28:55 2014 +0100
>> >
>> > drm/i915: Restrict GPU boost to the RCS engine
>> >
>> > but in reality it undercovered a much older coherency bug. The issue that
>> > boosting the GPU frequency on the BCS ring was masking was that we could
>> > wake the CPU up after completion of a BCS batch and inspect memory prior
>> > to the write cache being fully evicted. In order to serialise the
>> > breadcrumb interrupt (and so ensure that the CPU's view of memory is
>> > coherent) we need to perform a post-sync operation in the MI_FLUSH_DW.
>> >
>> > v2: Fix all the MI_FLUSH_DW (bsd plus the duplication in execlists).
>> >
>> > Testcase: gpuX-rcs-gpu-read-after-write
>> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> > Cc: stable@vger.kernel.org
>> > Acked-by: Daniel Vetter <daniel@ffwll.ch>
>> > ---
>> > drivers/gpu/drm/i915/intel_lrc.c | 20 +++++++++++---------
>> > drivers/gpu/drm/i915/intel_ringbuffer.c | 23 +++++++++++++++++++----
>> > 2 files changed, 30 insertions(+), 13 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> > index e405b61cdac5..8e71d8851c9a 100644
>> > --- a/drivers/gpu/drm/i915/intel_lrc.c
>> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> > @@ -1237,15 +1237,17 @@ static int gen8_emit_flush(struct intel_ringbuffer *ringbuf,
>> >
>> > cmd = MI_FLUSH_DW + 1;
>> >
>> > - if (ring == &dev_priv->ring[VCS]) {
>> > - if (invalidate_domains & I915_GEM_GPU_DOMAINS)
>> > - cmd |= MI_INVALIDATE_TLB | MI_INVALIDATE_BSD |
>> > - MI_FLUSH_DW_STORE_INDEX |
>> > - MI_FLUSH_DW_OP_STOREDW;
>> > - } else {
>> > - if (invalidate_domains & I915_GEM_DOMAIN_RENDER)
>> > - cmd |= MI_INVALIDATE_TLB | MI_FLUSH_DW_STORE_INDEX |
>> > - MI_FLUSH_DW_OP_STOREDW;
>> > + /* We always require a command barrier so that subsequent
>> > + * commands, such as breadcrumb interrupts, are strictly ordered
>> > + * wrt the contents of the write cache being flushed to memory
>> > + * (and thus being coherent from the CPU).
>> > + */
>> > + cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
>> > +
>> > + if (invalidate_domains & I915_GEM_GPU_DOMAINS) {
>>
>> Why do you change the mask from I915_GEM_DOMAIN_RENDER to
>> I915_GEM_GPU_DOMAINS for ring != VCS?
>
> My bad, I didn't notice that execlists was originally broken. The patch
> is correct.
I'll take your and Daniel's word for it. I hope I won't have to regret
not asking you to split this into two patches...
Pushed to drm-intel-next-fixes, thanks for the patch and ack.
BR,
Jani.
> -Chris
>
> --
> Chris Wilson, Intel Open Source Technology Centre
--
Jani Nikula, Intel Open Source Technology Center
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2015-02-09 18:16 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-22 13:13 [PATCH] drm/i915: Insert a command barrier on BLT/BSD cache flushes Chris Wilson
2015-01-22 13:24 ` [Intel-gfx] " Daniel Vetter
2015-01-22 13:32 ` Chris Wilson
2015-01-22 13:42 ` [PATCH v2] " Chris Wilson
2015-01-22 22:45 ` shuang.he
2015-02-09 16:00 ` [Intel-gfx] " Jani Nikula
2015-02-09 16:07 ` Chris Wilson
2015-02-09 18:15 ` Jani Nikula
2015-01-22 22:41 ` [PATCH] " shuang.he
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.