All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/i915: Unconditionally flush writes before execbuffer
@ 2015-05-11  7:51 Chris Wilson
  2015-05-11 10:34   ` Daniel Vetter
  2015-05-14 11:52 ` shuang.he
  0 siblings, 2 replies; 19+ messages in thread
From: Chris Wilson @ 2015-05-11  7:51 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson, Akash Goel, stable

With the advent of mmap(wc), we have a path to write directly into
active GPU buffers. When combined with async updates (i.e. avoiding the
explicit domain management along with the memory barriers and GPU
stalls) we start to see the GPU read the wrong values from memory - i.e.
we have insufficient memory barriers along the execbuffer path. Writes
through the GTT should have been naturally serialised with execution
through the GTT as well and so the impact only seems to be from the WC
paths.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Akash Goel <akash.goel@intel.com>
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 650ae02484b0..4f97275ba799 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1127,8 +1127,12 @@ i915_gem_execbuffer_move_to_gpu(struct intel_engine_cs *ring,
 	if (flush_chipset)
 		i915_gem_chipset_flush(ring->dev);
 
-	if (flush_domains & I915_GEM_DOMAIN_GTT)
-		wmb();
+	/* Unconditionally flush out writes to memory as the user may be
+	 * doing asynchronous streaming writes to active buffers (i.e.
+	 * lazy domain management to avoid serialisation) directly into
+	 * the physical pages and so not naturally serialised by the GTT.
+	 */
+	wmb();
 
 	/* Unconditionally invalidate gpu caches and ensure that we do flush
 	 * any residual writes from the previous batch.
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Unconditionally flush writes before execbuffer
  2015-05-11  7:51 [PATCH] drm/i915: Unconditionally flush writes before execbuffer Chris Wilson
@ 2015-05-11 10:34   ` Daniel Vetter
  2015-05-14 11:52 ` shuang.he
  1 sibling, 0 replies; 19+ messages in thread
From: Daniel Vetter @ 2015-05-11 10:34 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx, Akash Goel, stable

On Mon, May 11, 2015 at 08:51:36AM +0100, Chris Wilson wrote:
> With the advent of mmap(wc), we have a path to write directly into
> active GPU buffers. When combined with async updates (i.e. avoiding the
> explicit domain management along with the memory barriers and GPU
> stalls) we start to see the GPU read the wrong values from memory - i.e.
> we have insufficient memory barriers along the execbuffer path. Writes
> through the GTT should have been naturally serialised with execution
> through the GTT as well and so the impact only seems to be from the WC
> paths.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Akash Goel <akash.goel@intel.com>
> Cc: stable@vger.kernel.org

Do we have a nasty igt for this? Bugzilla?
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 650ae02484b0..4f97275ba799 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -1127,8 +1127,12 @@ i915_gem_execbuffer_move_to_gpu(struct intel_engine_cs *ring,
>  	if (flush_chipset)
>  		i915_gem_chipset_flush(ring->dev);
>  
> -	if (flush_domains & I915_GEM_DOMAIN_GTT)
> -		wmb();
> +	/* Unconditionally flush out writes to memory as the user may be
> +	 * doing asynchronous streaming writes to active buffers (i.e.
> +	 * lazy domain management to avoid serialisation) directly into
> +	 * the physical pages and so not naturally serialised by the GTT.
> +	 */
> +	wmb();
>  
>  	/* Unconditionally invalidate gpu caches and ensure that we do flush
>  	 * any residual writes from the previous batch.
> -- 
> 2.1.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] drm/i915: Unconditionally flush writes before execbuffer
@ 2015-05-11 10:34   ` Daniel Vetter
  0 siblings, 0 replies; 19+ messages in thread
From: Daniel Vetter @ 2015-05-11 10:34 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx, Akash Goel, stable

On Mon, May 11, 2015 at 08:51:36AM +0100, Chris Wilson wrote:
> With the advent of mmap(wc), we have a path to write directly into
> active GPU buffers. When combined with async updates (i.e. avoiding the
> explicit domain management along with the memory barriers and GPU
> stalls) we start to see the GPU read the wrong values from memory - i.e.
> we have insufficient memory barriers along the execbuffer path. Writes
> through the GTT should have been naturally serialised with execution
> through the GTT as well and so the impact only seems to be from the WC
> paths.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Akash Goel <akash.goel@intel.com>
> Cc: stable@vger.kernel.org

Do we have a nasty igt for this? Bugzilla?
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 650ae02484b0..4f97275ba799 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -1127,8 +1127,12 @@ i915_gem_execbuffer_move_to_gpu(struct intel_engine_cs *ring,
>  	if (flush_chipset)
>  		i915_gem_chipset_flush(ring->dev);
>  
> -	if (flush_domains & I915_GEM_DOMAIN_GTT)
> -		wmb();
> +	/* Unconditionally flush out writes to memory as the user may be
> +	 * doing asynchronous streaming writes to active buffers (i.e.
> +	 * lazy domain management to avoid serialisation) directly into
> +	 * the physical pages and so not naturally serialised by the GTT.
> +	 */
> +	wmb();
>  
>  	/* Unconditionally invalidate gpu caches and ensure that we do flush
>  	 * any residual writes from the previous batch.
> -- 
> 2.1.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Unconditionally flush writes before execbuffer
  2015-05-11 10:34   ` Daniel Vetter
  (?)
@ 2015-05-11 10:37   ` Chris Wilson
  -1 siblings, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2015-05-11 10:37 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx, Akash Goel, stable

On Mon, May 11, 2015 at 12:34:37PM +0200, Daniel Vetter wrote:
> On Mon, May 11, 2015 at 08:51:36AM +0100, Chris Wilson wrote:
> > With the advent of mmap(wc), we have a path to write directly into
> > active GPU buffers. When combined with async updates (i.e. avoiding the
> > explicit domain management along with the memory barriers and GPU
> > stalls) we start to see the GPU read the wrong values from memory - i.e.
> > we have insufficient memory barriers along the execbuffer path. Writes
> > through the GTT should have been naturally serialised with execution
> > through the GTT as well and so the impact only seems to be from the WC
> > paths.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Akash Goel <akash.goel@intel.com>
> > Cc: stable@vger.kernel.org
> 
> Do we have a nasty igt for this? Bugzilla?

Maybe some of the 4.0 mmap(wc) regressions, we don't have an igt (yet).
The corruption, spotted whilst playing with enabling mmap(wc) for mesa
on byt, is very sporadic and so capturing it in a concise igt may be
quite difficult.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Unconditionally flush writes before execbuffer
  2015-05-11 10:34   ` Daniel Vetter
@ 2015-05-11 15:25     ` Chris Wilson
  -1 siblings, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2015-05-11 15:25 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx, Akash Goel, stable

On Mon, May 11, 2015 at 12:34:37PM +0200, Daniel Vetter wrote:
> On Mon, May 11, 2015 at 08:51:36AM +0100, Chris Wilson wrote:
> > With the advent of mmap(wc), we have a path to write directly into
> > active GPU buffers. When combined with async updates (i.e. avoiding the
> > explicit domain management along with the memory barriers and GPU
> > stalls) we start to see the GPU read the wrong values from memory - i.e.
> > we have insufficient memory barriers along the execbuffer path. Writes
> > through the GTT should have been naturally serialised with execution
> > through the GTT as well and so the impact only seems to be from the WC
> > paths.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Akash Goel <akash.goel@intel.com>
> > Cc: stable@vger.kernel.org
> 
> Do we have a nasty igt for this? Bugzilla?

I've added igt/gem_streaming_writes.

That wmb() is not enough for !llc. Since the wmb() made piglit happy it
is quite possible I haven't hit the same path exactly, but it's going to
take some investigation to see if igt/gem_streaming_writes can possibly
work on !llc.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] drm/i915: Unconditionally flush writes before execbuffer
@ 2015-05-11 15:25     ` Chris Wilson
  0 siblings, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2015-05-11 15:25 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx, Akash Goel, stable

On Mon, May 11, 2015 at 12:34:37PM +0200, Daniel Vetter wrote:
> On Mon, May 11, 2015 at 08:51:36AM +0100, Chris Wilson wrote:
> > With the advent of mmap(wc), we have a path to write directly into
> > active GPU buffers. When combined with async updates (i.e. avoiding the
> > explicit domain management along with the memory barriers and GPU
> > stalls) we start to see the GPU read the wrong values from memory - i.e.
> > we have insufficient memory barriers along the execbuffer path. Writes
> > through the GTT should have been naturally serialised with execution
> > through the GTT as well and so the impact only seems to be from the WC
> > paths.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Akash Goel <akash.goel@intel.com>
> > Cc: stable@vger.kernel.org
> 
> Do we have a nasty igt for this? Bugzilla?

I've added igt/gem_streaming_writes.

That wmb() is not enough for !llc. Since the wmb() made piglit happy it
is quite possible I haven't hit the same path exactly, but it's going to
take some investigation to see if igt/gem_streaming_writes can possibly
work on !llc.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Unconditionally flush writes before execbuffer
  2015-05-11 15:25     ` Chris Wilson
  (?)
@ 2015-05-12 10:19     ` Chris Wilson
  -1 siblings, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2015-05-12 10:19 UTC (permalink / raw)
  To: Daniel Vetter, intel-gfx, Akash Goel, stable

On Mon, May 11, 2015 at 04:25:52PM +0100, Chris Wilson wrote:
> On Mon, May 11, 2015 at 12:34:37PM +0200, Daniel Vetter wrote:
> > On Mon, May 11, 2015 at 08:51:36AM +0100, Chris Wilson wrote:
> > > With the advent of mmap(wc), we have a path to write directly into
> > > active GPU buffers. When combined with async updates (i.e. avoiding the
> > > explicit domain management along with the memory barriers and GPU
> > > stalls) we start to see the GPU read the wrong values from memory - i.e.
> > > we have insufficient memory barriers along the execbuffer path. Writes
> > > through the GTT should have been naturally serialised with execution
> > > through the GTT as well and so the impact only seems to be from the WC
> > > paths.
> > > 
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Akash Goel <akash.goel@intel.com>
> > > Cc: stable@vger.kernel.org
> > 
> > Do we have a nasty igt for this? Bugzilla?
> 
> I've added igt/gem_streaming_writes.

So far the only thing that makes a difference on !llc is hitting it with
wbinvd().

I wonder if EXEC_OBJECT_WBINVD (or EXEC_OBJECT_WMB to be less specific)
is acceptable?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] drm/i915: Unconditionally flush writes before execbuffer
  2015-05-11  7:51 [PATCH] drm/i915: Unconditionally flush writes before execbuffer Chris Wilson
  2015-05-11 10:34   ` Daniel Vetter
@ 2015-05-14 11:52 ` shuang.he
  1 sibling, 0 replies; 19+ messages in thread
From: shuang.he @ 2015-05-14 11:52 UTC (permalink / raw)
  To: shuang.he, ethan.gao, intel-gfx, chris

Tested-By: Intel Graphics QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
Task id: 6378
-------------------------------------Summary-------------------------------------
Platform          Delta          drm-intel-nightly          Series Applied
PNV                                  272/272              272/272
ILK                                  302/302              302/302
SNB                 -1              315/315              314/315
IVB                                  343/343              343/343
BYT                                  287/287              287/287
BDW                                  317/317              317/317
-------------------------------------Detailed-------------------------------------
Platform  Test                                drm-intel-nightly          Series Applied
 SNB  igt@pm_rpm@dpms-mode-unset-non-lpsp      DMESG_WARN(6)PASS(1)      DMESG_WARN(1)
(dmesg patch applied)WARNING:at_drivers/gpu/drm/i915/intel_uncore.c:#assert_device_not_suspended[i915]()@WARNING:.* at .* assert_device_not_suspended+0x
Note: You need to pay more attention to line start with '*'
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Unconditionally flush writes before execbuffer
  2015-05-11 15:25     ` Chris Wilson
  (?)
  (?)
@ 2015-05-19 14:41     ` Chris Wilson
  2015-05-21 13:00       ` Chris Wilson
  -1 siblings, 1 reply; 19+ messages in thread
From: Chris Wilson @ 2015-05-19 14:41 UTC (permalink / raw)
  To: Daniel Vetter, intel-gfx, Akash Goel, stable

On Mon, May 11, 2015 at 04:25:52PM +0100, Chris Wilson wrote:
> On Mon, May 11, 2015 at 12:34:37PM +0200, Daniel Vetter wrote:
> > On Mon, May 11, 2015 at 08:51:36AM +0100, Chris Wilson wrote:
> > > With the advent of mmap(wc), we have a path to write directly into
> > > active GPU buffers. When combined with async updates (i.e. avoiding the
> > > explicit domain management along with the memory barriers and GPU
> > > stalls) we start to see the GPU read the wrong values from memory - i.e.
> > > we have insufficient memory barriers along the execbuffer path. Writes
> > > through the GTT should have been naturally serialised with execution
> > > through the GTT as well and so the impact only seems to be from the WC
> > > paths.
> > > 
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Akash Goel <akash.goel@intel.com>
> > > Cc: stable@vger.kernel.org
> > 
> > Do we have a nasty igt for this? Bugzilla?
> 
> I've added igt/gem_streaming_writes.
> 
> That wmb() is not enough for !llc. Since the wmb() made piglit happy it
> is quite possible I haven't hit the same path exactly, but it's going to
> take some investigation to see if igt/gem_streaming_writes can possibly
> work on !llc.

Humbug.

Found the bug in gem_streaming_writes, even though I still think the
wmb() is strictly required, it runs fine without (presumably I haven't
managed to avoid all barriers in the execbuffer path yet). However, I
think can improve the stress by inserting extra gpu load -- that should
help make the CPU writes / GPU reads of the buffer concurrent?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Unconditionally flush writes before execbuffer
  2015-05-19 14:41     ` Chris Wilson
@ 2015-05-21 13:00       ` Chris Wilson
  2015-05-21 13:07         ` Daniel Vetter
  2015-05-21 20:29           ` Jesse Barnes
  0 siblings, 2 replies; 19+ messages in thread
From: Chris Wilson @ 2015-05-21 13:00 UTC (permalink / raw)
  To: Daniel Vetter, intel-gfx, Akash Goel, stable

On Tue, May 19, 2015 at 03:41:48PM +0100, Chris Wilson wrote:
> On Mon, May 11, 2015 at 04:25:52PM +0100, Chris Wilson wrote:
> > On Mon, May 11, 2015 at 12:34:37PM +0200, Daniel Vetter wrote:
> > > On Mon, May 11, 2015 at 08:51:36AM +0100, Chris Wilson wrote:
> > > > With the advent of mmap(wc), we have a path to write directly into
> > > > active GPU buffers. When combined with async updates (i.e. avoiding the
> > > > explicit domain management along with the memory barriers and GPU
> > > > stalls) we start to see the GPU read the wrong values from memory - i.e.
> > > > we have insufficient memory barriers along the execbuffer path. Writes
> > > > through the GTT should have been naturally serialised with execution
> > > > through the GTT as well and so the impact only seems to be from the WC
> > > > paths.
> > > > 
> > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > Cc: Akash Goel <akash.goel@intel.com>
> > > > Cc: stable@vger.kernel.org
> > > 
> > > Do we have a nasty igt for this? Bugzilla?
> > 
> > I've added igt/gem_streaming_writes.
> > 
> > That wmb() is not enough for !llc. Since the wmb() made piglit happy it
> > is quite possible I haven't hit the same path exactly, but it's going to
> > take some investigation to see if igt/gem_streaming_writes can possibly
> > work on !llc.
> 
> Humbug.
> 
> Found the bug in gem_streaming_writes, even though I still think the
> wmb() is strictly required, it runs fine without (presumably I haven't
> managed to avoid all barriers in the execbuffer path yet). However, I
> think can improve the stress by inserting extra gpu load -- that should
> help make the CPU writes / GPU reads of the buffer concurrent?

Just a small update. I haven't found a way to reproduce this in igt yet,
but I can still observe the effect using vbo-map-unsync and the fix
there is the above patch to make the wmb() unconditional.

We need to put this into stable@ reasonably quickly (I suspect some of
the 4.0 mmap(wc) regressions are due to this as well).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Unconditionally flush writes before execbuffer
  2015-05-21 13:00       ` Chris Wilson
@ 2015-05-21 13:07         ` Daniel Vetter
  2015-05-21 13:13           ` Chris Wilson
  2015-05-21 20:29           ` Jesse Barnes
  1 sibling, 1 reply; 19+ messages in thread
From: Daniel Vetter @ 2015-05-21 13:07 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, intel-gfx, Akash Goel, stable

On Thu, May 21, 2015 at 02:00:34PM +0100, Chris Wilson wrote:
> On Tue, May 19, 2015 at 03:41:48PM +0100, Chris Wilson wrote:
> > On Mon, May 11, 2015 at 04:25:52PM +0100, Chris Wilson wrote:
> > > On Mon, May 11, 2015 at 12:34:37PM +0200, Daniel Vetter wrote:
> > > > On Mon, May 11, 2015 at 08:51:36AM +0100, Chris Wilson wrote:
> > > > > With the advent of mmap(wc), we have a path to write directly into
> > > > > active GPU buffers. When combined with async updates (i.e. avoiding the
> > > > > explicit domain management along with the memory barriers and GPU
> > > > > stalls) we start to see the GPU read the wrong values from memory - i.e.
> > > > > we have insufficient memory barriers along the execbuffer path. Writes
> > > > > through the GTT should have been naturally serialised with execution
> > > > > through the GTT as well and so the impact only seems to be from the WC
> > > > > paths.
> > > > > 
> > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > > Cc: Akash Goel <akash.goel@intel.com>
> > > > > Cc: stable@vger.kernel.org
> > > > 
> > > > Do we have a nasty igt for this? Bugzilla?
> > > 
> > > I've added igt/gem_streaming_writes.
> > > 
> > > That wmb() is not enough for !llc. Since the wmb() made piglit happy it
> > > is quite possible I haven't hit the same path exactly, but it's going to
> > > take some investigation to see if igt/gem_streaming_writes can possibly
> > > work on !llc.
> > 
> > Humbug.
> > 
> > Found the bug in gem_streaming_writes, even though I still think the
> > wmb() is strictly required, it runs fine without (presumably I haven't
> > managed to avoid all barriers in the execbuffer path yet). However, I
> > think can improve the stress by inserting extra gpu load -- that should
> > help make the CPU writes / GPU reads of the buffer concurrent?
> 
> Just a small update. I haven't found a way to reproduce this in igt yet,
> but I can still observe the effect using vbo-map-unsync and the fix
> there is the above patch to make the wmb() unconditional.
> 
> We need to put this into stable@ reasonably quickly (I suspect some of
> the 4.0 mmap(wc) regressions are due to this as well).

What about

	if (flush_domains & (GTT | CPU))
		wmb();

instead? That would imo explain things a lot better, since cpu wc is
treated as if in the CPU domains. Hm, looking at the igt that's not quite
the case, we still put it into the GTT domain for wc mmaps afaict.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Unconditionally flush writes before execbuffer
  2015-05-21 13:07         ` Daniel Vetter
@ 2015-05-21 13:13           ` Chris Wilson
  2015-05-21 14:21               ` Daniel Vetter
  0 siblings, 1 reply; 19+ messages in thread
From: Chris Wilson @ 2015-05-21 13:13 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx, Akash Goel, stable

On Thu, May 21, 2015 at 03:07:54PM +0200, Daniel Vetter wrote:
> On Thu, May 21, 2015 at 02:00:34PM +0100, Chris Wilson wrote:
> > On Tue, May 19, 2015 at 03:41:48PM +0100, Chris Wilson wrote:
> > > On Mon, May 11, 2015 at 04:25:52PM +0100, Chris Wilson wrote:
> > > > On Mon, May 11, 2015 at 12:34:37PM +0200, Daniel Vetter wrote:
> > > > > On Mon, May 11, 2015 at 08:51:36AM +0100, Chris Wilson wrote:
> > > > > > With the advent of mmap(wc), we have a path to write directly into
> > > > > > active GPU buffers. When combined with async updates (i.e. avoiding the
> > > > > > explicit domain management along with the memory barriers and GPU
> > > > > > stalls) we start to see the GPU read the wrong values from memory - i.e.
> > > > > > we have insufficient memory barriers along the execbuffer path. Writes
> > > > > > through the GTT should have been naturally serialised with execution
> > > > > > through the GTT as well and so the impact only seems to be from the WC
> > > > > > paths.
> > > > > > 
> > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > > > Cc: Akash Goel <akash.goel@intel.com>
> > > > > > Cc: stable@vger.kernel.org
> > > > > 
> > > > > Do we have a nasty igt for this? Bugzilla?
> > > > 
> > > > I've added igt/gem_streaming_writes.
> > > > 
> > > > That wmb() is not enough for !llc. Since the wmb() made piglit happy it
> > > > is quite possible I haven't hit the same path exactly, but it's going to
> > > > take some investigation to see if igt/gem_streaming_writes can possibly
> > > > work on !llc.
> > > 
> > > Humbug.
> > > 
> > > Found the bug in gem_streaming_writes, even though I still think the
> > > wmb() is strictly required, it runs fine without (presumably I haven't
> > > managed to avoid all barriers in the execbuffer path yet). However, I
> > > think can improve the stress by inserting extra gpu load -- that should
> > > help make the CPU writes / GPU reads of the buffer concurrent?
> > 
> > Just a small update. I haven't found a way to reproduce this in igt yet,
> > but I can still observe the effect using vbo-map-unsync and the fix
> > there is the above patch to make the wmb() unconditional.
> > 
> > We need to put this into stable@ reasonably quickly (I suspect some of
> > the 4.0 mmap(wc) regressions are due to this as well).
> 
> What about
> 
> 	if (flush_domains & (GTT | CPU))
> 		wmb();
> 
> instead? That would imo explain things a lot better, since cpu wc is
> treated as if in the CPU domains. Hm, looking at the igt that's not quite
> the case, we still put it into the GTT domain for wc mmaps afaict.

No. flush_domains is 0. We are talking about async writes which means
that userspace is not telling the kernel about susbsequent writes into
the inactive portions of the bo, and trusting that the buffer is
coherent and the writes are flushed. Putting the wmb() in the kernel is
not the only solution, but the most convenient (and allows us to just
emit one wmb() - but given the large number of other potential barriers
in this path, I am surprised that is required. Empirical evidence to the
contrary!)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Unconditionally flush writes before execbuffer
  2015-05-21 13:13           ` Chris Wilson
@ 2015-05-21 14:21               ` Daniel Vetter
  0 siblings, 0 replies; 19+ messages in thread
From: Daniel Vetter @ 2015-05-21 14:21 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, intel-gfx, Akash Goel, stable

On Thu, May 21, 2015 at 02:13:01PM +0100, Chris Wilson wrote:
> On Thu, May 21, 2015 at 03:07:54PM +0200, Daniel Vetter wrote:
> > On Thu, May 21, 2015 at 02:00:34PM +0100, Chris Wilson wrote:
> > > On Tue, May 19, 2015 at 03:41:48PM +0100, Chris Wilson wrote:
> > > > On Mon, May 11, 2015 at 04:25:52PM +0100, Chris Wilson wrote:
> > > > > On Mon, May 11, 2015 at 12:34:37PM +0200, Daniel Vetter wrote:
> > > > > > On Mon, May 11, 2015 at 08:51:36AM +0100, Chris Wilson wrote:
> > > > > > > With the advent of mmap(wc), we have a path to write directly into
> > > > > > > active GPU buffers. When combined with async updates (i.e. avoiding the
> > > > > > > explicit domain management along with the memory barriers and GPU
> > > > > > > stalls) we start to see the GPU read the wrong values from memory - i.e.
> > > > > > > we have insufficient memory barriers along the execbuffer path. Writes
> > > > > > > through the GTT should have been naturally serialised with execution
> > > > > > > through the GTT as well and so the impact only seems to be from the WC
> > > > > > > paths.
> > > > > > > 
> > > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > > > > Cc: Akash Goel <akash.goel@intel.com>
> > > > > > > Cc: stable@vger.kernel.org
> > > > > > 
> > > > > > Do we have a nasty igt for this? Bugzilla?
> > > > > 
> > > > > I've added igt/gem_streaming_writes.
> > > > > 
> > > > > That wmb() is not enough for !llc. Since the wmb() made piglit happy it
> > > > > is quite possible I haven't hit the same path exactly, but it's going to
> > > > > take some investigation to see if igt/gem_streaming_writes can possibly
> > > > > work on !llc.
> > > > 
> > > > Humbug.
> > > > 
> > > > Found the bug in gem_streaming_writes, even though I still think the
> > > > wmb() is strictly required, it runs fine without (presumably I haven't
> > > > managed to avoid all barriers in the execbuffer path yet). However, I
> > > > think can improve the stress by inserting extra gpu load -- that should
> > > > help make the CPU writes / GPU reads of the buffer concurrent?
> > > 
> > > Just a small update. I haven't found a way to reproduce this in igt yet,
> > > but I can still observe the effect using vbo-map-unsync and the fix
> > > there is the above patch to make the wmb() unconditional.
> > > 
> > > We need to put this into stable@ reasonably quickly (I suspect some of
> > > the 4.0 mmap(wc) regressions are due to this as well).
> > 
> > What about
> > 
> > 	if (flush_domains & (GTT | CPU))
> > 		wmb();
> > 
> > instead? That would imo explain things a lot better, since cpu wc is
> > treated as if in the CPU domains. Hm, looking at the igt that's not quite
> > the case, we still put it into the GTT domain for wc mmaps afaict.
> 
> No. flush_domains is 0. We are talking about async writes which means
> that userspace is not telling the kernel about susbsequent writes into
> the inactive portions of the bo, and trusting that the buffer is
> coherent and the writes are flushed. Putting the wmb() in the kernel is
> not the only solution, but the most convenient (and allows us to just
> emit one wmb() - but given the large number of other potential barriers
> in this path, I am surprised that is required. Empirical evidence to the
> contrary!)

Hm right. What about emphasising this a bit more in the comment:

	/*
	 * Empirical evidence indicates that we need a write barrier to
	 * make sure write-combined writes (both to the gtt, but also to
	 * the cpu mmaps). But userspace also uses wc mmaps as
	 * unsynchronized upload paths where it inform the kernel about
	 * domain changes (to avoid the stalls). Hence we must do this
	 * barrier unconditinally.
	 */

Mostly just rewording, unsing unsynchronized as used by gl/libdrm and
clarification why we need to have the barrier unconditionally. With that

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

And I guess also

Cc: stable@vger.kernel.org
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] drm/i915: Unconditionally flush writes before execbuffer
@ 2015-05-21 14:21               ` Daniel Vetter
  0 siblings, 0 replies; 19+ messages in thread
From: Daniel Vetter @ 2015-05-21 14:21 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, intel-gfx, Akash Goel, stable

On Thu, May 21, 2015 at 02:13:01PM +0100, Chris Wilson wrote:
> On Thu, May 21, 2015 at 03:07:54PM +0200, Daniel Vetter wrote:
> > On Thu, May 21, 2015 at 02:00:34PM +0100, Chris Wilson wrote:
> > > On Tue, May 19, 2015 at 03:41:48PM +0100, Chris Wilson wrote:
> > > > On Mon, May 11, 2015 at 04:25:52PM +0100, Chris Wilson wrote:
> > > > > On Mon, May 11, 2015 at 12:34:37PM +0200, Daniel Vetter wrote:
> > > > > > On Mon, May 11, 2015 at 08:51:36AM +0100, Chris Wilson wrote:
> > > > > > > With the advent of mmap(wc), we have a path to write directly into
> > > > > > > active GPU buffers. When combined with async updates (i.e. avoiding the
> > > > > > > explicit domain management along with the memory barriers and GPU
> > > > > > > stalls) we start to see the GPU read the wrong values from memory - i.e.
> > > > > > > we have insufficient memory barriers along the execbuffer path. Writes
> > > > > > > through the GTT should have been naturally serialised with execution
> > > > > > > through the GTT as well and so the impact only seems to be from the WC
> > > > > > > paths.
> > > > > > > 
> > > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > > > > Cc: Akash Goel <akash.goel@intel.com>
> > > > > > > Cc: stable@vger.kernel.org
> > > > > > 
> > > > > > Do we have a nasty igt for this? Bugzilla?
> > > > > 
> > > > > I've added igt/gem_streaming_writes.
> > > > > 
> > > > > That wmb() is not enough for !llc. Since the wmb() made piglit happy it
> > > > > is quite possible I haven't hit the same path exactly, but it's going to
> > > > > take some investigation to see if igt/gem_streaming_writes can possibly
> > > > > work on !llc.
> > > > 
> > > > Humbug.
> > > > 
> > > > Found the bug in gem_streaming_writes, even though I still think the
> > > > wmb() is strictly required, it runs fine without (presumably I haven't
> > > > managed to avoid all barriers in the execbuffer path yet). However, I
> > > > think can improve the stress by inserting extra gpu load -- that should
> > > > help make the CPU writes / GPU reads of the buffer concurrent?
> > > 
> > > Just a small update. I haven't found a way to reproduce this in igt yet,
> > > but I can still observe the effect using vbo-map-unsync and the fix
> > > there is the above patch to make the wmb() unconditional.
> > > 
> > > We need to put this into stable@ reasonably quickly (I suspect some of
> > > the 4.0 mmap(wc) regressions are due to this as well).
> > 
> > What about
> > 
> > 	if (flush_domains & (GTT | CPU))
> > 		wmb();
> > 
> > instead? That would imo explain things a lot better, since cpu wc is
> > treated as if in the CPU domains. Hm, looking at the igt that's not quite
> > the case, we still put it into the GTT domain for wc mmaps afaict.
> 
> No. flush_domains is 0. We are talking about async writes which means
> that userspace is not telling the kernel about susbsequent writes into
> the inactive portions of the bo, and trusting that the buffer is
> coherent and the writes are flushed. Putting the wmb() in the kernel is
> not the only solution, but the most convenient (and allows us to just
> emit one wmb() - but given the large number of other potential barriers
> in this path, I am surprised that is required. Empirical evidence to the
> contrary!)

Hm right. What about emphasising this a bit more in the comment:

	/*
	 * Empirical evidence indicates that we need a write barrier to
	 * make sure write-combined writes (both to the gtt, but also to
	 * the cpu mmaps). But userspace also uses wc mmaps as
	 * unsynchronized upload paths where it inform the kernel about
	 * domain changes (to avoid the stalls). Hence we must do this
	 * barrier unconditinally.
	 */

Mostly just rewording, unsing unsynchronized as used by gl/libdrm and
clarification why we need to have the barrier unconditionally. With that

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

And I guess also

Cc: stable@vger.kernel.org
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Unconditionally flush writes before execbuffer
  2015-05-21 14:21               ` Daniel Vetter
  (?)
@ 2015-05-21 15:22               ` Chris Wilson
  2015-05-21 15:30                 ` Daniel Vetter
  -1 siblings, 1 reply; 19+ messages in thread
From: Chris Wilson @ 2015-05-21 15:22 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx, Akash Goel, stable

On Thu, May 21, 2015 at 04:21:46PM +0200, Daniel Vetter wrote:
> Hm right. What about emphasising this a bit more in the comment:
> 
> 	/*
> 	 * Empirical evidence indicates that we need a write barrier to
> 	 * make sure write-combined writes (both to the gtt, but also to
> 	 * the cpu mmaps). But userspace also uses wc mmaps as
> 	 * unsynchronized upload paths where it inform the kernel about
> 	 * domain changes (to avoid the stalls). Hence we must do this
> 	 * barrier unconditinally.
> 	 */

For reference the wording in the commit is:

/* Unconditionally flush out writes to memory as the user may be
 * doing asynchronous streaming writes to active buffers (i.e.
 * lazy domain management to avoid serialisation) directly into
 * the physical pages and so not naturally serialised by the GTT.
 */

> Mostly just rewording, unsing unsynchronized as used by gl/libdrm and
> clarification why we need to have the barrier unconditionally. With that

Hmm, glMapBufferRange does use unsynchronized, but async is almost
universally preferred when talking about io and runqueues.

/* Unconditionally flush out writes to memory as the user may be
 * doing asynchronous streaming writes to active buffers in this
 * batch (i.e. lazy domain management to avoid serialisation, for
 * example with glMapBufferRange(GL_MAP_UNSYNCHRONIZED_BIT)) directly
 * into the physical pages and so not naturally serialised by the GTT.
 */

> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> 
> And I guess also
> 
> Cc: stable@vger.kernel.org

It already was ;-)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Unconditionally flush writes before execbuffer
  2015-05-21 15:22               ` [Intel-gfx] " Chris Wilson
@ 2015-05-21 15:30                 ` Daniel Vetter
  2015-05-26  8:00                   ` Daniel Vetter
  0 siblings, 1 reply; 19+ messages in thread
From: Daniel Vetter @ 2015-05-21 15:30 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, intel-gfx, Akash Goel, stable; +Cc: Jani Nikula

On Thu, May 21, 2015 at 04:22:55PM +0100, Chris Wilson wrote:
> On Thu, May 21, 2015 at 04:21:46PM +0200, Daniel Vetter wrote:
> > Hm right. What about emphasising this a bit more in the comment:
> > 
> > 	/*
> > 	 * Empirical evidence indicates that we need a write barrier to
> > 	 * make sure write-combined writes (both to the gtt, but also to
> > 	 * the cpu mmaps). But userspace also uses wc mmaps as
> > 	 * unsynchronized upload paths where it inform the kernel about
> > 	 * domain changes (to avoid the stalls). Hence we must do this
> > 	 * barrier unconditinally.
> > 	 */
> 
> For reference the wording in the commit is:
> 
> /* Unconditionally flush out writes to memory as the user may be
>  * doing asynchronous streaming writes to active buffers (i.e.
>  * lazy domain management to avoid serialisation) directly into
>  * the physical pages and so not naturally serialised by the GTT.
>  */
> 
> > Mostly just rewording, unsing unsynchronized as used by gl/libdrm and
> > clarification why we need to have the barrier unconditionally. With that
> 
> Hmm, glMapBufferRange does use unsynchronized, but async is almost
> universally preferred when talking about io and runqueues.
> 
> /* Unconditionally flush out writes to memory as the user may be
>  * doing asynchronous streaming writes to active buffers in this
>  * batch (i.e. lazy domain management to avoid serialisation, for
>  * example with glMapBufferRange(GL_MAP_UNSYNCHRONIZED_BIT)) directly
>  * into the physical pages and so not naturally serialised by the GTT.
>  */
> 
> > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Yeah, r-b: me also with this one. Jani, can you please exchange the
comment and apply to -fixes?
-Daniel

> > 
> > And I guess also
> > 
> > Cc: stable@vger.kernel.org
> 
> It already was ;-)
> -Chris
> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Unconditionally flush writes before execbuffer
  2015-05-21 13:00       ` Chris Wilson
@ 2015-05-21 20:29           ` Jesse Barnes
  2015-05-21 20:29           ` Jesse Barnes
  1 sibling, 0 replies; 19+ messages in thread
From: Jesse Barnes @ 2015-05-21 20:29 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, intel-gfx, Akash Goel, stable

On 05/21/2015 06:00 AM, Chris Wilson wrote:
> On Tue, May 19, 2015 at 03:41:48PM +0100, Chris Wilson wrote:
>> On Mon, May 11, 2015 at 04:25:52PM +0100, Chris Wilson wrote:
>>> On Mon, May 11, 2015 at 12:34:37PM +0200, Daniel Vetter wrote:
>>>> On Mon, May 11, 2015 at 08:51:36AM +0100, Chris Wilson wrote:
>>>>> With the advent of mmap(wc), we have a path to write directly into
>>>>> active GPU buffers. When combined with async updates (i.e. avoiding the
>>>>> explicit domain management along with the memory barriers and GPU
>>>>> stalls) we start to see the GPU read the wrong values from memory - i.e.
>>>>> we have insufficient memory barriers along the execbuffer path. Writes
>>>>> through the GTT should have been naturally serialised with execution
>>>>> through the GTT as well and so the impact only seems to be from the WC
>>>>> paths.
>>>>>
>>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>>> Cc: Akash Goel <akash.goel@intel.com>
>>>>> Cc: stable@vger.kernel.org
>>>>
>>>> Do we have a nasty igt for this? Bugzilla?
>>>
>>> I've added igt/gem_streaming_writes.
>>>
>>> That wmb() is not enough for !llc. Since the wmb() made piglit happy it
>>> is quite possible I haven't hit the same path exactly, but it's going to
>>> take some investigation to see if igt/gem_streaming_writes can possibly
>>> work on !llc.
>>
>> Humbug.
>>
>> Found the bug in gem_streaming_writes, even though I still think the
>> wmb() is strictly required, it runs fine without (presumably I haven't
>> managed to avoid all barriers in the execbuffer path yet). However, I
>> think can improve the stress by inserting extra gpu load -- that should
>> help make the CPU writes / GPU reads of the buffer concurrent?
> 
> Just a small update. I haven't found a way to reproduce this in igt yet,
> but I can still observe the effect using vbo-map-unsync and the fix
> there is the above patch to make the wmb() unconditional.
> 
> We need to put this into stable@ reasonably quickly (I suspect some of
> the 4.0 mmap(wc) regressions are due to this as well).

So the symptom is that the GPU picks up older values from memory, and
your theory is that the wmb() kicks the values out of the store buffer
or WC buffer prior to the execbuf?

I think that's reasonable, and I'm hoping "globally visible" is spec'd
to include the GPU and other system agents in the sfence definition.

Jesse


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] drm/i915: Unconditionally flush writes before execbuffer
@ 2015-05-21 20:29           ` Jesse Barnes
  0 siblings, 0 replies; 19+ messages in thread
From: Jesse Barnes @ 2015-05-21 20:29 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, intel-gfx, Akash Goel, stable

On 05/21/2015 06:00 AM, Chris Wilson wrote:
> On Tue, May 19, 2015 at 03:41:48PM +0100, Chris Wilson wrote:
>> On Mon, May 11, 2015 at 04:25:52PM +0100, Chris Wilson wrote:
>>> On Mon, May 11, 2015 at 12:34:37PM +0200, Daniel Vetter wrote:
>>>> On Mon, May 11, 2015 at 08:51:36AM +0100, Chris Wilson wrote:
>>>>> With the advent of mmap(wc), we have a path to write directly into
>>>>> active GPU buffers. When combined with async updates (i.e. avoiding the
>>>>> explicit domain management along with the memory barriers and GPU
>>>>> stalls) we start to see the GPU read the wrong values from memory - i.e.
>>>>> we have insufficient memory barriers along the execbuffer path. Writes
>>>>> through the GTT should have been naturally serialised with execution
>>>>> through the GTT as well and so the impact only seems to be from the WC
>>>>> paths.
>>>>>
>>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>>> Cc: Akash Goel <akash.goel@intel.com>
>>>>> Cc: stable@vger.kernel.org
>>>>
>>>> Do we have a nasty igt for this? Bugzilla?
>>>
>>> I've added igt/gem_streaming_writes.
>>>
>>> That wmb() is not enough for !llc. Since the wmb() made piglit happy it
>>> is quite possible I haven't hit the same path exactly, but it's going to
>>> take some investigation to see if igt/gem_streaming_writes can possibly
>>> work on !llc.
>>
>> Humbug.
>>
>> Found the bug in gem_streaming_writes, even though I still think the
>> wmb() is strictly required, it runs fine without (presumably I haven't
>> managed to avoid all barriers in the execbuffer path yet). However, I
>> think can improve the stress by inserting extra gpu load -- that should
>> help make the CPU writes / GPU reads of the buffer concurrent?
> 
> Just a small update. I haven't found a way to reproduce this in igt yet,
> but I can still observe the effect using vbo-map-unsync and the fix
> there is the above patch to make the wmb() unconditional.
> 
> We need to put this into stable@ reasonably quickly (I suspect some of
> the 4.0 mmap(wc) regressions are due to this as well).

So the symptom is that the GPU picks up older values from memory, and
your theory is that the wmb() kicks the values out of the store buffer
or WC buffer prior to the execbuf?

I think that's reasonable, and I'm hoping "globally visible" is spec'd
to include the GPU and other system agents in the sfence definition.

Jesse

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Unconditionally flush writes before execbuffer
  2015-05-21 15:30                 ` Daniel Vetter
@ 2015-05-26  8:00                   ` Daniel Vetter
  0 siblings, 0 replies; 19+ messages in thread
From: Daniel Vetter @ 2015-05-26  8:00 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, intel-gfx, Akash Goel, stable; +Cc: Jani Nikula

On Thu, May 21, 2015 at 5:30 PM, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Thu, May 21, 2015 at 04:22:55PM +0100, Chris Wilson wrote:
>> On Thu, May 21, 2015 at 04:21:46PM +0200, Daniel Vetter wrote:
>> > Hm right. What about emphasising this a bit more in the comment:
>> >
>> >     /*
>> >      * Empirical evidence indicates that we need a write barrier to
>> >      * make sure write-combined writes (both to the gtt, but also to
>> >      * the cpu mmaps). But userspace also uses wc mmaps as
>> >      * unsynchronized upload paths where it inform the kernel about
>> >      * domain changes (to avoid the stalls). Hence we must do this
>> >      * barrier unconditinally.
>> >      */
>>
>> For reference the wording in the commit is:
>>
>> /* Unconditionally flush out writes to memory as the user may be
>>  * doing asynchronous streaming writes to active buffers (i.e.
>>  * lazy domain management to avoid serialisation) directly into
>>  * the physical pages and so not naturally serialised by the GTT.
>>  */
>>
>> > Mostly just rewording, unsing unsynchronized as used by gl/libdrm and
>> > clarification why we need to have the barrier unconditionally. With that
>>
>> Hmm, glMapBufferRange does use unsynchronized, but async is almost
>> universally preferred when talking about io and runqueues.
>>
>> /* Unconditionally flush out writes to memory as the user may be
>>  * doing asynchronous streaming writes to active buffers in this
>>  * batch (i.e. lazy domain management to avoid serialisation, for
>>  * example with glMapBufferRange(GL_MAP_UNSYNCHRONIZED_BIT)) directly
>>  * into the physical pages and so not naturally serialised by the GTT.
>>  */
>>
>> > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>
> Yeah, r-b: me also with this one. Jani, can you please exchange the
> comment and apply to -fixes?

I retract my r-b for now, this seems to be a much bigger can of worms
- it seems like we need to make all the mb()/wmb() we have all over
the place unconditional.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2015-05-26  8:00 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-11  7:51 [PATCH] drm/i915: Unconditionally flush writes before execbuffer Chris Wilson
2015-05-11 10:34 ` [Intel-gfx] " Daniel Vetter
2015-05-11 10:34   ` Daniel Vetter
2015-05-11 10:37   ` [Intel-gfx] " Chris Wilson
2015-05-11 15:25   ` Chris Wilson
2015-05-11 15:25     ` Chris Wilson
2015-05-12 10:19     ` [Intel-gfx] " Chris Wilson
2015-05-19 14:41     ` Chris Wilson
2015-05-21 13:00       ` Chris Wilson
2015-05-21 13:07         ` Daniel Vetter
2015-05-21 13:13           ` Chris Wilson
2015-05-21 14:21             ` Daniel Vetter
2015-05-21 14:21               ` Daniel Vetter
2015-05-21 15:22               ` [Intel-gfx] " Chris Wilson
2015-05-21 15:30                 ` Daniel Vetter
2015-05-26  8:00                   ` Daniel Vetter
2015-05-21 20:29         ` Jesse Barnes
2015-05-21 20:29           ` Jesse Barnes
2015-05-14 11:52 ` shuang.he

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.