From: Daniel Vetter <daniel@ffwll.ch>
To: Chris Wilson <chris@chris-wilson.co.uk>,
	Daniel Vetter <daniel@ffwll.ch>,
	intel-gfx@lists.freedesktop.org,
	Akash Goel <akash.goel@intel.com>,
	stable@vger.kernel.org
Subject: Re: [Intel-gfx] [PATCH] drm/i915: Unconditionally flush writes before execbuffer
Date: Thu, 21 May 2015 16:21:46 +0200
Message-ID: <20150521142146.GF15256@phenom.ffwll.local>
In-Reply-To: <20150521131301.GQ17761@nuc-i3427.alporthouse.com>

On Thu, May 21, 2015 at 02:13:01PM +0100, Chris Wilson wrote:
> On Thu, May 21, 2015 at 03:07:54PM +0200, Daniel Vetter wrote:
> > On Thu, May 21, 2015 at 02:00:34PM +0100, Chris Wilson wrote:
> > > On Tue, May 19, 2015 at 03:41:48PM +0100, Chris Wilson wrote:
> > > > On Mon, May 11, 2015 at 04:25:52PM +0100, Chris Wilson wrote:
> > > > > On Mon, May 11, 2015 at 12:34:37PM +0200, Daniel Vetter wrote:
> > > > > > On Mon, May 11, 2015 at 08:51:36AM +0100, Chris Wilson wrote:
> > > > > > > With the advent of mmap(wc), we have a path to write directly into
> > > > > > > active GPU buffers. When combined with async updates (i.e. avoiding the
> > > > > > > explicit domain management along with the memory barriers and GPU
> > > > > > > stalls) we start to see the GPU read the wrong values from memory - i.e.
> > > > > > > we have insufficient memory barriers along the execbuffer path. Writes
> > > > > > > through the GTT should have been naturally serialised with execution
> > > > > > > through the GTT as well and so the impact only seems to be from the WC
> > > > > > > paths.
> > > > > > > 
> > > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > > > > Cc: Akash Goel <akash.goel@intel.com>
> > > > > > > Cc: stable@vger.kernel.org
> > > > > > 
> > > > > > Do we have a nasty igt for this? Bugzilla?
> > > > > 
> > > > > I've added igt/gem_streaming_writes.
> > > > > 
> > > > > That wmb() is not enough for !llc. Since the wmb() made piglit happy it
> > > > > is quite possible I haven't hit the same path exactly, but it's going to
> > > > > take some investigation to see if igt/gem_streaming_writes can possibly
> > > > > work on !llc.
> > > > 
> > > > Humbug.
> > > > 
> > > > Found the bug in gem_streaming_writes, even though I still think the
> > > > wmb() is strictly required, it runs fine without (presumably I haven't
> > > > managed to avoid all barriers in the execbuffer path yet). However, I
> > > > think I can improve the stress by inserting extra gpu load -- that should
> > > > help make the CPU writes / GPU reads of the buffer concurrent?
> > > 
> > > Just a small update. I haven't found a way to reproduce this in igt yet,
> > > but I can still observe the effect using vbo-map-unsync and the fix
> > > there is the above patch to make the wmb() unconditional.
> > > 
> > > We need to put this into stable@ reasonably quickly (I suspect some of
> > > the 4.0 mmap(wc) regressions are due to this as well).
> > 
> > What about
> > 
> > 	if (flush_domains & (GTT | CPU))
> > 		wmb();
> > 
> > instead? That would imo explain things a lot better, since cpu wc is
> > treated as if in the CPU domains. Hm, looking at the igt that's not quite
> > the case, we still put it into the GTT domain for wc mmaps afaict.
> 
> No. flush_domains is 0. We are talking about async writes which means
> that userspace is not telling the kernel about subsequent writes into
> the inactive portions of the bo, and trusting that the buffer is
> coherent and the writes are flushed. Putting the wmb() in the kernel is
> not the only solution, but the most convenient (and allows us to just
> emit one wmb() - but given the large number of other potential barriers
> in this path, I am surprised that is required. Empirical evidence to the
> contrary!)

Hm right. What about emphasising this a bit more in the comment:

	/*
	 * Empirical evidence indicates that we need a write barrier to
	 * make sure write-combined writes (both to the gtt, but also to
	 * the cpu mmaps) are flushed. But userspace also uses wc mmaps as
	 * unsynchronized upload paths where it doesn't inform the kernel
	 * about domain changes (to avoid the stalls). Hence we must do
	 * this barrier unconditionally.
	 */

Mostly just rewording, using unsynchronized as used by gl/libdrm and
clarification why we need to have the barrier unconditionally. With that

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

And I guess also

Cc: stable@vger.kernel.org
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
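
For readers following the thread, here is a minimal userspace sketch of the
two variants under discussion: a barrier gated on flush_domains versus an
unconditional one. It is not the actual i915 code. The names DOMAIN_CPU,
DOMAIN_GTT, sketch_wmb(), flush_conditional() and flush_unconditional() are
hypothetical, the flag values are illustrative, and the C11 release fence
only stands in for the kernel's wmb(); it does not reproduce how
write-combined stores are flushed on real hardware. The counter exists only
so the difference between the two paths can be observed.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for the GEM domain flags; values are illustrative. */
#define DOMAIN_CPU 0x1u
#define DOMAIN_GTT 0x40u

/* Counts barriers so the two variants can be compared. */
static unsigned int barriers_issued;

/* Userspace approximation of wmb(): a release fence plus bookkeeping. */
static void sketch_wmb(void)
{
	atomic_thread_fence(memory_order_release);
	barriers_issued++;
}

/*
 * Conditional variant: the barrier is skipped when flush_domains is 0,
 * which is the case for unsynchronized writes the kernel was never told
 * about.
 */
static void flush_conditional(uint32_t flush_domains)
{
	if (flush_domains & (DOMAIN_GTT | DOMAIN_CPU))
		sketch_wmb();
}

/* Unconditional variant: always emit one barrier before submission. */
static void flush_unconditional(uint32_t flush_domains)
{
	(void)flush_domains; /* deliberately ignored */
	sketch_wmb();
}
```

With flush_domains == 0, flush_conditional() issues no barrier at all while
flush_unconditional() issues exactly one, which is the case the thread is
concerned with.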


Thread overview: 19+ messages
2015-05-11  7:51 [PATCH] drm/i915: Unconditionally flush writes before execbuffer Chris Wilson
2015-05-11 10:34 ` [Intel-gfx] " Daniel Vetter
2015-05-11 10:34   ` Daniel Vetter
2015-05-11 10:37   ` [Intel-gfx] " Chris Wilson
2015-05-11 15:25   ` Chris Wilson
2015-05-11 15:25     ` Chris Wilson
2015-05-12 10:19     ` [Intel-gfx] " Chris Wilson
2015-05-19 14:41     ` Chris Wilson
2015-05-21 13:00       ` Chris Wilson
2015-05-21 13:07         ` Daniel Vetter
2015-05-21 13:13           ` Chris Wilson
2015-05-21 14:21             ` Daniel Vetter [this message]
2015-05-21 14:21               ` Daniel Vetter
2015-05-21 15:22               ` [Intel-gfx] " Chris Wilson
2015-05-21 15:30                 ` Daniel Vetter
2015-05-26  8:00                   ` Daniel Vetter
2015-05-21 20:29         ` Jesse Barnes
2015-05-21 20:29           ` Jesse Barnes
2015-05-14 11:52 ` shuang.he
