intel-gfx.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Revert "drm/i915: Kill GTT mappings when moving from GTT domain"
@ 2011-06-14 23:43 Eric Anholt
  2011-06-15  9:39 ` Chris Wilson
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Anholt @ 2011-06-14 23:43 UTC (permalink / raw)
  To: intel-gfx

This reverts commit 4a684a4117abd756291969336af454e8a958802f.
Userland has always been required to set the object's domain to GTT
before using it through a GTT mapping, it's not something that the
kernel is supposed to enforce.  (The pagefault support is so that we
can handle multiple mappings without userland having to pin across
them, not so that userland can use GTT after GPU domains without
telling the kernel).

Fixes 19.2% +/- 0.8% (n=6) performance regression in cairo-gl
firefox-talos-gfx on my T420 latop.
---
 drivers/gpu/drm/i915/i915_gem.c            |   10 ++++------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |    4 ----
 2 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d231729..7bbd355 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1218,11 +1218,11 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 		ret = i915_gem_object_bind_to_gtt(obj, 0, true);
 		if (ret)
 			goto unlock;
-	}
 
-	ret = i915_gem_object_set_to_gtt_domain(obj, write);
-	if (ret)
-		goto unlock;
+		ret = i915_gem_object_set_to_gtt_domain(obj, write);
+		if (ret)
+			goto unlock;
+	}
 
 	if (obj->tiling_mode == I915_TILING_NONE)
 		ret = i915_gem_object_put_fence(obj);
@@ -2953,8 +2953,6 @@ i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj)
 	 */
 	wmb();
 
-	i915_gem_release_mmap(obj);
-
 	old_write_domain = obj->base.write_domain;
 	obj->base.write_domain = 0;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 20a4cc5..4934cf8 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -187,10 +187,6 @@ i915_gem_object_set_to_gpu_domain(struct drm_i915_gem_object *obj,
 	if ((flush_domains | invalidate_domains) & I915_GEM_DOMAIN_CPU)
 		i915_gem_clflush_object(obj);
 
-	/* blow away mappings if mapped through GTT */
-	if ((flush_domains | invalidate_domains) & I915_GEM_DOMAIN_GTT)
-		i915_gem_release_mmap(obj);
-
 	if (obj->base.pending_write_domain)
 		cd->flips |= atomic_read(&obj->pending_flip);
 
-- 
1.7.5.3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] Revert "drm/i915: Kill GTT mappings when moving from GTT domain"
  2011-06-14 23:43 [PATCH] Revert "drm/i915: Kill GTT mappings when moving from GTT domain" Eric Anholt
@ 2011-06-15  9:39 ` Chris Wilson
  2011-06-15 15:08   ` Eric Anholt
  0 siblings, 1 reply; 17+ messages in thread
From: Chris Wilson @ 2011-06-15  9:39 UTC (permalink / raw)
  To: Eric Anholt, intel-gfx

On Tue, 14 Jun 2011 16:43:09 -0700, Eric Anholt <eric@anholt.net> wrote:
> This reverts commit 4a684a4117abd756291969336af454e8a958802f.
> Userland has always been required to set the object's domain to GTT
> before using it through a GTT mapping, it's not something that the
> kernel is supposed to enforce.  (The pagefault support is so that we
> can handle multiple mappings without userland having to pin across
> them, not so that userland can use GTT after GPU domains without
> telling the kernel).

The only place that can do accurate per-page tracking of domains is the
kernel. (And avoid clflushes for GTT eviction. Ok, so we're not quite
there yet.) Relying on userspace to get it right, and for co-operating
processes to coordinate their access to buffers is a misfeature.
 
> Fixes 19.2% +/- 0.8% (n=6) performance regression in cairo-gl
> firefox-talos-gfx on my T420 latop.

Since GTT mappings are fundamentally slow, don't you think this suggests
that mesa is doing something wrong? Or a bug in cairo-gl to make mesa act
this way?

For the record, on my SugarBay, this patch on top of drm-intel-fixes and
mesa.git + your instruction streaming, regresses cairo-xlib 4.5s to 4.7s
(4%, with a claim of .1% stddev) and improves cairo-gl from 21.5s to 20.1s
(6.5%, with a claim of .5% stddev) for the firefox-talos-gfx. [This is a
system that can do under 2s for cairo-xlib using vmap and the perf
patches I posted before.]
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Revert "drm/i915: Kill GTT mappings when moving from GTT domain"
  2011-06-15  9:39 ` Chris Wilson
@ 2011-06-15 15:08   ` Eric Anholt
  2011-06-15 16:03     ` Chris Wilson
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Anholt @ 2011-06-15 15:08 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 2358 bytes --]

On Wed, 15 Jun 2011 10:39:06 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Tue, 14 Jun 2011 16:43:09 -0700, Eric Anholt <eric@anholt.net> wrote:
> > This reverts commit 4a684a4117abd756291969336af454e8a958802f.
> > Userland has always been required to set the object's domain to GTT
> > before using it through a GTT mapping, it's not something that the
> > kernel is supposed to enforce.  (The pagefault support is so that we
> > can handle multiple mappings without userland having to pin across
> > them, not so that userland can use GTT after GPU domains without
> > telling the kernel).
> 
> The only place that can do accurate per-page tracking of domains is the
> kernel. (And avoid clflushes for GTT eviction. Ok, so we're not quite
> there yet.) Relying on userspace to get it right, and for co-operating
> processes to coordinate their access to buffers is a misfeature.

We rely on userspace to get it right all the time, which was a design
decision made early on for performance reasons.  For example, userspace
can submit the wrong domains with an exec call, and non-GTT mappings
also behave like I'm changing GTT back to.  Unless you quietly changed
that, too?

> > Fixes 19.2% +/- 0.8% (n=6) performance regression in cairo-gl
> > firefox-talos-gfx on my T420 latop.
> 
> Since GTT mappings are fundamentally slow, don't you think this suggests
> that mesa is doing something wrong? Or a bug in cairo-gl to make mesa act
> this way?

Page table changes are expensive, yes.  That's why we cache BOs in
userspace and hang onto the mappings of them.  We only moved to GTT
mappings in Mesa in places where it was faster than the alternatives (or
for tiled buffers, where due to the a17 swizzling there is no other
option).

> For the record, on my SugarBay, this patch on top of drm-intel-fixes and
> mesa.git + your instruction streaming, regresses cairo-xlib 4.5s to 4.7s
> (4%, with a claim of .1% stddev) and improves cairo-gl from 21.5s to 20.1s
> (6.5%, with a claim of .5% stddev) for the firefox-talos-gfx. [This is a
> system that can do under 2s for cairo-xlib using vmap and the perf
> patches I posted before.]
> -Chris

I was testing with semaphores enabled on the LLC series.  I'm curious
how this revert could possibly regress cairo-xlib, though -- that seems
very strange.

[-- Attachment #1.2: Type: application/pgp-signature, Size: 197 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Revert "drm/i915: Kill GTT mappings when moving from GTT domain"
  2011-06-15 15:08   ` Eric Anholt
@ 2011-06-15 16:03     ` Chris Wilson
  2011-06-17 19:06       ` Eric Anholt
  0 siblings, 1 reply; 17+ messages in thread
From: Chris Wilson @ 2011-06-15 16:03 UTC (permalink / raw)
  To: Eric Anholt, intel-gfx

On Wed, 15 Jun 2011 08:08:59 -0700, Eric Anholt <eric@anholt.net> wrote:
> On Wed, 15 Jun 2011 10:39:06 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > For the record, on my SugarBay, this patch on top of drm-intel-fixes and
> > mesa.git + your instruction streaming, regresses cairo-xlib 4.5s to 4.7s
> > (4%, with a claim of .1% stddev) and improves cairo-gl from 21.5s to 20.1s
> > (6.5%, with a claim of .5% stddev) for the firefox-talos-gfx. [This is a
> > system that can do under 2s for cairo-xlib using vmap and the perf
> > patches I posted before.]
> > -Chris
> 
> I was testing with semaphores enabled on the LLC series.  I'm curious
> how this revert could possibly regress cairo-xlib, though -- that seems
> very strange.

Moving back to LLC+semaphores (2.6.39-rc2+), firefox-talos-gfx:
xlib: 4.473
gl:  20.753

applying the patch:
xlib: 4.472
gl:  20.824

I'm just not reproducing the same issue you are seeing. Are you using a
standard distro Kconfig, or if not, can you send me yours?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Revert "drm/i915: Kill GTT mappings when moving from GTT domain"
  2011-06-15 16:03     ` Chris Wilson
@ 2011-06-17 19:06       ` Eric Anholt
  2011-06-18 11:43         ` Chris Wilson
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Anholt @ 2011-06-17 19:06 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 2614 bytes --]

On Wed, 15 Jun 2011 17:03:58 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Wed, 15 Jun 2011 08:08:59 -0700, Eric Anholt <eric@anholt.net> wrote:
> > On Wed, 15 Jun 2011 10:39:06 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > > For the record, on my SugarBay, this patch on top of drm-intel-fixes and
> > > mesa.git + your instruction streaming, regresses cairo-xlib 4.5s to 4.7s
> > > (4%, with a claim of .1% stddev) and improves cairo-gl from 21.5s to 20.1s
> > > (6.5%, with a claim of .5% stddev) for the firefox-talos-gfx. [This is a
> > > system that can do under 2s for cairo-xlib using vmap and the perf
> > > patches I posted before.]
> > > -Chris
> > 
> > I was testing with semaphores enabled on the LLC series.  I'm curious
> > how this revert could possibly regress cairo-xlib, though -- that seems
> > very strange.
> 
> Moving back to LLC+semaphores (2.6.39-rc2+), firefox-talos-gfx:
> xlib: 4.473
> gl:  20.753
> 
> applying the patch:
> xlib: 4.472
> gl:  20.824
> 
> I'm just not reproducing the same issue you are seeing. Are you using a
> standard distro Kconfig, or if not, can you send me yours?
> -Chris

http://people.freedesktop.org/~anholt/dotconfig

Pushed my kernel tree to "gtt-revert" branch.

Fresh set of numbers on two fresh boots, just about as identical testing
environments as I could produce:

(without revert)
anholt@eliezer:src/cairo/perf% cat /home/anholt/cairogl-before
21791099815
21677090234
22060471069
22049244399
21721362892
21966143347

(with revert)
anholt@eliezer:src/cairo/perf% cat /home/anholt/cairogl-after 
19162606198
19127697366
19462253708
19172535715
19339999910
19221539215

anholt@eliezer:src/cairo/perf% ministat ~/cairogl-before ~/cairogl-after
x /home/anholt/cairogl-before
+ /home/anholt/cairogl-after
+------------------------------------------------------------------------------+
| +                                                                           x|
|++ +  +  +                                                         xx x    x x|
||__A___|                                                            |___A__M_||
+------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   6  2.167709e+10 2.2060471e+10 2.1966143e+10 2.1877569e+10 1.6902073e+08
+   6 1.9127697e+10 1.9462254e+10 1.9221539e+10 1.9247772e+10 1.2847426e+08
Difference at 95.0% confidence
	-2.6298e+09 +/- 1.93108e+08
	-12.0205% +/- 0.882677%
	(Student's t, pooled s = 1.50123e+08)

[-- Attachment #1.2: Type: application/pgp-signature, Size: 197 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Revert "drm/i915: Kill GTT mappings when moving from GTT domain"
  2011-06-17 19:06       ` Eric Anholt
@ 2011-06-18 11:43         ` Chris Wilson
  2011-06-18 20:20           ` Eric Anholt
  0 siblings, 1 reply; 17+ messages in thread
From: Chris Wilson @ 2011-06-18 11:43 UTC (permalink / raw)
  To: Eric Anholt, intel-gfx

On Fri, 17 Jun 2011 12:06:54 -0700, Eric Anholt <eric@anholt.net> wrote:
> On Wed, 15 Jun 2011 17:03:58 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > Moving back to LLC+semaphores (2.6.39-rc2+), firefox-talos-gfx:
> > xlib: 4.473
> > gl:  20.753
> > 
> > applying the patch:
> > xlib: 4.472
> > gl:  20.824
> > 
> > I'm just not reproducing the same issue you are seeing. Are you using a
> > standard distro Kconfig, or if not, can you send me yours?
> > -Chris
> 
> http://people.freedesktop.org/~anholt/dotconfig
> 
> Pushed my kernel tree to "gtt-revert" branch.

Thanks, I'm testing with those on my SNB desktop, still only see around
a 5% hit for firefox-talos-gfx.

I still think this is optimising for bad behaviour of the client, and
papering over two slow linear lookups in the kernel. [One for find_vma
which is exacerbated by the cached vma on the bo and the other checking
whether the GTT aperture is RAM.]

Looking at what cairo-gl is doing, the root cause of why we are
repeatedly rewriting textures is:

commit 3b1c0a4bd66660780095e6016e3db451f34503a3
Author: Benjamin Otte <otte@redhat.com>
Date:   Fri May 14 15:56:17 2010 +0200

    fallback: Remove span renderer paths
    
    Those paths were broken, as they didn't properly translate the polygon
    to the destination size. And rather than adding lots of code that allows
    translation, it's easier to just delete this code.
    
    Note that the only user of the code was the GL backend anyway.

i.e. cairo-gl is no longer using spans for firefox-talos-gfx where
unaligned clipping dominates.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Revert "drm/i915: Kill GTT mappings when moving from GTT domain"
  2011-06-18 11:43         ` Chris Wilson
@ 2011-06-18 20:20           ` Eric Anholt
  2011-06-19 16:28             ` Chris Wilson
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Anholt @ 2011-06-18 20:20 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 1345 bytes --]

On Sat, 18 Jun 2011 12:43:58 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Fri, 17 Jun 2011 12:06:54 -0700, Eric Anholt <eric@anholt.net> wrote:
> > On Wed, 15 Jun 2011 17:03:58 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > > Moving back to LLC+semaphores (2.6.39-rc2+), firefox-talos-gfx:
> > > xlib: 4.473
> > > gl:  20.753
> > > 
> > > applying the patch:
> > > xlib: 4.472
> > > gl:  20.824
> > > 
> > > I'm just not reproducing the same issue you are seeing. Are you using a
> > > standard distro Kconfig, or if not, can you send me yours?
> > > -Chris
> > 
> > http://people.freedesktop.org/~anholt/dotconfig
> > 
> > Pushed my kernel tree to "gtt-revert" branch.
> 
> Thanks, I'm testing with those on my SNB desktop, still only see around
> a 5% hit for firefox-talos-gfx.

GTT mappings are important.  They're how textures are uploaded in GL in
general.  One of the longstanding things I've wanted to do for GL
applications that repeatedly allocate new texture images is to userland
BO cache those objects, which we don't do because of tiling.  With the
patch I'm reverting in place, there's basically no reason to do so
because remapping the BO is much of the cost of recreating the BO from
scratch.  Remapping is the reason we do userland caching instead of
kernel-side caching.

[-- Attachment #1.2: Type: application/pgp-signature, Size: 197 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Revert "drm/i915: Kill GTT mappings when moving from GTT domain"
  2011-06-18 20:20           ` Eric Anholt
@ 2011-06-19 16:28             ` Chris Wilson
  2011-06-19 17:01               ` Eric Anholt
  0 siblings, 1 reply; 17+ messages in thread
From: Chris Wilson @ 2011-06-19 16:28 UTC (permalink / raw)
  To: Eric Anholt, intel-gfx

On Sat, 18 Jun 2011 13:20:05 -0700, Eric Anholt <eric@anholt.net> wrote:
> On Sat, 18 Jun 2011 12:43:58 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > On Fri, 17 Jun 2011 12:06:54 -0700, Eric Anholt <eric@anholt.net> wrote:
> > > On Wed, 15 Jun 2011 17:03:58 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > > > Moving back to LLC+semaphores (2.6.39-rc2+), firefox-talos-gfx:
> > > > xlib: 4.473
> > > > gl:  20.753
> > > > 
> > > > applying the patch:
> > > > xlib: 4.472
> > > > gl:  20.824
> > > > 
> > > > I'm just not reproducing the same issue you are seeing. Are you using a
> > > > standard distro Kconfig, or if not, can you send me yours?
> > > > -Chris
> > > 
> > > http://people.freedesktop.org/~anholt/dotconfig
> > > 
> > > Pushed my kernel tree to "gtt-revert" branch.
> > 
> > Thanks, I'm testing with those on my SNB desktop, still only see around
> > a 5% hit for firefox-talos-gfx.
> 
> GTT mappings are important.  They're how textures are uploaded in GL in
> general.

For lack of a better mechanism. Even using anholt/gtt-revert, I question
the value of caching the GTT mapping in drm_intel_bo. For the cairo-gl and
pts benchmarks I've run, the efficacy of the cached vma is very small and
there is a very slight improvement by unmapping the vma after use. (The
difference is so small, that it will take a lot more runs to determine if
it is statistically significant.)

> One of the longstanding things I've wanted to do for GL
> applications that repeatedly allocate new texture images is to userland
> BO cache those objects, which we don't do because of tiling.

I'm failing to see the difference between this cache and a slightly
smarter (i.e. one that preferentially returns an object with appropriate
tiling and busy status) libdrm_intel cache.

> With the
> patch I'm reverting in place, there's basically no reason to do so
> because remapping the BO is much of the cost of recreating the BO from
> scratch.  Remapping is the reason we do userland caching instead of
> kernel-side caching.

Conversely, it is the lack of such a kernel bo and page cache that is
determinental to the ddx (at least on pre-SNB). And for any system running
more than one renderer. [Except for the thorny issue of whether having to
clear the pages returned from such a cache will nullify its benefits. I
have yet to measure the impact of doing so.]
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Revert "drm/i915: Kill GTT mappings when moving from GTT domain"
  2011-06-19 16:28             ` Chris Wilson
@ 2011-06-19 17:01               ` Eric Anholt
  2011-06-19 17:14                 ` Chris Wilson
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Anholt @ 2011-06-19 17:01 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 2294 bytes --]

On Sun, 19 Jun 2011 17:28:11 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Sat, 18 Jun 2011 13:20:05 -0700, Eric Anholt <eric@anholt.net> wrote:
> > On Sat, 18 Jun 2011 12:43:58 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > > On Fri, 17 Jun 2011 12:06:54 -0700, Eric Anholt <eric@anholt.net> wrote:
> > > > On Wed, 15 Jun 2011 17:03:58 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > > > > Moving back to LLC+semaphores (2.6.39-rc2+), firefox-talos-gfx:
> > > > > xlib: 4.473
> > > > > gl:  20.753
> > > > > 
> > > > > applying the patch:
> > > > > xlib: 4.472
> > > > > gl:  20.824
> > > > > 
> > > > > I'm just not reproducing the same issue you are seeing. Are you using a
> > > > > standard distro Kconfig, or if not, can you send me yours?
> > > > > -Chris
> > > > 
> > > > http://people.freedesktop.org/~anholt/dotconfig
> > > > 
> > > > Pushed my kernel tree to "gtt-revert" branch.
> > > 
> > > Thanks, I'm testing with those on my SNB desktop, still only see around
> > > a 5% hit for firefox-talos-gfx.
> > 
> > GTT mappings are important.  They're how textures are uploaded in GL in
> > general.
> 
> For lack of a better mechanism. Even using anholt/gtt-revert, I question
> the value of caching the GTT mapping in drm_intel_bo. For the cairo-gl and
> pts benchmarks I've run, the efficacy of the cached vma is very small and
> there is a very slight improvement by unmapping the vma after use. (The
> difference is so small, that it will take a lot more runs to determine if
> it is statistically significant.)

I'm confused.  You've measured a 5% impact from removing this part of
the caching, and I've measured 12-19%, so what are you planning that
doesn't involve caching the mapping that's faster than caching the
mapping?

> > One of the longstanding things I've wanted to do for GL
> > applications that repeatedly allocate new texture images is to userland
> > BO cache those objects, which we don't do because of tiling.
> 
> I'm failing to see the difference between this cache and a slightly
> smarter (i.e. one that preferentially returns an object with appropriate
> tiling and busy status) libdrm_intel cache.

I was talking about it being in libdrm_intel, sorry if that was
unclear.

[-- Attachment #1.2: Type: application/pgp-signature, Size: 197 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Revert "drm/i915: Kill GTT mappings when moving from GTT domain"
  2011-06-19 17:01               ` Eric Anholt
@ 2011-06-19 17:14                 ` Chris Wilson
  2011-06-19 21:20                   ` Eric Anholt
  0 siblings, 1 reply; 17+ messages in thread
From: Chris Wilson @ 2011-06-19 17:14 UTC (permalink / raw)
  To: Eric Anholt, intel-gfx

On Sun, 19 Jun 2011 10:01:23 -0700, Eric Anholt <eric@anholt.net> wrote:
> On Sun, 19 Jun 2011 17:28:11 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > For lack of a better mechanism. Even using anholt/gtt-revert, I question
> > the value of caching the GTT mapping in drm_intel_bo. For the cairo-gl and
> > pts benchmarks I've run, the efficacy of the cached vma is very small and
> > there is a very slight improvement by unmapping the vma after use. (The
> > difference is so small, that it will take a lot more runs to determine if
> > it is statistically significant.)
> 
> I'm confused.  You've measured a 5% impact from removing this part of
> the caching, and I've measured 12-19%, so what are you planning that
> doesn't involve caching the mapping that's faster than caching the
> mapping?

As I pointed out, cairo-gl is using fallback code and not spans. Once that
is corrected, cairo-gl no longer continually recreating textures and so
unaffected by the patch. Removing the cached vma is then arguably
beneficial, though the difference looks to be in the noise.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Revert "drm/i915: Kill GTT mappings when moving from GTT domain"
  2011-06-19 17:14                 ` Chris Wilson
@ 2011-06-19 21:20                   ` Eric Anholt
  2011-06-19 21:49                     ` Chris Wilson
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Anholt @ 2011-06-19 21:20 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 1510 bytes --]

On Sun, 19 Jun 2011 18:14:11 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Sun, 19 Jun 2011 10:01:23 -0700, Eric Anholt <eric@anholt.net> wrote:
> > On Sun, 19 Jun 2011 17:28:11 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > > For lack of a better mechanism. Even using anholt/gtt-revert, I question
> > > the value of caching the GTT mapping in drm_intel_bo. For the cairo-gl and
> > > pts benchmarks I've run, the efficacy of the cached vma is very small and
> > > there is a very slight improvement by unmapping the vma after use. (The
> > > difference is so small, that it will take a lot more runs to determine if
> > > it is statistically significant.)
> > 
> > I'm confused.  You've measured a 5% impact from removing this part of
> > the caching, and I've measured 12-19%, so what are you planning that
> > doesn't involve caching the mapping that's faster than caching the
> > mapping?
> 
> As I pointed out, cairo-gl is using fallback code and not spans. Once that
> is corrected, cairo-gl no longer continually recreating textures and so
> unaffected by the patch. Removing the cached vma is then arguably
> beneficial, though the difference looks to be in the noise.

So you don't actually want us to fix the performance regression in
texture uploads, and instead want to just tell applications not to
upload textures?

What about applications where we're not the authors, and where they
don't have a choice but to upload textures (media players)?

[-- Attachment #1.2: Type: application/pgp-signature, Size: 197 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Revert "drm/i915: Kill GTT mappings when moving from GTT domain"
  2011-06-19 21:20                   ` Eric Anholt
@ 2011-06-19 21:49                     ` Chris Wilson
  2011-06-21 16:17                       ` Remove the overhead of vm_insert_pfn() Chris Wilson
  2011-06-21 18:09                       ` [PATCH] Revert "drm/i915: Kill GTT mappings when moving from GTT domain" Keith Packard
  0 siblings, 2 replies; 17+ messages in thread
From: Chris Wilson @ 2011-06-19 21:49 UTC (permalink / raw)
  To: Eric Anholt, intel-gfx

On Sun, 19 Jun 2011 14:20:14 -0700, Eric Anholt <eric@anholt.net> wrote:
> On Sun, 19 Jun 2011 18:14:11 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > On Sun, 19 Jun 2011 10:01:23 -0700, Eric Anholt <eric@anholt.net> wrote:
> > > On Sun, 19 Jun 2011 17:28:11 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > > > For lack of a better mechanism. Even using anholt/gtt-revert, I question
> > > > the value of caching the GTT mapping in drm_intel_bo. For the cairo-gl and
> > > > pts benchmarks I've run, the efficacy of the cached vma is very small and
> > > > there is a very slight improvement by unmapping the vma after use. (The
> > > > difference is so small, that it will take a lot more runs to determine if
> > > > it is statistically significant.)
> > > 
> > > I'm confused.  You've measured a 5% impact from removing this part of
> > > the caching, and I've measured 12-19%, so what are you planning that
> > > doesn't involve caching the mapping that's faster than caching the
> > > mapping?
> > 
> > As I pointed out, cairo-gl is using fallback code and not spans. Once that
> > is corrected, cairo-gl no longer continually recreating textures and so
> > unaffected by the patch. Removing the cached vma is then arguably
> > beneficial, though the difference looks to be in the noise.
> 
> So you don't actually want us to fix the performance regression in
> texture uploads, and instead want to just tell applications not to
> upload textures?

Nowhere, did I say that. I just said you were optimising for broken code
and papering over bugs and lack of functionality elsewhere.

The patch closes a race condition. The essence of your complaint is that
the kernel is not as fast as we need it to be, and that the initial upload
to any object is slower than expected. I presume you also have a plan for
fixing the underwhelming performance of a glibc memcpy through an
established GTT mapping?

> What about applications where we're not the authors, and where they
> don't have a choice but to upload textures (media players)?

No, I'm not necessarily saying application code is broken, just the
library.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Remove the overhead of vm_insert_pfn()
  2011-06-19 21:49                     ` Chris Wilson
@ 2011-06-21 16:17                       ` Chris Wilson
  2011-06-21 16:17                         ` [PATCH 1/3] resource: Use a common string for .name = "System RAM" Chris Wilson
                                           ` (2 more replies)
  2011-06-21 18:09                       ` [PATCH] Revert "drm/i915: Kill GTT mappings when moving from GTT domain" Keith Packard
  1 sibling, 3 replies; 17+ messages in thread
From: Chris Wilson @ 2011-06-21 16:17 UTC (permalink / raw)
  To: intel-gfx

Poking around in x86,pat I found one way of avoiding the linear walk for
pat_pagerange_is_ram() which appears to nullify the regression. I would
appreciate confirmation and some review before I go poking dragons.
-Chris

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/3] resource: Use a common string for .name = "System RAM"
  2011-06-21 16:17                       ` Remove the overhead of vm_insert_pfn() Chris Wilson
@ 2011-06-21 16:17                         ` Chris Wilson
  2011-06-21 16:17                         ` [PATCH 2/3] x86, pat: Perform interval rbtree lookup of memtype first Chris Wilson
  2011-06-21 16:17                         ` [PATCH 3/3] x86, pat: Convert memtype_lock spinlock to a rwlock Chris Wilson
  2 siblings, 0 replies; 17+ messages in thread
From: Chris Wilson @ 2011-06-21 16:17 UTC (permalink / raw)
  To: intel-gfx

On a PAT system, inserting a pfn is quite expensive due to the query of
the protections for the page based on its physical location. A large
part of this overhead is due to the linear walk of resource regions and
the strcmp to find "System RAM". By restricting memory to use a global
string for its resource name, we can reduce this strcmp to a direct
pointer comparison and so eliminate the largest single overhead for
vma_insert_pfn().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 arch/arm/kernel/setup.c          |    8 ++++----
 arch/arm/plat-samsung/pm-check.c |    2 +-
 arch/avr32/kernel/setup.c        |    2 +-
 arch/ia64/kernel/efi.c           |    8 ++++----
 arch/mips/kernel/setup.c         |    4 ++--
 arch/parisc/mm/init.c            |    2 +-
 arch/s390/kernel/setup.c         |    6 +++---
 arch/score/kernel/setup.c        |    2 +-
 arch/sh/kernel/setup.c           |    2 +-
 arch/sparc/kernel/pci_common.c   |    2 +-
 arch/tile/kernel/setup.c         |    2 +-
 arch/unicore32/kernel/setup.c    |    2 +-
 arch/x86/kernel/e820.c           |    4 ++--
 arch/x86/kernel/probe_roms.c     |    2 +-
 include/linux/ioport.h           |    5 +++++
 kernel/resource.c                |   15 +++++++++++----
 mm/memory_hotplug.c              |    4 ++--
 17 files changed, 42 insertions(+), 30 deletions(-)

diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index ed11fb0..e525baa 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -166,19 +166,19 @@ static struct resource mem_res[] = {
 
 static struct resource io_res[] = {
 	{
-		.name = "reserved",
+		.name = resource_name__reserved,
 		.start = 0x3bc,
 		.end = 0x3be,
 		.flags = IORESOURCE_IO | IORESOURCE_BUSY
 	},
 	{
-		.name = "reserved",
+		.name = resource_name__reserved,
 		.start = 0x378,
 		.end = 0x37f,
 		.flags = IORESOURCE_IO | IORESOURCE_BUSY
 	},
 	{
-		.name = "reserved",
+		.name = resource_name__reserved,
 		.start = 0x278,
 		.end = 0x27f,
 		.flags = IORESOURCE_IO | IORESOURCE_BUSY
@@ -543,7 +543,7 @@ static void __init request_standard_resources(struct machine_desc *mdesc)
 
 	for_each_memblock(memory, region) {
 		res = alloc_bootmem_low(sizeof(*res));
-		res->name  = "System RAM";
+		res->name  = resource_name__system_ram;
 		res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region));
 		res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
 		res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
diff --git a/arch/arm/plat-samsung/pm-check.c b/arch/arm/plat-samsung/pm-check.c
index 6b733fa..d402409 100644
--- a/arch/arm/plat-samsung/pm-check.c
+++ b/arch/arm/plat-samsung/pm-check.c
@@ -54,7 +54,7 @@ static void s3c_pm_run_res(struct resource *ptr, run_fn_t fn, u32 *arg)
 			s3c_pm_run_res(ptr->child, fn, arg);
 
 		if ((ptr->flags & IORESOURCE_MEM) &&
-		    strcmp(ptr->name, "System RAM") == 0) {
+		    ptr->name == resource_name__system_ram) {
 			S3C_PMDBG("Found system RAM at %08lx..%08lx\n",
 				  (unsigned long)ptr->start,
 				  (unsigned long)ptr->end);
diff --git a/arch/avr32/kernel/setup.c b/arch/avr32/kernel/setup.c
index bb0974c..90ded04 100644
--- a/arch/avr32/kernel/setup.c
+++ b/arch/avr32/kernel/setup.c
@@ -133,7 +133,7 @@ add_physical_memory(resource_size_t start, resource_size_t end)
 	new = &res_cache[res_cache_next_free++];
 	new->start = start;
 	new->end = end;
-	new->name = "System RAM";
+	new->name = resource_name__system_ram;
 	new->flags = IORESOURCE_MEM;
 
 	*pprev = new;
diff --git a/arch/ia64/kernel/efi.c b/arch/ia64/kernel/efi.c
index 6fc03af..44d5116 100644
--- a/arch/ia64/kernel/efi.c
+++ b/arch/ia64/kernel/efi.c
@@ -1233,12 +1233,12 @@ efi_initialize_iomem_resources(struct resource *code_resource,
 			case EFI_BOOT_SERVICES_CODE:
 			case EFI_CONVENTIONAL_MEMORY:
 				if (md->attribute & EFI_MEMORY_WP) {
-					name = "System ROM";
+					name = resource_name__system_rom;
 					flags |= IORESOURCE_READONLY;
 				} else if (md->attribute == EFI_MEMORY_UC)
 					name = "Uncached RAM";
 				else
-					name = "System RAM";
+					name = resource_name__system_ram;
 				break;
 
 			case EFI_ACPI_MEMORY_NVS:
@@ -1246,7 +1246,7 @@ efi_initialize_iomem_resources(struct resource *code_resource,
 				break;
 
 			case EFI_UNUSABLE_MEMORY:
-				name = "reserved";
+				name = resource_name__reserved;
 				flags |= IORESOURCE_DISABLED;
 				break;
 
@@ -1255,7 +1255,7 @@ efi_initialize_iomem_resources(struct resource *code_resource,
 			case EFI_RUNTIME_SERVICES_DATA:
 			case EFI_ACPI_RECLAIM_MEMORY:
 			default:
-				name = "reserved";
+				name = resource_name__reserved;
 				break;
 		}
 
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 8ad1d56..713352d 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -524,11 +524,11 @@ static void __init resource_init(void)
 		switch (boot_mem_map.map[i].type) {
 		case BOOT_MEM_RAM:
 		case BOOT_MEM_ROM_DATA:
-			res->name = "System RAM";
+			res->name = resource_name__system_ram;
 			break;
 		case BOOT_MEM_RESERVED:
 		default:
-			res->name = "reserved";
+			res->name = resource_name__reserved;
 		}
 
 		res->start = start;
diff --git a/arch/parisc/mm/init.c b/arch/parisc/mm/init.c
index 82f364e..4915ca8 100644
--- a/arch/parisc/mm/init.c
+++ b/arch/parisc/mm/init.c
@@ -183,7 +183,7 @@ static void __init setup_bootmem(void)
 	sysram_resource_count = npmem_ranges;
 	for (i = 0; i < sysram_resource_count; i++) {
 		struct resource *res = &sysram_resources[i];
-		res->name = "System RAM";
+		res->name = resource_name__system_ram;
 		res->start = pmem_ranges[i].start_pfn << PAGE_SHIFT;
 		res->end = res->start + (pmem_ranges[i].pages << PAGE_SHIFT)-1;
 		res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c
index 0c35dee..f892eb4 100644
--- a/arch/s390/kernel/setup.c
+++ b/arch/s390/kernel/setup.c
@@ -439,14 +439,14 @@ static void __init setup_resources(void)
 		res->flags = IORESOURCE_BUSY | IORESOURCE_MEM;
 		switch (memory_chunk[i].type) {
 		case CHUNK_READ_WRITE:
-			res->name = "System RAM";
+			res->name = resource_name__system_ram;
 			break;
 		case CHUNK_READ_ONLY:
-			res->name = "System ROM";
+			res->name = resource_name__system_rom;
 			res->flags |= IORESOURCE_READONLY;
 			break;
 		default:
-			res->name = "reserved";
+			res->name = resource_name__reserved;
 		}
 		res->start = memory_chunk[i].addr;
 		res->end = res->start + memory_chunk[i].size - 1;
diff --git a/arch/score/kernel/setup.c b/arch/score/kernel/setup.c
index 6f898c0..5feab05 100644
--- a/arch/score/kernel/setup.c
+++ b/arch/score/kernel/setup.c
@@ -96,7 +96,7 @@ static void __init resource_init(void)
 	data_resource.end = __pa(&_edata) - 1;
 
 	res = alloc_bootmem(sizeof(struct resource));
-	res->name = "System RAM";
+	res->name = resource_name__system_ram;
 	res->start = MEMORY_START;
 	res->end = MEMORY_START + MEMORY_SIZE - 1;
 	res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
diff --git a/arch/sh/kernel/setup.c b/arch/sh/kernel/setup.c
index 58bff45..4d8edae 100644
--- a/arch/sh/kernel/setup.c
+++ b/arch/sh/kernel/setup.c
@@ -199,7 +199,7 @@ void __init __add_active_range(unsigned int nid, unsigned long start_pfn,
 	start = start_pfn << PAGE_SHIFT;
 	end = end_pfn << PAGE_SHIFT;
 
-	res->name = "System RAM";
+	res->name = resource_name__system_ram;
 	res->start = start;
 	res->end = end - 1;
 	res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
diff --git a/arch/sparc/kernel/pci_common.c b/arch/sparc/kernel/pci_common.c
index 6e3874b..e7cf5ea 100644
--- a/arch/sparc/kernel/pci_common.c
+++ b/arch/sparc/kernel/pci_common.c
@@ -349,7 +349,7 @@ static void pci_register_legacy_regions(struct resource *io_res,
 	if (!p)
 		return;
 
-	p->name = "System ROM";
+	p->name = resource_name__system_rom;
 	p->start = mem_res->start + 0xf0000UL;
 	p->end = p->start + 0xffffUL;
 	p->flags = IORESOURCE_BUSY;
diff --git a/arch/tile/kernel/setup.c b/arch/tile/kernel/setup.c
index 6cdc9ba..901d908 100644
--- a/arch/tile/kernel/setup.c
+++ b/arch/tile/kernel/setup.c
@@ -1466,7 +1466,7 @@ insert_ram_resource(u64 start_pfn, u64 end_pfn)
 {
 	struct resource *res =
 		kzalloc(sizeof(struct resource), GFP_ATOMIC);
-	res->name = "System RAM";
+	res->name = resource_name__system_ram;
 	res->start = start_pfn << PAGE_SHIFT;
 	res->end = (end_pfn << PAGE_SHIFT) - 1;
 	res->flags = IORESOURCE_BUSY | IORESOURCE_MEM;
diff --git a/arch/unicore32/kernel/setup.c b/arch/unicore32/kernel/setup.c
index 471b6bc..2522832 100644
--- a/arch/unicore32/kernel/setup.c
+++ b/arch/unicore32/kernel/setup.c
@@ -203,7 +203,7 @@ request_standard_resources(struct meminfo *mi)
 			continue;
 
 		res = alloc_bootmem_low(sizeof(*res));
-		res->name  = "System RAM";
+		res->name  = resource_name__system_ram;
 		res->start = mi->bank[i].start;
 		res->end   = mi->bank[i].start + mi->bank[i].size - 1;
 		res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 3e2ef84..8642272 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -925,11 +925,11 @@ static inline const char *e820_type_to_string(int e820_type)
 {
 	switch (e820_type) {
 	case E820_RESERVED_KERN:
-	case E820_RAM:	return "System RAM";
+	case E820_RAM:	return resource_name__system_ram;
 	case E820_ACPI:	return "ACPI Tables";
 	case E820_NVS:	return "ACPI Non-volatile Storage";
 	case E820_UNUSABLE:	return "Unusable memory";
-	default:	return "reserved";
+	default:	return resource_name__reserved;
 	}
 }
 
diff --git a/arch/x86/kernel/probe_roms.c b/arch/x86/kernel/probe_roms.c
index ba0a4cc..ce7f8d4 100644
--- a/arch/x86/kernel/probe_roms.c
+++ b/arch/x86/kernel/probe_roms.c
@@ -21,7 +21,7 @@
 #include <asm/setup_arch.h>
 
 static struct resource system_rom_resource = {
-	.name	= "System ROM",
+	.name	= resource_name__system_rom,
 	.start	= 0xf0000,
 	.end	= 0xfffff,
 	.flags	= IORESOURCE_BUSY | IORESOURCE_READONLY | IORESOURCE_MEM
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index e9bb22c..b6222ee 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -109,6 +109,11 @@ struct resource_list {
 /* PCI control bits.  Shares IORESOURCE_BITS with above PCI ROM.  */
 #define IORESOURCE_PCI_FIXED		(1<<4)	/* Do not move resource */
 
+/* Some common resource names, must be used when appropriate. */
+extern const char resource_name__system_ram[];
+extern const char resource_name__system_rom[];
+extern const char resource_name__reserved[];
+
 /* PC/ISA/whatever - the normal PC address spaces: IO and memory */
 extern struct resource ioport_resource;
 extern struct resource iomem_resource;
diff --git a/kernel/resource.c b/kernel/resource.c
index 798e2fa..8c29c2e 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -269,13 +269,20 @@ int release_resource(struct resource *old)
 
 EXPORT_SYMBOL(release_resource);
 
+const char resource_name__system_ram[] = "System RAM";
+EXPORT_SYMBOL(resource_name__system_ram);
+const char resource_name__system_rom[] = "System ROM";
+EXPORT_SYMBOL(resource_name__system_rom);
+const char resource_name__reserved[] = "reserved";
+EXPORT_SYMBOL(resource_name__reserved);
+
 #if !defined(CONFIG_ARCH_HAS_WALK_MEMORY)
 /*
  * Finds the lowest memory reosurce exists within [res->start.res->end)
  * the caller must specify res->start, res->end, res->flags and "name".
  * If found, returns 0, res is overwritten, if not found, returns -1.
  */
-static int find_next_system_ram(struct resource *res, char *name)
+static int find_next_system_ram(struct resource *res, const char *name)
 {
 	resource_size_t start, end;
 	struct resource *p;
@@ -291,7 +298,7 @@ static int find_next_system_ram(struct resource *res, char *name)
 		/* system ram is just marked as IORESOURCE_MEM */
 		if (p->flags != res->flags)
 			continue;
-		if (name && strcmp(p->name, name))
+		if (name && p->name == name)
 			continue;
 		if (p->start > end) {
 			p = NULL;
@@ -329,7 +336,7 @@ int walk_system_ram_range(unsigned long start_pfn, unsigned long nr_pages,
 	res.flags = IORESOURCE_MEM | IORESOURCE_BUSY;
 	orig_end = res.end;
 	while ((res.start < res.end) &&
-		(find_next_system_ram(&res, "System RAM") >= 0)) {
+		(find_next_system_ram(&res, resource_name__system_ram) >= 0)) {
 		pfn = (res.start + PAGE_SIZE - 1) >> PAGE_SHIFT;
 		end_pfn = (res.end + 1) >> PAGE_SHIFT;
 		if (end_pfn > pfn)
@@ -936,7 +943,7 @@ static int __init reserve_setup(char *str)
 			break;
 		if (x < MAXRESERVE) {
 			struct resource *res = reserve + x;
-			res->name = "reserved";
+			res->name = resource_name__reserved;
 			res->start = io_start;
 			res->end = io_start + io_num - 1;
 			res->flags = IORESOURCE_BUSY;
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 9f64637..97aa891 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -58,7 +58,7 @@ static struct resource *register_memory_resource(u64 start, u64 size)
 	res = kzalloc(sizeof(struct resource), GFP_KERNEL);
 	BUG_ON(!res);
 
-	res->name = "System RAM";
+	res->name = resource_name__system_ram;
 	res->start = start;
 	res->end = start + size - 1;
 	res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
@@ -571,7 +571,7 @@ int __ref add_memory(int nid, u64 start, u64 size)
 	}
 
 	/* create new memmap entry */
-	firmware_map_add_hotplug(start, start + size, "System RAM");
+	firmware_map_add_hotplug(start, start + size, resource_name__system_ram);
 
 	goto out;
 
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/3] x86, pat: Perform interval rbtree lookup of memtype first
  2011-06-21 16:17                       ` Remove the overhead of vm_insert_pfn() Chris Wilson
  2011-06-21 16:17                         ` [PATCH 1/3] resource: Use a common string for .name = "System RAM" Chris Wilson
@ 2011-06-21 16:17                         ` Chris Wilson
  2011-06-21 16:17                         ` [PATCH 3/3] x86, pat: Convert memtype_lock spinlock to a rwlock Chris Wilson
  2 siblings, 0 replies; 17+ messages in thread
From: Chris Wilson @ 2011-06-21 16:17 UTC (permalink / raw)
  To: intel-gfx

The PAT interval tree is only defined for non-RAM ranges, and is both a
shorter list and log-n lookup compared to the linear walk over the
resource ranges looking for "System RAM". In the case of heavy
vm_insert_pfn() users like the gpu drivers, which regularly modify the
contents of the AGP aperture, this gives a significant reduction in the
overhead of faulting in fresh addresses.

However, note that in 1f9cc3cb6a27521ed (x86, pat: Update the page flags
for memtype atomically instead of using memtype_lock), the contention
upon the memtype_lock in lookup_memtype() was observed to be behind a
factor of 50x reduction in page fault rate for 32 cpus running
vm_insert_pfn(). By performing the locked lookup of memtype first, we
are once again exposed to that contention for is_ram pages. Though in
effect we will be just moving the contention from resource_lock (rwlock)
to memtype_lock (spinlock)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
%%Cc: Robin Holt <holt@sgi.com>
%%Cc: Suresh Siddha <suresh.b.siddha@intel.com>
%%Cc: H. Peter Anvin <hpa@zytor.com>
---
 arch/x86/mm/pat.c |   24 +++++++++++++-----------
 1 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index f6ff57b..18d4aa9 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -384,10 +384,21 @@ int free_memtype(u64 start, u64 end)
  */
 static unsigned long lookup_memtype(u64 paddr)
 {
-	int rettype = _PAGE_CACHE_WB;
+	int rettype = -1;
 	struct memtype *entry;
 
 	if (x86_platform.is_untracked_pat_range(paddr, paddr + PAGE_SIZE))
+		return _PAGE_CACHE_WB;
+
+	spin_lock(&memtype_lock);
+
+	entry = rbt_memtype_lookup(paddr);
+	if (entry != NULL)
+		rettype = entry->type;
+
+	spin_unlock(&memtype_lock);
+
+	if (rettype != -1)
 		return rettype;
 
 	if (pat_pagerange_is_ram(paddr, paddr + PAGE_SIZE)) {
@@ -404,16 +415,7 @@ static unsigned long lookup_memtype(u64 paddr)
 		return rettype;
 	}
 
-	spin_lock(&memtype_lock);
-
-	entry = rbt_memtype_lookup(paddr);
-	if (entry != NULL)
-		rettype = entry->type;
-	else
-		rettype = _PAGE_CACHE_UC_MINUS;
-
-	spin_unlock(&memtype_lock);
-	return rettype;
+	return _PAGE_CACHE_UC_MINUS;
 }
 
 /**
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/3] x86, pat: Convert memtype_lock spinlock to a rwlock
  2011-06-21 16:17                       ` Remove the overhead of vm_insert_pfn() Chris Wilson
  2011-06-21 16:17                         ` [PATCH 1/3] resource: Use a common string for .name = "System RAM" Chris Wilson
  2011-06-21 16:17                         ` [PATCH 2/3] x86, pat: Perform interval rbtree lookup of memtype first Chris Wilson
@ 2011-06-21 16:17                         ` Chris Wilson
  2 siblings, 0 replies; 17+ messages in thread
From: Chris Wilson @ 2011-06-21 16:17 UTC (permalink / raw)
  To: intel-gfx

Presuming that we lookup the memtype of an address far more often than
we modify the PAT ranges, favour the reader.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 arch/x86/mm/pat.c |   20 ++++++++++----------
 1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 18d4aa9..8cc67e5 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -130,7 +130,7 @@ void pat_init(void)
 
 #undef PAT
 
-static DEFINE_SPINLOCK(memtype_lock);	/* protects memtype accesses */
+static DEFINE_RWLOCK(memtype_lock);	/* protects memtype accesses */
 
 /*
  * Does intersection of PAT memory type and MTRR memory type and returns
@@ -310,7 +310,7 @@ int reserve_memtype(u64 start, u64 end, unsigned long req_type,
 	new->end	= end;
 	new->type	= actual_type;
 
-	spin_lock(&memtype_lock);
+	write_lock(&memtype_lock);
 
 	err = rbt_memtype_check_insert(new, new_type);
 	if (err) {
@@ -318,12 +318,12 @@ int reserve_memtype(u64 start, u64 end, unsigned long req_type,
 		       "track %s, req %s\n",
 		       start, end, cattr_name(new->type), cattr_name(req_type));
 		kfree(new);
-		spin_unlock(&memtype_lock);
+		write_unlock(&memtype_lock);
 
 		return err;
 	}
 
-	spin_unlock(&memtype_lock);
+	write_unlock(&memtype_lock);
 
 	dprintk("reserve_memtype added 0x%Lx-0x%Lx, track %s, req %s, ret %s\n",
 		start, end, cattr_name(new->type), cattr_name(req_type),
@@ -355,9 +355,9 @@ int free_memtype(u64 start, u64 end)
 		return -EINVAL;
 	}
 
-	spin_lock(&memtype_lock);
+	write_lock(&memtype_lock);
 	entry = rbt_memtype_erase(start, end);
-	spin_unlock(&memtype_lock);
+	write_unlock(&memtype_lock);
 
 	if (!entry) {
 		printk(KERN_INFO "%s:%d freeing invalid memtype %Lx-%Lx\n",
@@ -390,13 +390,13 @@ static unsigned long lookup_memtype(u64 paddr)
 	if (x86_platform.is_untracked_pat_range(paddr, paddr + PAGE_SIZE))
 		return _PAGE_CACHE_WB;
 
-	spin_lock(&memtype_lock);
+	read_lock(&memtype_lock);
 
 	entry = rbt_memtype_lookup(paddr);
 	if (entry != NULL)
 		rettype = entry->type;
 
-	spin_unlock(&memtype_lock);
+	read_unlock(&memtype_lock);
 
 	if (rettype != -1)
 		return rettype;
@@ -754,9 +754,9 @@ static struct memtype *memtype_get_idx(loff_t pos)
 	if (!print_entry)
 		return NULL;
 
-	spin_lock(&memtype_lock);
+	read_lock(&memtype_lock);
 	ret = rbt_memtype_copy_nth_element(print_entry, pos);
-	spin_unlock(&memtype_lock);
+	read_unlock(&memtype_lock);
 
 	if (!ret) {
 		return print_entry;
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] Revert "drm/i915: Kill GTT mappings when moving from GTT domain"
  2011-06-19 21:49                     ` Chris Wilson
  2011-06-21 16:17                       ` Remove the overhead of vm_insert_pfn() Chris Wilson
@ 2011-06-21 18:09                       ` Keith Packard
  1 sibling, 0 replies; 17+ messages in thread
From: Keith Packard @ 2011-06-21 18:09 UTC (permalink / raw)
  To: Chris Wilson, Eric Anholt, intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 734 bytes --]

On Sun, 19 Jun 2011 22:49:36 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote:

> The patch closes a race condition. The essence of your complaint is that
> the kernel is not as fast as we need it to be, and that the initial upload
> to any object is slower than expected. I presume you also have a plan for
> fixing the underwhelming performance of a glibc memcpy through an
> established GTT mapping?

I see a performance regression in a critical path for most applications
(texture upload), and I'm not seeing any bug report that your fix
closes. That seems pretty conclusive to me; I'll plan on pushing a
revert of this patch to eliminate the regression from the previous
version.

-- 
keith.packard@intel.com

[-- Attachment #1.2: Type: application/pgp-signature, Size: 189 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2011-06-21 18:09 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-06-14 23:43 [PATCH] Revert "drm/i915: Kill GTT mappings when moving from GTT domain" Eric Anholt
2011-06-15  9:39 ` Chris Wilson
2011-06-15 15:08   ` Eric Anholt
2011-06-15 16:03     ` Chris Wilson
2011-06-17 19:06       ` Eric Anholt
2011-06-18 11:43         ` Chris Wilson
2011-06-18 20:20           ` Eric Anholt
2011-06-19 16:28             ` Chris Wilson
2011-06-19 17:01               ` Eric Anholt
2011-06-19 17:14                 ` Chris Wilson
2011-06-19 21:20                   ` Eric Anholt
2011-06-19 21:49                     ` Chris Wilson
2011-06-21 16:17                       ` Remove the overhead of vm_insert_pfn() Chris Wilson
2011-06-21 16:17                         ` [PATCH 1/3] resource: Use a common string for .name = "System RAM" Chris Wilson
2011-06-21 16:17                         ` [PATCH 2/3] x86, pat: Perform interval rbtree lookup of memtype first Chris Wilson
2011-06-21 16:17                         ` [PATCH 3/3] x86, pat: Convert memtype_lock spinlock to a rwlock Chris Wilson
2011-06-21 18:09                       ` [PATCH] Revert "drm/i915: Kill GTT mappings when moving from GTT domain" Keith Packard

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).