Intel-GFX Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH 1/2] drm/i915: simplify shmem pwrite/pread slowpath handling
@ 2012-11-15 14:40 Daniel Vetter
  2012-11-15 14:40 ` [PATCH 2/2] drm/i915: optimize the shmem_pwrite " Daniel Vetter
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel Vetter @ 2012-11-15 14:40 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter

The shmem paths for pwrite/pread used a clever trick to hold onto a
single page when dropping the big dev->struct_mutex for the slowpath.
But this ran the risk of reinstating (or not completely purging) the
backing storage when dropping purgeable objects.

Hence the code needed to keep track of whether it ever dropped the
lock, and if it did, manually check whether it needs to re-purge the
backing storage. But thanks to the pages pin count introduced in

commit a5570178c059cec59e9835be20bc8546377fa7b5
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Sep 4 21:02:54 2012 +0100

    drm/i915: Pin backing pages whilst exporting through a dmabuf vmap

which allowed us to pin the backing storage and remove that page
reference trick from shmem_pwrite/read in

commit f60d7f0c1d55a935475ab394955cafddefaa6533
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Sep 4 21:02:56 2012 +0100

    drm/i915: Pin backing pages for pread

and

commit 755d22184f1e5015b040acee794542d9cf8a16c5
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Sep 4 21:02:55 2012 +0100

    drm/i915: Pin backing pages for pwrite

we can now abolish this check. The slowpath cleanup completely
disappears from pread, and for pwrite we're only left with the domain
fixup in case someone moved the object out of the cpu domain from
under us. A follow-on patch will optimize that a notch more.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem.c |   15 ++-------------
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index cdcf19d..eaaf095 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -407,7 +407,6 @@ i915_gem_shmem_pread(struct drm_device *dev,
 	loff_t offset;
 	int shmem_page_offset, page_length, ret = 0;
 	int obj_do_bit17_swizzling, page_do_bit17_swizzling;
-	int hit_slowpath = 0;
 	int prefaulted = 0;
 	int needs_clflush = 0;
 	struct scatterlist *sg;
@@ -469,7 +468,6 @@ i915_gem_shmem_pread(struct drm_device *dev,
 		if (ret == 0)
 			goto next_page;
 
-		hit_slowpath = 1;
 		mutex_unlock(&dev->struct_mutex);
 
 		if (!prefaulted) {
@@ -502,12 +500,6 @@ next_page:
 out:
 	i915_gem_object_unpin_pages(obj);
 
-	if (hit_slowpath) {
-		/* Fixup: Kill any reinstated backing storage pages */
-		if (obj->madv == __I915_MADV_PURGED)
-			i915_gem_object_truncate(obj);
-	}
-
 	return ret;
 }
 
@@ -838,11 +830,8 @@ out:
 	i915_gem_object_unpin_pages(obj);
 
 	if (hit_slowpath) {
-		/* Fixup: Kill any reinstated backing storage pages */
-		if (obj->madv == __I915_MADV_PURGED)
-			i915_gem_object_truncate(obj);
-		/* and flush dirty cachelines in case the object isn't in the cpu write
-		 * domain anymore. */
+		/* Fixup: Flush dirty cachelines in case the object isn't in the
+		 * cpu write domain anymore. */
 		if (obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
 			i915_gem_clflush_object(obj);
 			i915_gem_chipset_flush(dev);
-- 
1.7.10.4

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 2/2] drm/i915: optimize the shmem_pwrite slowpath handling
  2012-11-15 14:40 [PATCH 1/2] drm/i915: simplify shmem pwrite/pread slowpath handling Daniel Vetter
@ 2012-11-15 14:40 ` Daniel Vetter
  2012-11-15 15:00   ` Chris Wilson
  2012-11-15 15:02   ` Chris Wilson
  0 siblings, 2 replies; 9+ messages in thread
From: Daniel Vetter @ 2012-11-15 14:40 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter

Since we drop dev->struct_mutex when going through the slowpath, the
object might have been moved out of the cpu domain. Hence we need to
clflush the entire object to ensure that after the ioctl returns,
everything is coherent again (interwoven writes are ill-defined
anyway).

But we only need to do this if we start in the cpu domain and the
object requires flushing for coherency. So don't do the flushing if
the object is coherent anyway or if we've done in-line clfushing
already.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index eaaf095..feb0b0c 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -831,8 +831,9 @@ out:
 
 	if (hit_slowpath) {
 		/* Fixup: Flush dirty cachelines in case the object isn't in the
-		 * cpu write domain anymore. */
-		if (obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
+		 * cpu write domain anymore, and we haven't flushed it manually. */
+		if (obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
+		    !needs_clflush_after && obj->cache_level == I915_CACHE_NONE) {
 			i915_gem_clflush_object(obj);
 			i915_gem_chipset_flush(dev);
 		}
-- 
1.7.10.4

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] drm/i915: optimize the shmem_pwrite slowpath handling
  2012-11-15 14:40 ` [PATCH 2/2] drm/i915: optimize the shmem_pwrite " Daniel Vetter
@ 2012-11-15 15:00   ` Chris Wilson
  2012-11-15 15:02   ` Chris Wilson
  1 sibling, 0 replies; 9+ messages in thread
From: Chris Wilson @ 2012-11-15 15:00 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter

On Thu, 15 Nov 2012 15:40:06 +0100, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> Since we drop dev->struct_mutex when going through the slowpath, the
> object might have been moved out of the cpu domain. Hence we need to
> clflush the entire object to ensure that after the ioctl returns,
> everything is coherent again (interwoven writes are ill-defined
> anyway).
> 
> But we only need to do this if we start in the cpu domain and the
> object requires flushing for coherency. So don't do the flushing if
> the object is coherent anyway or if we've done in-line clfushing
> already.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

> ---
>  drivers/gpu/drm/i915/i915_gem.c |    5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index eaaf095..feb0b0c 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -831,8 +831,9 @@ out:
>  
>  	if (hit_slowpath) {
>  		/* Fixup: Flush dirty cachelines in case the object isn't in the
> -		 * cpu write domain anymore. */
> -		if (obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
> +		 * cpu write domain anymore, and we haven't flushed it manually. */
> +		if (obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
> +		    !needs_clflush_after && obj->cache_level == I915_CACHE_NONE) {
>  			i915_gem_clflush_object(obj);
>  			i915_gem_chipset_flush(dev);
>  		}
> -- 
> 1.7.10.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
From: Chris Wilson <chris@chris-wilson.co.uk>
Subject: Re: [Intel-gfx] [PATCH 1/2] drm/i915: simplify shmem pwrite/pread slowpath handling
To: Daniel Vetter <daniel.vetter@ffwll.ch>, Intel Graphics Development <intel-gfx@lists.freedesktop.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
In-Reply-To: <1352990406-3039-1-git-send-email-daniel.vetter@ffwll.ch>
References: <1352990406-3039-1-git-send-email-daniel.vetter@ffwll.ch>

On Thu, 15 Nov 2012 15:40:05 +0100, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> The shmem paths for pwrite/pread used a clever trick to hold onto a
> single page when dropping the big dev->struct_mutex for the slowpath.
> But this ran the risk of reinstating (or not completely purging) the
> backing storage when dropping purgeable objects.
> 
> Hence the code needed to keep track of whether it ever dropped the
> lock, and if it did, manually check whether it needs to re-purge the
> backing storage. But thanks to the pages pin count introduced in
> 
> commit a5570178c059cec59e9835be20bc8546377fa7b5
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Tue Sep 4 21:02:54 2012 +0100
> 
>     drm/i915: Pin backing pages whilst exporting through a dmabuf vmap
> 
> which allowed us to pin the backing storage and remove that page
> reference trick from shmem_pwrite/read in
> 
> commit f60d7f0c1d55a935475ab394955cafddefaa6533
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Tue Sep 4 21:02:56 2012 +0100
> 
>     drm/i915: Pin backing pages for pread
> 
> and
> 
> commit 755d22184f1e5015b040acee794542d9cf8a16c5
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Tue Sep 4 21:02:55 2012 +0100
> 
>     drm/i915: Pin backing pages for pwrite
> 
> we can now abolish this check. The slowpath cleanup completely
> disappears from pread, and for pwrite we're only left with the domain
> fixup in case someone moved the object out of the cpu domain from
> under us. A follow-on patch will optimize that a notch more.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

I approve of such removal of scary code, and indeed with the pinned
pages not disappearing beneath us we can forgo the truncate. (In fact it
*should* be redundant under such circumstances as the pages_unpin should
also call it.) The only problem that then remains is that we do not
document/prevent calling truncate on the shmem filp whilst the pages are
pinned. That's not possible with today's code, but we should clarify the
rule that truncate should only be called with obj->pages == NULL.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] drm/i915: optimize the shmem_pwrite slowpath handling
  2012-11-15 14:40 ` [PATCH 2/2] drm/i915: optimize the shmem_pwrite " Daniel Vetter
  2012-11-15 15:00   ` Chris Wilson
@ 2012-11-15 15:02   ` Chris Wilson
  2012-11-15 15:20     ` [PATCH] " Daniel Vetter
  1 sibling, 1 reply; 9+ messages in thread
From: Chris Wilson @ 2012-11-15 15:02 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter

On Thu, 15 Nov 2012 15:40:06 +0100, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> Since we drop dev->struct_mutex when going through the slowpath, the
> object might have been moved out of the cpu domain. Hence we need to
> clflush the entire object to ensure that after the ioctl returns,
> everything is coherent again (interwoven writes are ill-defined
> anyway).
> 
> But we only need to do this if we start in the cpu domain and the
> object requires flushing for coherency. So don't do the flushing if
> the object is coherent anyway or if we've done in-line clfushing
> already.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Technically true, but we do we care enough to add the extra line of
confusion? Not sold on this one yet, maybe it will be neater in future?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH] drm/i915: optimize the shmem_pwrite slowpath handling
  2012-11-15 15:02   ` Chris Wilson
@ 2012-11-15 15:20     ` Daniel Vetter
  2012-11-15 15:37       ` Chris Wilson
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel Vetter @ 2012-11-15 15:20 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter

Since we drop dev->struct_mutex when going through the slowpath, the
object might have been moved out of the cpu domain. Hence we need to
clflush the entire object to ensure that after the ioctl returns,
everything is coherent again (interwoven writes are ill-defined
anyway).

But we only need to do this if we start in the cpu domain and the
object requires flushing for coherency. So don't do the flushing if
the object is coherent anyway or if we've done in-line clfushing
already.

v2: i915_gem_clflush_object already checks whether the object is
coherent and if so, drops the flushing. Hence we don't need to check
that ourselves, simplifying the condition.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index eaaf095..ab66645 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -831,8 +831,9 @@ out:
 
 	if (hit_slowpath) {
 		/* Fixup: Flush dirty cachelines in case the object isn't in the
-		 * cpu write domain anymore. */
-		if (obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
+		 * cpu write domain anymore, and we haven't flushed it manually. */
+		if (obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
+		    !needs_clflush_after) {
 			i915_gem_clflush_object(obj);
 			i915_gem_chipset_flush(dev);
 		}
-- 
1.7.10.4

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] drm/i915: optimize the shmem_pwrite slowpath handling
  2012-11-15 15:20     ` [PATCH] " Daniel Vetter
@ 2012-11-15 15:37       ` Chris Wilson
  2012-11-15 15:53         ` Daniel Vetter
  0 siblings, 1 reply; 9+ messages in thread
From: Chris Wilson @ 2012-11-15 15:37 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter

On Thu, 15 Nov 2012 16:20:49 +0100, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> Since we drop dev->struct_mutex when going through the slowpath, the
> object might have been moved out of the cpu domain. Hence we need to
> clflush the entire object to ensure that after the ioctl returns,
> everything is coherent again (interwoven writes are ill-defined
> anyway).
> 
> But we only need to do this if we start in the cpu domain and the
> object requires flushing for coherency. So don't do the flushing if
> the object is coherent anyway or if we've done in-line clfushing
> already.
> 
> v2: i915_gem_clflush_object already checks whether the object is
> coherent and if so, drops the flushing. Hence we don't need to check
> that ourselves, simplifying the condition.
> 
Getting clearer, certainly. How about reversing the order of the
checks so that it reads as
  if (used-fast-path && fast-path-no-longer-valid) do_fixup()?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH] drm/i915: optimize the shmem_pwrite slowpath handling
  2012-11-15 15:37       ` Chris Wilson
@ 2012-11-15 15:53         ` Daniel Vetter
  2012-11-15 16:07           ` Chris Wilson
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel Vetter @ 2012-11-15 15:53 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter

Since we drop dev->struct_mutex when going through the slowpath, the
object might have been moved out of the cpu domain. Hence we need to
clflush the entire object to ensure that after the ioctl returns,
everything is coherent again (interwoven writes are ill-defined
anyway).

But we only need to do this if we start in the cpu domain and the
object requires flushing for coherency. So don't do the flushing if
the object is coherent anyway or if we've done in-line clfushing
already.

v2: i915_gem_clflush_object already checks whether the object is
coherent and if so, drops the flushing. Hence we don't need to check
that ourselves, simplifying the condition.

v3: Reorder the checks for better clarify (and adjust the comment
accordingly), suggested by Chris Wilson.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem.c |   10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index eaaf095..851f787 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -830,9 +830,13 @@ out:
 	i915_gem_object_unpin_pages(obj);
 
 	if (hit_slowpath) {
-		/* Fixup: Flush dirty cachelines in case the object isn't in the
-		 * cpu write domain anymore. */
-		if (obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
+		/*
+		 * Fixup: Flush cpu caches in case we didn't flush the dirty
+		 * cachelines in-line while writing and the object moved
+		 * out of the cpu write domain while we've dropped the lock.
+		 */
+		if (!needs_clflush_after &&
+		    obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
 			i915_gem_clflush_object(obj);
 			i915_gem_chipset_flush(dev);
 		}
-- 
1.7.10.4

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] drm/i915: optimize the shmem_pwrite slowpath handling
  2012-11-15 15:53         ` Daniel Vetter
@ 2012-11-15 16:07           ` Chris Wilson
  2012-11-29 12:49             ` Daniel Vetter
  0 siblings, 1 reply; 9+ messages in thread
From: Chris Wilson @ 2012-11-15 16:07 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter

On Thu, 15 Nov 2012 16:53:58 +0100, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> Since we drop dev->struct_mutex when going through the slowpath, the
> object might have been moved out of the cpu domain. Hence we need to
> clflush the entire object to ensure that after the ioctl returns,
> everything is coherent again (interwoven writes are ill-defined
> anyway).
> 
> But we only need to do this if we start in the cpu domain and the
> object requires flushing for coherency. So don't do the flushing if
> the object is coherent anyway or if we've done in-line clfushing
> already.
> 
> v2: i915_gem_clflush_object already checks whether the object is
> coherent and if so, drops the flushing. Hence we don't need to check
> that ourselves, simplifying the condition.
> 
> v3: Reorder the checks for better clarify (and adjust the comment
> accordingly), suggested by Chris Wilson.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Yup, the comment makes much more sense now.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] drm/i915: optimize the shmem_pwrite slowpath handling
  2012-11-15 16:07           ` Chris Wilson
@ 2012-11-29 12:49             ` Daniel Vetter
  0 siblings, 0 replies; 9+ messages in thread
From: Daniel Vetter @ 2012-11-29 12:49 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Daniel Vetter, Intel Graphics Development

On Thu, Nov 15, 2012 at 04:07:45PM +0000, Chris Wilson wrote:
> On Thu, 15 Nov 2012 16:53:58 +0100, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > Since we drop dev->struct_mutex when going through the slowpath, the
> > object might have been moved out of the cpu domain. Hence we need to
> > clflush the entire object to ensure that after the ioctl returns,
> > everything is coherent again (interwoven writes are ill-defined
> > anyway).
> > 
> > But we only need to do this if we start in the cpu domain and the
> > object requires flushing for coherency. So don't do the flushing if
> > the object is coherent anyway or if we've done in-line clfushing
> > already.
> > 
> > v2: i915_gem_clflush_object already checks whether the object is
> > coherent and if so, drops the flushing. Hence we don't need to check
> > that ourselves, simplifying the condition.
> > 
> > v3: Reorder the checks for better clarify (and adjust the comment
> > accordingly), suggested by Chris Wilson.
> > 
> > Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> 
> Yup, the comment makes much more sense now.
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

Both patches merged to dinq, thanks a lot for the review comments.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, back to index

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-11-15 14:40 [PATCH 1/2] drm/i915: simplify shmem pwrite/pread slowpath handling Daniel Vetter
2012-11-15 14:40 ` [PATCH 2/2] drm/i915: optimize the shmem_pwrite " Daniel Vetter
2012-11-15 15:00   ` Chris Wilson
2012-11-15 15:02   ` Chris Wilson
2012-11-15 15:20     ` [PATCH] " Daniel Vetter
2012-11-15 15:37       ` Chris Wilson
2012-11-15 15:53         ` Daniel Vetter
2012-11-15 16:07           ` Chris Wilson
2012-11-29 12:49             ` Daniel Vetter

Intel-GFX Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/intel-gfx/0 intel-gfx/git/0.git
	git clone --mirror https://lore.kernel.org/intel-gfx/1 intel-gfx/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 intel-gfx intel-gfx/ https://lore.kernel.org/intel-gfx \
		intel-gfx@lists.freedesktop.org
	public-inbox-index intel-gfx

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.freedesktop.lists.intel-gfx


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git