* [PATCH 1/2] drm/i915: Prefault the entire object on first page fault
@ 2014-06-10 11:14 Chris Wilson
  2014-06-10 11:14 ` [PATCH 2/2] drm/i915: Use remap_pfn_range() to prefault all PTE in a single pass Chris Wilson
  2014-06-11 20:41 ` [PATCH 1/2] drm/i915: Prefault the entire object on first page fault Volkin, Bradley D
  0 siblings, 2 replies; 9+ messages in thread
From: Chris Wilson @ 2014-06-10 11:14 UTC
  To: intel-gfx; +Cc: Goel, Akash

Inserting additional PTEs has no side-effect for us as the PFNs are fixed
for the entire time the object is resident in the global GTT. The
downside is that we pay the entire cost of faulting the object upon the
first hit, in return for which we remove the per-page faulting overhead.

On an Ivybridge i7-3720qm with 1600MHz DDR3, with 32 fences,
Upload rate for 2 linear surfaces:	8127MiB/s -> 8134MiB/s
Upload rate for 2 tiled surfaces:	8607MiB/s -> 8625MiB/s
Upload rate for 4 linear surfaces:	8127MiB/s -> 8127MiB/s
Upload rate for 4 tiled surfaces:	8611MiB/s -> 8602MiB/s
Upload rate for 8 linear surfaces:	8114MiB/s -> 8124MiB/s
Upload rate for 8 tiled surfaces:	8601MiB/s -> 8603MiB/s
Upload rate for 16 linear surfaces:	8110MiB/s -> 8123MiB/s
Upload rate for 16 tiled surfaces:	8595MiB/s -> 8606MiB/s
Upload rate for 32 linear surfaces:	8104MiB/s -> 8121MiB/s
Upload rate for 32 tiled surfaces:	8589MiB/s -> 8605MiB/s
Upload rate for 64 linear surfaces:	8107MiB/s -> 8121MiB/s
Upload rate for 64 tiled surfaces:	2013MiB/s -> 3017MiB/s

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Goel, Akash" <akash.goel@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 3aaf7e01235e..e1f68f06c2ef 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1704,14 +1704,26 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	if (ret)
 		goto unpin;
 
-	obj->fault_mappable = true;
-
+	/* Finally, remap it using the new GTT offset */
 	pfn = dev_priv->gtt.mappable_base + i915_gem_obj_ggtt_offset(obj);
 	pfn >>= PAGE_SHIFT;
-	pfn += page_offset;
 
-	/* Finally, remap it using the new GTT offset */
-	ret = vm_insert_pfn(vma, (unsigned long)vmf->virtual_address, pfn);
+	if (!obj->fault_mappable) {
+		int i;
+
+		for (i = 0; i < obj->base.size >> PAGE_SHIFT; i++) {
+			ret = vm_insert_pfn(vma,
+					    (unsigned long)vma->vm_start + i * PAGE_SIZE,
+					    pfn + i);
+			if (ret)
+				break;
+		}
+
+		obj->fault_mappable = true;
+	} else
+		ret = vm_insert_pfn(vma,
+				    (unsigned long)vmf->virtual_address,
+				    pfn + page_offset);
 unpin:
 	i915_gem_object_ggtt_unpin(obj);
 unlock:
-- 
2.0.0


* [PATCH 2/2] drm/i915: Use remap_pfn_range() to prefault all PTE in a single pass
  2014-06-10 11:14 [PATCH 1/2] drm/i915: Prefault the entire object on first page fault Chris Wilson
@ 2014-06-10 11:14 ` Chris Wilson
  2014-06-11 21:17   ` Volkin, Bradley D
  2014-06-11 20:41 ` [PATCH 1/2] drm/i915: Prefault the entire object on first page fault Volkin, Bradley D
  1 sibling, 1 reply; 9+ messages in thread
From: Chris Wilson @ 2014-06-10 11:14 UTC
  To: intel-gfx

On an Ivybridge i7-3720qm with 1600MHz DDR3, with 32 fences,
Upload rate for 2 linear surfaces:  8134MiB/s -> 8154MiB/s
Upload rate for 2 tiled surfaces:   8625MiB/s -> 8632MiB/s
Upload rate for 4 linear surfaces:  8127MiB/s -> 8134MiB/s
Upload rate for 4 tiled surfaces:   8602MiB/s -> 8629MiB/s
Upload rate for 8 linear surfaces:  8124MiB/s -> 8137MiB/s
Upload rate for 8 tiled surfaces:   8603MiB/s -> 8624MiB/s
Upload rate for 16 linear surfaces: 8123MiB/s -> 8128MiB/s
Upload rate for 16 tiled surfaces:  8606MiB/s -> 8618MiB/s
Upload rate for 32 linear surfaces: 8121MiB/s -> 8128MiB/s
Upload rate for 32 tiled surfaces:  8605MiB/s -> 8614MiB/s
Upload rate for 64 linear surfaces: 8121MiB/s -> 8127MiB/s
Upload rate for 64 tiled surfaces:  3017MiB/s -> 5127MiB/s

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 28 +++++++++++++---------------
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index e1f68f06c2ef..7128d389a6ff 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1709,21 +1709,19 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	pfn >>= PAGE_SHIFT;
 
 	if (!obj->fault_mappable) {
-		int i;
-
-		for (i = 0; i < obj->base.size >> PAGE_SHIFT; i++) {
-			ret = vm_insert_pfn(vma,
-					    (unsigned long)vma->vm_start + i * PAGE_SIZE,
-					    pfn + i);
-			if (ret)
-				break;
-		}
-
-		obj->fault_mappable = true;
-	} else
-		ret = vm_insert_pfn(vma,
-				    (unsigned long)vmf->virtual_address,
-				    pfn + page_offset);
+		ret = remap_pfn_range(vma,
+				      vma->vm_start,
+				      pfn,
+				      obj->base.size,
+				      vma->vm_page_prot);
+		obj->fault_mappable = ret == 0;
+	} else {
+		ret = remap_pfn_range(vma,
+				      (unsigned long)vmf->virtual_address,
+				      pfn + page_offset,
+				      PAGE_SIZE,
+				      vma->vm_page_prot);
+	}
 unpin:
 	i915_gem_object_ggtt_unpin(obj);
 unlock:
-- 
2.0.0


* Re: [PATCH 1/2] drm/i915: Prefault the entire object on first page fault
  2014-06-10 11:14 [PATCH 1/2] drm/i915: Prefault the entire object on first page fault Chris Wilson
  2014-06-10 11:14 ` [PATCH 2/2] drm/i915: Use remap_pfn_range() to prefault all PTE in a single pass Chris Wilson
@ 2014-06-11 20:41 ` Volkin, Bradley D
  2014-06-12  7:21   ` Chris Wilson
  1 sibling, 1 reply; 9+ messages in thread
From: Volkin, Bradley D @ 2014-06-11 20:41 UTC
  To: Chris Wilson; +Cc: intel-gfx, Goel, Akash

On Tue, Jun 10, 2014 at 04:14:40AM -0700, Chris Wilson wrote:
> Inserting additional PTEs has no side-effect for us as the PFNs are fixed
> for the entire time the object is resident in the global GTT. The
> downside is that we pay the entire cost of faulting the object upon the
> first hit, in return for which we remove the per-page faulting overhead.
> 
> On an Ivybridge i7-3720qm with 1600MHz DDR3, with 32 fences,
> Upload rate for 2 linear surfaces:	8127MiB/s -> 8134MiB/s
> Upload rate for 2 tiled surfaces:	8607MiB/s -> 8625MiB/s
> Upload rate for 4 linear surfaces:	8127MiB/s -> 8127MiB/s
> Upload rate for 4 tiled surfaces:	8611MiB/s -> 8602MiB/s
> Upload rate for 8 linear surfaces:	8114MiB/s -> 8124MiB/s
> Upload rate for 8 tiled surfaces:	8601MiB/s -> 8603MiB/s
> Upload rate for 16 linear surfaces:	8110MiB/s -> 8123MiB/s
> Upload rate for 16 tiled surfaces:	8595MiB/s -> 8606MiB/s
> Upload rate for 32 linear surfaces:	8104MiB/s -> 8121MiB/s
> Upload rate for 32 tiled surfaces:	8589MiB/s -> 8605MiB/s
> Upload rate for 64 linear surfaces:	8107MiB/s -> 8121MiB/s
> Upload rate for 64 tiled surfaces:	2013MiB/s -> 3017MiB/s
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: "Goel, Akash" <akash.goel@intel.com>

For reproducibility it would be nice to have the testcase info, assuming
it's something from i-g-t. Other than that, I think this change looks good.

Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>

> ---
>  drivers/gpu/drm/i915/i915_gem.c | 22 +++++++++++++++++-----
>  1 file changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 3aaf7e01235e..e1f68f06c2ef 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1704,14 +1704,26 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
>  	if (ret)
>  		goto unpin;
>  
> -	obj->fault_mappable = true;
> -
> +	/* Finally, remap it using the new GTT offset */
>  	pfn = dev_priv->gtt.mappable_base + i915_gem_obj_ggtt_offset(obj);
>  	pfn >>= PAGE_SHIFT;
> -	pfn += page_offset;
>  
> -	/* Finally, remap it using the new GTT offset */
> -	ret = vm_insert_pfn(vma, (unsigned long)vmf->virtual_address, pfn);
> +	if (!obj->fault_mappable) {
> +		int i;
> +
> +		for (i = 0; i < obj->base.size >> PAGE_SHIFT; i++) {
> +			ret = vm_insert_pfn(vma,
> +					    (unsigned long)vma->vm_start + i * PAGE_SIZE,
> +					    pfn + i);
> +			if (ret)
> +				break;
> +		}
> +
> +		obj->fault_mappable = true;
> +	} else
> +		ret = vm_insert_pfn(vma,
> +				    (unsigned long)vmf->virtual_address,
> +				    pfn + page_offset);
>  unpin:
>  	i915_gem_object_ggtt_unpin(obj);
>  unlock:
> -- 
> 2.0.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [PATCH 2/2] drm/i915: Use remap_pfn_range() to prefault all PTE in a single pass
  2014-06-10 11:14 ` [PATCH 2/2] drm/i915: Use remap_pfn_range() to prefault all PTE in a single pass Chris Wilson
@ 2014-06-11 21:17   ` Volkin, Bradley D
  2014-06-12  7:15     ` Daniel Vetter
                       ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Volkin, Bradley D @ 2014-06-11 21:17 UTC
  To: Chris Wilson; +Cc: intel-gfx

On Tue, Jun 10, 2014 at 04:14:41AM -0700, Chris Wilson wrote:
> On an Ivybridge i7-3720qm with 1600MHz DDR3, with 32 fences,
> Upload rate for 2 linear surfaces:  8134MiB/s -> 8154MiB/s
> Upload rate for 2 tiled surfaces:   8625MiB/s -> 8632MiB/s
> Upload rate for 4 linear surfaces:  8127MiB/s -> 8134MiB/s
> Upload rate for 4 tiled surfaces:   8602MiB/s -> 8629MiB/s
> Upload rate for 8 linear surfaces:  8124MiB/s -> 8137MiB/s
> Upload rate for 8 tiled surfaces:   8603MiB/s -> 8624MiB/s
> Upload rate for 16 linear surfaces: 8123MiB/s -> 8128MiB/s
> Upload rate for 16 tiled surfaces:  8606MiB/s -> 8618MiB/s
> Upload rate for 32 linear surfaces: 8121MiB/s -> 8128MiB/s
> Upload rate for 32 tiled surfaces:  8605MiB/s -> 8614MiB/s
> Upload rate for 64 linear surfaces: 8121MiB/s -> 8127MiB/s
> Upload rate for 64 tiled surfaces:  3017MiB/s -> 5127MiB/s
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

The translation from vm_insert_pfn to remap_pfn_range looks correct. I don't know
these APIs particularly well though. I wonder if there's any reason it would be
unsafe to call remap_pfn_range from .fault() since it seems to only be used in
.mmap() handlers in other places. Reading their implementations, nothing jumped
out, so I'll say

Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>

with the note that you may want to take a look just in case.
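
For concreteness, a sketch of the equivalence being reviewed (not from the
patch itself; the helper name is hypothetical):

#include <linux/mm.h>

/* Fill a contiguous pfn range one page at a time, as patch 1/2 does. */
static int prefault_ptes(struct vm_area_struct *vma,
			 unsigned long pfn, unsigned long size)
{
	unsigned long addr;
	int ret;

	for (addr = vma->vm_start; addr < vma->vm_start + size;
	     addr += PAGE_SIZE) {
		ret = vm_insert_pfn(vma, addr, pfn++);
		if (ret)
			return ret;
	}
	return 0;
}

/* Patch 2/2 collapses that loop into a single call filling the same
 * PTEs, with one set of sanity checks for the whole range:
 *
 *	ret = remap_pfn_range(vma, vma->vm_start, pfn, size,
 *			      vma->vm_page_prot);
 */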

> ---
>  drivers/gpu/drm/i915/i915_gem.c | 28 +++++++++++++---------------
>  1 file changed, 13 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index e1f68f06c2ef..7128d389a6ff 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1709,21 +1709,19 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
>  	pfn >>= PAGE_SHIFT;
>  
>  	if (!obj->fault_mappable) {
> -		int i;
> -
> -		for (i = 0; i < obj->base.size >> PAGE_SHIFT; i++) {
> -			ret = vm_insert_pfn(vma,
> -					    (unsigned long)vma->vm_start + i * PAGE_SIZE,
> -					    pfn + i);
> -			if (ret)
> -				break;
> -		}
> -
> -		obj->fault_mappable = true;
> -	} else
> -		ret = vm_insert_pfn(vma,
> -				    (unsigned long)vmf->virtual_address,
> -				    pfn + page_offset);
> +		ret = remap_pfn_range(vma,
> +				      vma->vm_start,
> +				      pfn,
> +				      obj->base.size,
> +				      vma->vm_page_prot);
> +		obj->fault_mappable = ret == 0;
> +	} else {
> +		ret = remap_pfn_range(vma,
> +				      (unsigned long)vmf->virtual_address,
> +				      pfn + page_offset,
> +				      PAGE_SIZE,
> +				      vma->vm_page_prot);
> +	}
>  unpin:
>  	i915_gem_object_ggtt_unpin(obj);
>  unlock:
> -- 
> 2.0.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [PATCH 2/2] drm/i915: Use remap_pfn_range() to prefault all PTE in a single pass
  2014-06-11 21:17   ` Volkin, Bradley D
@ 2014-06-12  7:15     ` Daniel Vetter
  2014-06-12  7:20     ` Chris Wilson
  2014-06-12 16:15     ` Daniel Vetter
  2 siblings, 0 replies; 9+ messages in thread
From: Daniel Vetter @ 2014-06-12  7:15 UTC
  To: Volkin, Bradley D; +Cc: intel-gfx

On Wed, Jun 11, 2014 at 02:17:21PM -0700, Volkin, Bradley D wrote:
> On Tue, Jun 10, 2014 at 04:14:41AM -0700, Chris Wilson wrote:
> > On an Ivybridge i7-3720qm with 1600MHz DDR3, with 32 fences,
> > Upload rate for 2 linear surfaces:  8134MiB/s -> 8154MiB/s
> > Upload rate for 2 tiled surfaces:   8625MiB/s -> 8632MiB/s
> > Upload rate for 4 linear surfaces:  8127MiB/s -> 8134MiB/s
> > Upload rate for 4 tiled surfaces:   8602MiB/s -> 8629MiB/s
> > Upload rate for 8 linear surfaces:  8124MiB/s -> 8137MiB/s
> > Upload rate for 8 tiled surfaces:   8603MiB/s -> 8624MiB/s
> > Upload rate for 16 linear surfaces: 8123MiB/s -> 8128MiB/s
> > Upload rate for 16 tiled surfaces:  8606MiB/s -> 8618MiB/s
> > Upload rate for 32 linear surfaces: 8121MiB/s -> 8128MiB/s
> > Upload rate for 32 tiled surfaces:  8605MiB/s -> 8614MiB/s
> > Upload rate for 64 linear surfaces: 8121MiB/s -> 8127MiB/s
> > Upload rate for 64 tiled surfaces:  3017MiB/s -> 5127MiB/s
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> The translation from vm_insert_pfn to remap_pfn_range looks correct. I don't know
> these APIs particularly well though. I wonder if there's any reason it would be
> unsafe to call remap_pfn_range from .fault() since it seems to only be used in
> .mmap() handlers in other places. Reading their implementations, nothing jumped
> out, so I'll say

Most drivers just statically relocate their BAR to userspace; we manage it
dynamically. IIRC vm_insert_pfn was more-or-less added for us, too. In any
case I think we're safe.
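
The static pattern looks roughly like this (an illustrative sketch with
made-up names: the whole BAR is mapped once at mmap() time, so no .fault
handler is needed):

#include <linux/fs.h>
#include <linux/mm.h>

struct foo_dev {
	resource_size_t bar_phys;	/* hypothetical BAR base address */
};

/* Illustrative only; foo_dev and foo_mmap are made-up names. */
static int foo_mmap(struct file *file, struct vm_area_struct *vma)
{
	struct foo_dev *fdev = file->private_data;
	unsigned long pfn = fdev->bar_phys >> PAGE_SHIFT;

	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

	/* Map the entire BAR up front; every page stays valid for the
	 * lifetime of the mapping, so faults never happen.
	 */
	return remap_pfn_range(vma, vma->vm_start, pfn,
			       vma->vm_end - vma->vm_start,
			       vma->vm_page_prot);
}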
> 
> Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
> 
> with the note that you may want to take a look just in case.

Thanks for the review; also added the Testcase: tag now that Chris pushed
his igt patch.
-Daniel

> 
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c | 28 +++++++++++++---------------
> >  1 file changed, 13 insertions(+), 15 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index e1f68f06c2ef..7128d389a6ff 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -1709,21 +1709,19 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> >  	pfn >>= PAGE_SHIFT;
> >  
> >  	if (!obj->fault_mappable) {
> > -		int i;
> > -
> > -		for (i = 0; i < obj->base.size >> PAGE_SHIFT; i++) {
> > -			ret = vm_insert_pfn(vma,
> > -					    (unsigned long)vma->vm_start + i * PAGE_SIZE,
> > -					    pfn + i);
> > -			if (ret)
> > -				break;
> > -		}
> > -
> > -		obj->fault_mappable = true;
> > -	} else
> > -		ret = vm_insert_pfn(vma,
> > -				    (unsigned long)vmf->virtual_address,
> > -				    pfn + page_offset);
> > +		ret = remap_pfn_range(vma,
> > +				      vma->vm_start,
> > +				      pfn,
> > +				      obj->base.size,
> > +				      vma->vm_page_prot);
> > +		obj->fault_mappable = ret == 0;
> > +	} else {
> > +		ret = remap_pfn_range(vma,
> > +				      (unsigned long)vmf->virtual_address,
> > +				      pfn + page_offset,
> > +				      PAGE_SIZE,
> > +				      vma->vm_page_prot);
> > +	}
> >  unpin:
> >  	i915_gem_object_ggtt_unpin(obj);
> >  unlock:
> > -- 
> > 2.0.0
> > 
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* Re: [PATCH 2/2] drm/i915: Use remap_pfn_range() to prefault all PTE in a single pass
  2014-06-11 21:17   ` Volkin, Bradley D
  2014-06-12  7:15     ` Daniel Vetter
@ 2014-06-12  7:20     ` Chris Wilson
  2014-06-12 16:15     ` Daniel Vetter
  2 siblings, 0 replies; 9+ messages in thread
From: Chris Wilson @ 2014-06-12  7:20 UTC
  To: Volkin, Bradley D; +Cc: intel-gfx

On Wed, Jun 11, 2014 at 02:17:21PM -0700, Volkin, Bradley D wrote:
> On Tue, Jun 10, 2014 at 04:14:41AM -0700, Chris Wilson wrote:
> > On an Ivybridge i7-3720qm with 1600MHz DDR3, with 32 fences,
using i-g-t/gem_fence_upload:
> > Upload rate for 2 linear surfaces:  8134MiB/s -> 8154MiB/s
> > Upload rate for 2 tiled surfaces:   8625MiB/s -> 8632MiB/s
> > Upload rate for 4 linear surfaces:  8127MiB/s -> 8134MiB/s
> > Upload rate for 4 tiled surfaces:   8602MiB/s -> 8629MiB/s
> > Upload rate for 8 linear surfaces:  8124MiB/s -> 8137MiB/s
> > Upload rate for 8 tiled surfaces:   8603MiB/s -> 8624MiB/s
> > Upload rate for 16 linear surfaces: 8123MiB/s -> 8128MiB/s
> > Upload rate for 16 tiled surfaces:  8606MiB/s -> 8618MiB/s
> > Upload rate for 32 linear surfaces: 8121MiB/s -> 8128MiB/s
> > Upload rate for 32 tiled surfaces:  8605MiB/s -> 8614MiB/s
> > Upload rate for 64 linear surfaces: 8121MiB/s -> 8127MiB/s
> > Upload rate for 64 tiled surfaces:  3017MiB/s -> 5127MiB/s
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Testcase: i-g-t/gem_fence_upload
 
> The translation from vm_insert_pfn to remap_pfn_range looks correct. I don't know
> these APIs particularly well though. I wonder if there's any reason it would be
> unsafe to call remap_pfn_range from .fault() since it seems to only be used in
> .mmap() handlers in other places. Reading their implementations, nothing jumped
> out, so I'll say

Right, using it within the fault handler is fine. The underlying
operations are equivalent to vm_insert_pfn(), just unrolled for a
contiguous range. The reason other drivers do it in their mmap handler
rather than at fault time is that they have a radically simpler driver
model than we do.

An exercise for the reader would be to eradicate the walk_system_ram()
still required by remap_pfn_range(), which we avoided in our prototypical
vm_insert_pfn_from_io_mapping(). [We could simply do a
remap_pfn_from_io_mapping that shares the core with remap_pfn_range, I
guess.]
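
A rough sketch of what that could look like (entirely hypothetical, not a
posted patch): reuse the generic PTE walker via apply_to_page_range() and
rely on the io_mapping having been validated at creation time, so the
per-call walk_system_ram() disappears.

#include <linux/mm.h>

struct remap_pfn {
	struct mm_struct *mm;
	unsigned long pfn;
	pgprot_t prot;
};

static int remap_pfn_cb(pte_t *pte, pgtable_t token,
			unsigned long addr, void *data)
{
	struct remap_pfn *r = data;

	/* Special PTEs carry no struct page reference */
	set_pte_at(r->mm, addr, pte,
		   pte_mkspecial(pfn_pte(r->pfn++, r->prot)));
	return 0;
}

/* Hypothetical helper; name and signature are a guess. */
static int remap_pfn_from_io_mapping(struct vm_area_struct *vma,
				     unsigned long addr,
				     unsigned long pfn,
				     unsigned long size)
{
	struct remap_pfn r = {
		.mm = vma->vm_mm,
		.pfn = pfn,
		.prot = vma->vm_page_prot,
	};

	return apply_to_page_range(vma->vm_mm, addr, size,
				   remap_pfn_cb, &r);
}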
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: [PATCH 1/2] drm/i915: Prefault the entire object on first page fault
  2014-06-11 20:41 ` [PATCH 1/2] drm/i915: Prefault the entire object on first page fault Volkin, Bradley D
@ 2014-06-12  7:21   ` Chris Wilson
  0 siblings, 0 replies; 9+ messages in thread
From: Chris Wilson @ 2014-06-12  7:21 UTC
  To: Volkin, Bradley D; +Cc: intel-gfx, Goel, Akash

On Wed, Jun 11, 2014 at 01:41:45PM -0700, Volkin, Bradley D wrote:
> On Tue, Jun 10, 2014 at 04:14:40AM -0700, Chris Wilson wrote:
> > Inserting additional PTEs has no side-effect for us as the PFNs are fixed
> > for the entire time the object is resident in the global GTT. The
> > downside is that we pay the entire cost of faulting the object upon the
> > first hit, in return for which we remove the per-page faulting overhead.
> > 
> > On an Ivybridge i7-3720qm with 1600MHz DDR3, with 32 fences,
using i-g-t/gem_fence_upload
> > Upload rate for 2 linear surfaces:	8127MiB/s -> 8134MiB/s
> > Upload rate for 2 tiled surfaces:	8607MiB/s -> 8625MiB/s
> > Upload rate for 4 linear surfaces:	8127MiB/s -> 8127MiB/s
> > Upload rate for 4 tiled surfaces:	8611MiB/s -> 8602MiB/s
> > Upload rate for 8 linear surfaces:	8114MiB/s -> 8124MiB/s
> > Upload rate for 8 tiled surfaces:	8601MiB/s -> 8603MiB/s
> > Upload rate for 16 linear surfaces:	8110MiB/s -> 8123MiB/s
> > Upload rate for 16 tiled surfaces:	8595MiB/s -> 8606MiB/s
> > Upload rate for 32 linear surfaces:	8104MiB/s -> 8121MiB/s
> > Upload rate for 32 tiled surfaces:	8589MiB/s -> 8605MiB/s
> > Upload rate for 64 linear surfaces:	8107MiB/s -> 8121MiB/s
> > Upload rate for 64 tiled surfaces:	2013MiB/s -> 3017MiB/s
> > 
Testcase: i-g-t/gem_fence_upload
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: "Goel, Akash" <akash.goel@intel.com>
> 
> For reproducibility it would be nice to have the testcase info, assuming
> it's something from i-g-t. Other than that, I think this change looks good.

It was a proposed test case along with the last set of patches. I should
have referenced it properly in the commit.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: [PATCH 2/2] drm/i915: Use remap_pfn_range() to prefault all PTE in a single pass
  2014-06-11 21:17   ` Volkin, Bradley D
  2014-06-12  7:15     ` Daniel Vetter
  2014-06-12  7:20     ` Chris Wilson
@ 2014-06-12 16:15     ` Daniel Vetter
  2 siblings, 0 replies; 9+ messages in thread
From: Daniel Vetter @ 2014-06-12 16:15 UTC
  To: Volkin, Bradley D; +Cc: intel-gfx

On Wed, Jun 11, 2014 at 02:17:21PM -0700, Volkin, Bradley D wrote:
> On Tue, Jun 10, 2014 at 04:14:41AM -0700, Chris Wilson wrote:
> > On an Ivybridge i7-3720qm with 1600MHz DDR3, with 32 fences,
> > Upload rate for 2 linear surfaces:  8134MiB/s -> 8154MiB/s
> > Upload rate for 2 tiled surfaces:   8625MiB/s -> 8632MiB/s
> > Upload rate for 4 linear surfaces:  8127MiB/s -> 8134MiB/s
> > Upload rate for 4 tiled surfaces:   8602MiB/s -> 8629MiB/s
> > Upload rate for 8 linear surfaces:  8124MiB/s -> 8137MiB/s
> > Upload rate for 8 tiled surfaces:   8603MiB/s -> 8624MiB/s
> > Upload rate for 16 linear surfaces: 8123MiB/s -> 8128MiB/s
> > Upload rate for 16 tiled surfaces:  8606MiB/s -> 8618MiB/s
> > Upload rate for 32 linear surfaces: 8121MiB/s -> 8128MiB/s
> > Upload rate for 32 tiled surfaces:  8605MiB/s -> 8614MiB/s
> > Upload rate for 64 linear surfaces: 8121MiB/s -> 8127MiB/s
> > Upload rate for 64 tiled surfaces:  3017MiB/s -> 5127MiB/s
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> The translation from vm_insert_pfn to remap_pfn_range looks correct. I don't know
> these APIs particularly well though. I wonder if there's any reason it would be
> unsafe to call remap_pfn_range from .fault() since it seems to only be used in
> .mmap() handlers in other places. Reading their implementations, nothing jumped
> out, so I'll say

So apparently the semantics weren't quite the same after all, and
remap_pfn_range doesn't handle concurrent operations too well. Can you
please also review Chris' fixup patch for that?

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


* [PATCH 2/2] drm/i915: Use remap_pfn_range() to prefault all PTE in a single pass
  2014-06-13 16:26 [PATCH 1/2] mm: Report attempts to overwrite PTE from remap_pfn_range() Chris Wilson
@ 2014-06-13 16:26 ` Chris Wilson
  0 siblings, 0 replies; 9+ messages in thread
From: Chris Wilson @ 2014-06-13 16:26 UTC
  To: intel-gfx; +Cc: linux-mm

On an Ivybridge i7-3720qm with 1600MHz DDR3, with 32 fences,
Upload rate for 2 linear surfaces:  8134MiB/s -> 8154MiB/s
Upload rate for 2 tiled surfaces:   8625MiB/s -> 8632MiB/s
Upload rate for 4 linear surfaces:  8127MiB/s -> 8134MiB/s
Upload rate for 4 tiled surfaces:   8602MiB/s -> 8629MiB/s
Upload rate for 8 linear surfaces:  8124MiB/s -> 8137MiB/s
Upload rate for 8 tiled surfaces:   8603MiB/s -> 8624MiB/s
Upload rate for 16 linear surfaces: 8123MiB/s -> 8128MiB/s
Upload rate for 16 tiled surfaces:  8606MiB/s -> 8618MiB/s
Upload rate for 32 linear surfaces: 8121MiB/s -> 8128MiB/s
Upload rate for 32 tiled surfaces:  8605MiB/s -> 8614MiB/s
Upload rate for 64 linear surfaces: 8121MiB/s -> 8127MiB/s
Upload rate for 64 tiled surfaces:  3017MiB/s -> 5127MiB/s

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Testcase: igt/gem_fence_upload/performance
Testcase: igt/gem_mmap_gtt
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
Cc: linux-mm@kvack.org
---
 drivers/gpu/drm/i915/i915_gem.c | 29 +++++++++++++++--------------
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c313cb2b641b..e6246634b419 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1565,22 +1565,23 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	pfn = dev_priv->gtt.mappable_base + i915_gem_obj_ggtt_offset(obj);
 	pfn >>= PAGE_SHIFT;
 
-	if (!obj->fault_mappable) {
-		int i;
+	ret = remap_pfn_range(vma, vma->vm_start,
+			      pfn, vma->vm_end - vma->vm_start,
+			      vma->vm_page_prot);
+	if (ret) {
+		/* After passing the sanity checks on remap_pfn_range(), we may
+		 * abort whilst updating the pagetables due to ENOMEM and leave
+		 * the tables in an inconsistent state. Reset them all now.
+		 * However, we do not want to undo the work of another thread
+		 * that beat us to prefaulting the PTEs.
+		 */
+		if (ret != -EBUSY)
+			zap_vma_ptes(vma, vma->vm_start, vma->vm_end - vma->vm_start);
+		goto unpin;
+	}
 
-		for (i = 0; i < obj->base.size >> PAGE_SHIFT; i++) {
-			ret = vm_insert_pfn(vma,
-					    (unsigned long)vma->vm_start + i * PAGE_SIZE,
-					    pfn + i);
-			if (ret)
-				break;
-		}
+	obj->fault_mappable = true;
 
-		obj->fault_mappable = true;
-	} else
-		ret = vm_insert_pfn(vma,
-				    (unsigned long)vmf->virtual_address,
-				    pfn + page_offset);
 unpin:
 	i915_gem_object_ggtt_unpin(obj);
 unlock:
-- 
2.0.0

