All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/i915: Restrict pagefault disabling to just around copy_from_user()
@ 2016-10-17 14:10 Chris Wilson
  2016-10-17 17:28 ` ✗ Fi.CI.BAT: failure for " Patchwork
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Chris Wilson @ 2016-10-17 14:10 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

When handling execbuf relocations, we play a delicate dance with
pagefault. We first try to access the user pages underneath our
struct_mutex. However, if those pages were inside a GEM object, we may
trigger a pagefault and deadlock as i915_gem_fault() tries to
recursively acquire struct_mutex. Instead, we choose to disable
pagefaulting around the copy_from_user whilst inside the struct_mutex
and handle the EFAULT by falling back to a copy outside the
struct_mutex.

We however presumed that disabling pagefaults would be expensive. It is
just an operation on the local current task. Cheap enough that we can
restrict the disable/enable to the critical section around the copy, and
so avoid having to handle the atomic sections within the relocation
handling itself.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 64 +++++++++++++-----------------
 1 file changed, 28 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 1d02e74ce62d..22342ad0e07f 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -551,20 +551,6 @@ repeat:
 	return 0;
 }
 
-static bool object_is_idle(struct drm_i915_gem_object *obj)
-{
-	unsigned long active = i915_gem_object_get_active(obj);
-	int idx;
-
-	for_each_active(active, idx) {
-		if (!i915_gem_active_is_idle(&obj->last_read[idx],
-					     &obj->base.dev->struct_mutex))
-			return false;
-	}
-
-	return true;
-}
-
 static int
 i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 				   struct eb_vmas *eb,
@@ -648,10 +634,6 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 		return -EINVAL;
 	}
 
-	/* We can't wait for rendering with pagefaults disabled */
-	if (pagefault_disabled() && !object_is_idle(obj))
-		return -EFAULT;
-
 	ret = relocate_entry(obj, reloc, cache, target_offset);
 	if (ret)
 		return ret;
@@ -678,12 +660,23 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
 	remain = entry->relocation_count;
 	while (remain) {
 		struct drm_i915_gem_relocation_entry *r = stack_reloc;
-		int count = remain;
-		if (count > ARRAY_SIZE(stack_reloc))
-			count = ARRAY_SIZE(stack_reloc);
+		unsigned long unwritten;
+		unsigned int count;
+
+		count = min_t(unsigned int, remain, ARRAY_SIZE(stack_reloc));
 		remain -= count;
 
-		if (__copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]))) {
+		/* This is the fast path and we cannot handle a pagefault
+		 * whilst holding the struct mutex lest the user pass in the
+		 * relocations contained within a mmaped bo. For in such a case
+		 * we, the page fault handler would call i915_gem_fault() and
+		 * we would try to acquire the struct mutex again. Obviously
+		 * this is bad and so lockdep complains vehemently.
+		 */
+		pagefault_disable();
+		unwritten = __copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]));
+		pagefault_enable();
+		if (unwritten) {
 			ret = -EFAULT;
 			goto out;
 		}
@@ -695,11 +688,19 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
 			if (ret)
 				goto out;
 
-			if (r->presumed_offset != offset &&
-			    __put_user(r->presumed_offset,
-				       &user_relocs->presumed_offset)) {
-				ret = -EFAULT;
-				goto out;
+			if (r->presumed_offset != offset) {
+				/* Copying back to the user is allowed to fail.
+				 * The information passed back is a hint as
+				 * to the final location. If the copy_to_user
+				 * fails after a successful copy_from_user,
+				 * it must be a readonly location and so
+				 * we presume the user knows what they are
+				 * doing!
+				 */
+				pagefault_disable();
+				__put_user(r->presumed_offset,
+					   &user_relocs->presumed_offset);
+				pagefault_enable();
 			}
 
 			user_relocs++;
@@ -739,20 +740,11 @@ i915_gem_execbuffer_relocate(struct eb_vmas *eb)
 	struct i915_vma *vma;
 	int ret = 0;
 
-	/* This is the fast path and we cannot handle a pagefault whilst
-	 * holding the struct mutex lest the user pass in the relocations
-	 * contained within a mmaped bo. For in such a case we, the page
-	 * fault handler would call i915_gem_fault() and we would try to
-	 * acquire the struct mutex again. Obviously this is bad and so
-	 * lockdep complains vehemently.
-	 */
-	pagefault_disable();
 	list_for_each_entry(vma, &eb->vmas, exec_list) {
 		ret = i915_gem_execbuffer_relocate_vma(vma, eb);
 		if (ret)
 			break;
 	}
-	pagefault_enable();
 
 	return ret;
 }
-- 
2.9.3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* ✗ Fi.CI.BAT: failure for drm/i915: Restrict pagefault disabling to just around copy_from_user()
  2016-10-17 14:10 [PATCH] drm/i915: Restrict pagefault disabling to just around copy_from_user() Chris Wilson
@ 2016-10-17 17:28 ` Patchwork
  2016-10-18  8:01 ` [PATCH] " Tvrtko Ursulin
  2016-10-18  9:35 ` ✗ Fi.CI.BAT: failure for drm/i915: Restrict pagefault disabling to just around copy_from_user() (rev2) Patchwork
  2 siblings, 0 replies; 12+ messages in thread
From: Patchwork @ 2016-10-17 17:28 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Restrict pagefault disabling to just around copy_from_user()
URL   : https://patchwork.freedesktop.org/series/13880/
State : failure

== Summary ==

Series 13880v1 drm/i915: Restrict pagefault disabling to just around copy_from_user()
https://patchwork.freedesktop.org/api/1.0/series/13880/revisions/1/mbox/

Test gem_exec_suspend:
        Subgroup basic-s3:
                incomplete -> DMESG-WARN (fi-byt-n2820)
                dmesg-warn -> INCOMPLETE (fi-byt-j1900)
Test kms_force_connector_basic:
        Subgroup force-load-detect:
                incomplete -> PASS       (fi-ivb-3520m)
                pass       -> INCOMPLETE (fi-ivb-3770)
Test kms_pipe_crc_basic:
        Subgroup suspend-read-crc-pipe-b:
                pass       -> INCOMPLETE (fi-snb-2600)
Test vgem_basic:
        Subgroup unload:
                pass       -> SKIP       (fi-bdw-5557u)
                skip       -> PASS       (fi-hsw-4770r)

fi-bdw-5557u     total:246  pass:230  dwarn:0   dfail:0   fail:0   skip:16 
fi-bsw-n3050     total:246  pass:203  dwarn:1   dfail:0   fail:0   skip:42 
fi-bxt-t5700     total:246  pass:216  dwarn:0   dfail:0   fail:0   skip:30 
fi-byt-j1900     total:76   pass:62   dwarn:0   dfail:0   fail:1   skip:12 
fi-byt-n2820     total:246  pass:209  dwarn:1   dfail:0   fail:1   skip:35 
fi-hsw-4770      total:246  pass:224  dwarn:0   dfail:0   fail:0   skip:22 
fi-hsw-4770r     total:246  pass:223  dwarn:1   dfail:0   fail:0   skip:22 
fi-ilk-650       total:246  pass:181  dwarn:0   dfail:0   fail:5   skip:60 
fi-ivb-3520m     total:246  pass:221  dwarn:0   dfail:0   fail:0   skip:25 
fi-ivb-3770      total:186  pass:164  dwarn:0   dfail:0   fail:0   skip:21 
fi-skl-6700hq    total:246  pass:223  dwarn:0   dfail:0   fail:0   skip:23 
fi-skl-6700k     total:246  pass:221  dwarn:1   dfail:0   fail:0   skip:24 
fi-snb-2600      total:209  pass:176  dwarn:0   dfail:0   fail:0   skip:32 
fi-kbl-7200u failed to collect. IGT log at Patchwork_2738/fi-kbl-7200u/igt.log
fi-skl-6260u failed to connect after reboot
fi-skl-6770hq failed to connect after reboot

Results at /archive/results/CI_IGT_test/Patchwork_2738/

7ec75289774dec48c46c37271fb334b7caed3d32 drm-intel-nightly: 2016y-10m-17d-14h-07m-32s UTC integration manifest
bde1775 drm/i915: Restrict pagefault disabling to just around copy_from_user()

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] drm/i915: Restrict pagefault disabling to just around copy_from_user()
  2016-10-17 14:10 [PATCH] drm/i915: Restrict pagefault disabling to just around copy_from_user() Chris Wilson
  2016-10-17 17:28 ` ✗ Fi.CI.BAT: failure for " Patchwork
@ 2016-10-18  8:01 ` Tvrtko Ursulin
  2016-10-18  8:17   ` Chris Wilson
  2016-10-18  9:35 ` ✗ Fi.CI.BAT: failure for drm/i915: Restrict pagefault disabling to just around copy_from_user() (rev2) Patchwork
  2 siblings, 1 reply; 12+ messages in thread
From: Tvrtko Ursulin @ 2016-10-18  8:01 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Daniel Vetter


On 17/10/2016 15:10, Chris Wilson wrote:
> When handling execbuf relocations, we play a delicate dance with
> pagefault. We first try to access the user pages underneath our
> struct_mutex. However, if those pages were inside a GEM object, we may
> trigger a pagefault and deadlock as i915_gem_fault() tries to
> recursively acquire struct_mutex. Instead, we choose to disable
> pagefaulting around the copy_from_user whilst inside the struct_mutex
> and handle the EFAULT by falling back to a copy outside the
> struct_mutex.
>
> We however presumed that disabling pagefaults would be expensive. It is
> just an operation on the local current task. Cheap enough that we can
> restrict the disable/enable to the critical section around the copy, and
> so avoid having to handle the atomic sections within the relocation
> handling itself.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_execbuffer.c | 64 +++++++++++++-----------------
>   1 file changed, 28 insertions(+), 36 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 1d02e74ce62d..22342ad0e07f 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -551,20 +551,6 @@ repeat:
>   	return 0;
>   }
>   
> -static bool object_is_idle(struct drm_i915_gem_object *obj)
> -{
> -	unsigned long active = i915_gem_object_get_active(obj);
> -	int idx;
> -
> -	for_each_active(active, idx) {
> -		if (!i915_gem_active_is_idle(&obj->last_read[idx],
> -					     &obj->base.dev->struct_mutex))
> -			return false;
> -	}
> -
> -	return true;
> -}
> -
>   static int
>   i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
>   				   struct eb_vmas *eb,
> @@ -648,10 +634,6 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
>   		return -EINVAL;
>   	}
>   
> -	/* We can't wait for rendering with pagefaults disabled */
> -	if (pagefault_disabled() && !object_is_idle(obj))
> -		return -EFAULT;
> -
>   	ret = relocate_entry(obj, reloc, cache, target_offset);
>   	if (ret)
>   		return ret;
> @@ -678,12 +660,23 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
>   	remain = entry->relocation_count;
>   	while (remain) {
>   		struct drm_i915_gem_relocation_entry *r = stack_reloc;
> -		int count = remain;
> -		if (count > ARRAY_SIZE(stack_reloc))
> -			count = ARRAY_SIZE(stack_reloc);
> +		unsigned long unwritten;
> +		unsigned int count;
> +
> +		count = min_t(unsigned int, remain, ARRAY_SIZE(stack_reloc));
>   		remain -= count;
>   
> -		if (__copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]))) {
> +		/* This is the fast path and we cannot handle a pagefault
> +		 * whilst holding the struct mutex lest the user pass in the
> +		 * relocations contained within a mmaped bo. For in such a case
> +		 * we, the page fault handler would call i915_gem_fault() and
> +		 * we would try to acquire the struct mutex again. Obviously
> +		 * this is bad and so lockdep complains vehemently.
> +		 */
> +		pagefault_disable();
> +		unwritten = __copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]));
> +		pagefault_enable();
> +		if (unwritten) {
>   			ret = -EFAULT;
>   			goto out;
>   		}
> @@ -695,11 +688,19 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
>   			if (ret)
>   				goto out;
>   
> -			if (r->presumed_offset != offset &&
> -			    __put_user(r->presumed_offset,
> -				       &user_relocs->presumed_offset)) {
> -				ret = -EFAULT;
> -				goto out;
> +			if (r->presumed_offset != offset) {
> +				/* Copying back to the user is allowed to fail.
> +				 * The information passed back is a hint as
> +				 * to the final location. If the copy_to_user
> +				 * fails after a successful copy_from_user,
> +				 * it must be a readonly location and so
> +				 * we presume the user knows what they are
> +				 * doing!
> +				 */
> +				pagefault_disable();
> +				__put_user(r->presumed_offset,
> +					   &user_relocs->presumed_offset);
> +				pagefault_enable();

Why is a good idea to ignore potential errors here?

>   			}
>   
>   			user_relocs++;
> @@ -739,20 +740,11 @@ i915_gem_execbuffer_relocate(struct eb_vmas *eb)
>   	struct i915_vma *vma;
>   	int ret = 0;
>   
> -	/* This is the fast path and we cannot handle a pagefault whilst
> -	 * holding the struct mutex lest the user pass in the relocations
> -	 * contained within a mmaped bo. For in such a case we, the page
> -	 * fault handler would call i915_gem_fault() and we would try to
> -	 * acquire the struct mutex again. Obviously this is bad and so
> -	 * lockdep complains vehemently.
> -	 */
> -	pagefault_disable();
>   	list_for_each_entry(vma, &eb->vmas, exec_list) {
>   		ret = i915_gem_execbuffer_relocate_vma(vma, eb);
>   		if (ret)
>   			break;
>   	}
> -	pagefault_enable();
>   
>   	return ret;
>   }

Otherwise LGTM.

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] drm/i915: Restrict pagefault disabling to just around copy_from_user()
  2016-10-18  8:01 ` [PATCH] " Tvrtko Ursulin
@ 2016-10-18  8:17   ` Chris Wilson
  2016-10-18  8:22     ` Tvrtko Ursulin
  0 siblings, 1 reply; 12+ messages in thread
From: Chris Wilson @ 2016-10-18  8:17 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Daniel Vetter, intel-gfx

On Tue, Oct 18, 2016 at 09:01:30AM +0100, Tvrtko Ursulin wrote:
> 
> On 17/10/2016 15:10, Chris Wilson wrote:
> >When handling execbuf relocations, we play a delicate dance with
> >pagefault. We first try to access the user pages underneath our
> >struct_mutex. However, if those pages were inside a GEM object, we may
> >trigger a pagefault and deadlock as i915_gem_fault() tries to
> >recursively acquire struct_mutex. Instead, we choose to disable
> >pagefaulting around the copy_from_user whilst inside the struct_mutex
> >and handle the EFAULT by falling back to a copy outside the
> >struct_mutex.
> >
> >We however presumed that disabling pagefaults would be expensive. It is
> >just an operation on the local current task. Cheap enough that we can
> >restrict the disable/enable to the critical section around the copy, and
> >so avoid having to handle the atomic sections within the relocation
> >handling itself.
> >
> >Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> >Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> >---
> >  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 64 +++++++++++++-----------------
> >  1 file changed, 28 insertions(+), 36 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >index 1d02e74ce62d..22342ad0e07f 100644
> >--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >@@ -551,20 +551,6 @@ repeat:
> >  	return 0;
> >  }
> >-static bool object_is_idle(struct drm_i915_gem_object *obj)
> >-{
> >-	unsigned long active = i915_gem_object_get_active(obj);
> >-	int idx;
> >-
> >-	for_each_active(active, idx) {
> >-		if (!i915_gem_active_is_idle(&obj->last_read[idx],
> >-					     &obj->base.dev->struct_mutex))
> >-			return false;
> >-	}
> >-
> >-	return true;
> >-}
> >-
> >  static int
> >  i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
> >  				   struct eb_vmas *eb,
> >@@ -648,10 +634,6 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
> >  		return -EINVAL;
> >  	}
> >-	/* We can't wait for rendering with pagefaults disabled */
> >-	if (pagefault_disabled() && !object_is_idle(obj))
> >-		return -EFAULT;
> >-
> >  	ret = relocate_entry(obj, reloc, cache, target_offset);
> >  	if (ret)
> >  		return ret;
> >@@ -678,12 +660,23 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
> >  	remain = entry->relocation_count;
> >  	while (remain) {
> >  		struct drm_i915_gem_relocation_entry *r = stack_reloc;
> >-		int count = remain;
> >-		if (count > ARRAY_SIZE(stack_reloc))
> >-			count = ARRAY_SIZE(stack_reloc);
> >+		unsigned long unwritten;
> >+		unsigned int count;
> >+
> >+		count = min_t(unsigned int, remain, ARRAY_SIZE(stack_reloc));
> >  		remain -= count;
> >-		if (__copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]))) {
> >+		/* This is the fast path and we cannot handle a pagefault
> >+		 * whilst holding the struct mutex lest the user pass in the
> >+		 * relocations contained within a mmaped bo. For in such a case
> >+		 * we, the page fault handler would call i915_gem_fault() and
> >+		 * we would try to acquire the struct mutex again. Obviously
> >+		 * this is bad and so lockdep complains vehemently.
> >+		 */
> >+		pagefault_disable();
> >+		unwritten = __copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]));
> >+		pagefault_enable();
> >+		if (unwritten) {
> >  			ret = -EFAULT;
> >  			goto out;
> >  		}
> >@@ -695,11 +688,19 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
> >  			if (ret)
> >  				goto out;
> >-			if (r->presumed_offset != offset &&
> >-			    __put_user(r->presumed_offset,
> >-				       &user_relocs->presumed_offset)) {
> >-				ret = -EFAULT;
> >-				goto out;
> >+			if (r->presumed_offset != offset) {
> >+				/* Copying back to the user is allowed to fail.
> >+				 * The information passed back is a hint as
> >+				 * to the final location. If the copy_to_user
> >+				 * fails after a successful copy_from_user,
> >+				 * it must be a readonly location and so
> >+				 * we presume the user knows what they are
> >+				 * doing!
> >+				 */
> >+				pagefault_disable();
> >+				__put_user(r->presumed_offset,
> >+					   &user_relocs->presumed_offset);
> >+				pagefault_enable();
> 
> Why is a good idea to ignore potential errors here?

Wrong question: why did we think it a good idea to ignore success here?

(a) it is safe to do so, and I can legitimately setup userspace to use
this
(b) reporting an error after we have committed the change is broken
anyway.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] drm/i915: Restrict pagefault disabling to just around copy_from_user()
  2016-10-18  8:17   ` Chris Wilson
@ 2016-10-18  8:22     ` Tvrtko Ursulin
  2016-10-18  8:38       ` Chris Wilson
  0 siblings, 1 reply; 12+ messages in thread
From: Tvrtko Ursulin @ 2016-10-18  8:22 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx, Daniel Vetter


On 18/10/2016 09:17, Chris Wilson wrote:
> On Tue, Oct 18, 2016 at 09:01:30AM +0100, Tvrtko Ursulin wrote:
>> On 17/10/2016 15:10, Chris Wilson wrote:
>>> When handling execbuf relocations, we play a delicate dance with
>>> pagefault. We first try to access the user pages underneath our
>>> struct_mutex. However, if those pages were inside a GEM object, we may
>>> trigger a pagefault and deadlock as i915_gem_fault() tries to
>>> recursively acquire struct_mutex. Instead, we choose to disable
>>> pagefaulting around the copy_from_user whilst inside the struct_mutex
>>> and handle the EFAULT by falling back to a copy outside the
>>> struct_mutex.
>>>
>>> We however presumed that disabling pagefaults would be expensive. It is
>>> just an operation on the local current task. Cheap enough that we can
>>> restrict the disable/enable to the critical section around the copy, and
>>> so avoid having to handle the atomic sections within the relocation
>>> handling itself.
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/i915_gem_execbuffer.c | 64 +++++++++++++-----------------
>>>   1 file changed, 28 insertions(+), 36 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>> index 1d02e74ce62d..22342ad0e07f 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>> @@ -551,20 +551,6 @@ repeat:
>>>   	return 0;
>>>   }
>>> -static bool object_is_idle(struct drm_i915_gem_object *obj)
>>> -{
>>> -	unsigned long active = i915_gem_object_get_active(obj);
>>> -	int idx;
>>> -
>>> -	for_each_active(active, idx) {
>>> -		if (!i915_gem_active_is_idle(&obj->last_read[idx],
>>> -					     &obj->base.dev->struct_mutex))
>>> -			return false;
>>> -	}
>>> -
>>> -	return true;
>>> -}
>>> -
>>>   static int
>>>   i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
>>>   				   struct eb_vmas *eb,
>>> @@ -648,10 +634,6 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
>>>   		return -EINVAL;
>>>   	}
>>> -	/* We can't wait for rendering with pagefaults disabled */
>>> -	if (pagefault_disabled() && !object_is_idle(obj))
>>> -		return -EFAULT;
>>> -
>>>   	ret = relocate_entry(obj, reloc, cache, target_offset);
>>>   	if (ret)
>>>   		return ret;
>>> @@ -678,12 +660,23 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
>>>   	remain = entry->relocation_count;
>>>   	while (remain) {
>>>   		struct drm_i915_gem_relocation_entry *r = stack_reloc;
>>> -		int count = remain;
>>> -		if (count > ARRAY_SIZE(stack_reloc))
>>> -			count = ARRAY_SIZE(stack_reloc);
>>> +		unsigned long unwritten;
>>> +		unsigned int count;
>>> +
>>> +		count = min_t(unsigned int, remain, ARRAY_SIZE(stack_reloc));
>>>   		remain -= count;
>>> -		if (__copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]))) {
>>> +		/* This is the fast path and we cannot handle a pagefault
>>> +		 * whilst holding the struct mutex lest the user pass in the
>>> +		 * relocations contained within a mmaped bo. For in such a case
>>> +		 * we, the page fault handler would call i915_gem_fault() and
>>> +		 * we would try to acquire the struct mutex again. Obviously
>>> +		 * this is bad and so lockdep complains vehemently.
>>> +		 */
>>> +		pagefault_disable();
>>> +		unwritten = __copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]));
>>> +		pagefault_enable();
>>> +		if (unwritten) {
>>>   			ret = -EFAULT;
>>>   			goto out;
>>>   		}
>>> @@ -695,11 +688,19 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
>>>   			if (ret)
>>>   				goto out;
>>> -			if (r->presumed_offset != offset &&
>>> -			    __put_user(r->presumed_offset,
>>> -				       &user_relocs->presumed_offset)) {
>>> -				ret = -EFAULT;
>>> -				goto out;
>>> +			if (r->presumed_offset != offset) {
>>> +				/* Copying back to the user is allowed to fail.
>>> +				 * The information passed back is a hint as
>>> +				 * to the final location. If the copy_to_user
>>> +				 * fails after a successful copy_from_user,
>>> +				 * it must be a readonly location and so
>>> +				 * we presume the user knows what they are
>>> +				 * doing!
>>> +				 */
>>> +				pagefault_disable();
>>> +				__put_user(r->presumed_offset,
>>> +					   &user_relocs->presumed_offset);
>>> +				pagefault_enable();
>> Why is a good idea to ignore potential errors here?
> Wrong question: why did we think it a good idea to ignore success here?

How were we ignoring errors, I don't follow?

> (a) it is safe to do so, and I can legitimately setup userspace to use
> this

How what where? :)

> (b) reporting an error after we have committed the change is broken
> anyway.

You mean in the half way through cases?

Regards,

Tvrkto

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] drm/i915: Restrict pagefault disabling to just around copy_from_user()
  2016-10-18  8:22     ` Tvrtko Ursulin
@ 2016-10-18  8:38       ` Chris Wilson
  2016-10-18  8:54         ` Tvrtko Ursulin
  0 siblings, 1 reply; 12+ messages in thread
From: Chris Wilson @ 2016-10-18  8:38 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Daniel Vetter, intel-gfx

On Tue, Oct 18, 2016 at 09:22:56AM +0100, Tvrtko Ursulin wrote:
> 
> On 18/10/2016 09:17, Chris Wilson wrote:
> >On Tue, Oct 18, 2016 at 09:01:30AM +0100, Tvrtko Ursulin wrote:
> >>On 17/10/2016 15:10, Chris Wilson wrote:
> >>>When handling execbuf relocations, we play a delicate dance with
> >>>pagefault. We first try to access the user pages underneath our
> >>>struct_mutex. However, if those pages were inside a GEM object, we may
> >>>trigger a pagefault and deadlock as i915_gem_fault() tries to
> >>>recursively acquire struct_mutex. Instead, we choose to disable
> >>>pagefaulting around the copy_from_user whilst inside the struct_mutex
> >>>and handle the EFAULT by falling back to a copy outside the
> >>>struct_mutex.
> >>>
> >>>We however presumed that disabling pagefaults would be expensive. It is
> >>>just an operation on the local current task. Cheap enough that we can
> >>>restrict the disable/enable to the critical section around the copy, and
> >>>so avoid having to handle the atomic sections within the relocation
> >>>handling itself.
> >>>
> >>>Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>>Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> >>>Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> >>>---
> >>>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 64 +++++++++++++-----------------
> >>>  1 file changed, 28 insertions(+), 36 deletions(-)
> >>>
> >>>diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >>>index 1d02e74ce62d..22342ad0e07f 100644
> >>>--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >>>+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >>>@@ -551,20 +551,6 @@ repeat:
> >>>  	return 0;
> >>>  }
> >>>-static bool object_is_idle(struct drm_i915_gem_object *obj)
> >>>-{
> >>>-	unsigned long active = i915_gem_object_get_active(obj);
> >>>-	int idx;
> >>>-
> >>>-	for_each_active(active, idx) {
> >>>-		if (!i915_gem_active_is_idle(&obj->last_read[idx],
> >>>-					     &obj->base.dev->struct_mutex))
> >>>-			return false;
> >>>-	}
> >>>-
> >>>-	return true;
> >>>-}
> >>>-
> >>>  static int
> >>>  i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
> >>>  				   struct eb_vmas *eb,
> >>>@@ -648,10 +634,6 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
> >>>  		return -EINVAL;
> >>>  	}
> >>>-	/* We can't wait for rendering with pagefaults disabled */
> >>>-	if (pagefault_disabled() && !object_is_idle(obj))
> >>>-		return -EFAULT;
> >>>-
> >>>  	ret = relocate_entry(obj, reloc, cache, target_offset);
> >>>  	if (ret)
> >>>  		return ret;
> >>>@@ -678,12 +660,23 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
> >>>  	remain = entry->relocation_count;
> >>>  	while (remain) {
> >>>  		struct drm_i915_gem_relocation_entry *r = stack_reloc;
> >>>-		int count = remain;
> >>>-		if (count > ARRAY_SIZE(stack_reloc))
> >>>-			count = ARRAY_SIZE(stack_reloc);
> >>>+		unsigned long unwritten;
> >>>+		unsigned int count;
> >>>+
> >>>+		count = min_t(unsigned int, remain, ARRAY_SIZE(stack_reloc));
> >>>  		remain -= count;
> >>>-		if (__copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]))) {
> >>>+		/* This is the fast path and we cannot handle a pagefault
> >>>+		 * whilst holding the struct mutex lest the user pass in the
> >>>+		 * relocations contained within a mmaped bo. For in such a case
> >>>+		 * we, the page fault handler would call i915_gem_fault() and
> >>>+		 * we would try to acquire the struct mutex again. Obviously
> >>>+		 * this is bad and so lockdep complains vehemently.
> >>>+		 */
> >>>+		pagefault_disable();
> >>>+		unwritten = __copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]));
> >>>+		pagefault_enable();
> >>>+		if (unwritten) {
> >>>  			ret = -EFAULT;
> >>>  			goto out;
> >>>  		}
> >>>@@ -695,11 +688,19 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
> >>>  			if (ret)
> >>>  				goto out;
> >>>-			if (r->presumed_offset != offset &&
> >>>-			    __put_user(r->presumed_offset,
> >>>-				       &user_relocs->presumed_offset)) {
> >>>-				ret = -EFAULT;
> >>>-				goto out;
> >>>+			if (r->presumed_offset != offset) {
> >>>+				/* Copying back to the user is allowed to fail.
> >>>+				 * The information passed back is a hint as
> >>>+				 * to the final location. If the copy_to_user
> >>>+				 * fails after a successful copy_from_user,
> >>>+				 * it must be a readonly location and so
> >>>+				 * we presume the user knows what they are
> >>>+				 * doing!
> >>>+				 */
> >>>+				pagefault_disable();
> >>>+				__put_user(r->presumed_offset,
> >>>+					   &user_relocs->presumed_offset);
> >>>+				pagefault_enable();
> >>Why is a good idea to ignore potential errors here?
> >Wrong question: why did we think it a good idea to ignore success here?
> 
> How were we ignoring errors, I don't follow?
> 
> >(a) it is safe to do so, and I can legitimately setup userspace to use
> >this
> 
> How what where? :)

igt, whereelse, has a test case to check we can execute from read only
memory.
 
> >(b) reporting an error after we have committed the change is broken
> >anyway.
> 
> You mean in the half way through cases?

We've already made the user/gpu visible change. Failing here is broken.
(User doesn't notice the change, presumed_offset doesn't match actual,
kernel is oblivious to mismatch, GPU hangs.)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] drm/i915: Restrict pagefault disabling to just around copy_from_user()
  2016-10-18  8:38       ` Chris Wilson
@ 2016-10-18  8:54         ` Tvrtko Ursulin
  2016-10-18  9:00           ` Chris Wilson
  0 siblings, 1 reply; 12+ messages in thread
From: Tvrtko Ursulin @ 2016-10-18  8:54 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx, Daniel Vetter


On 18/10/2016 09:38, Chris Wilson wrote:
> On Tue, Oct 18, 2016 at 09:22:56AM +0100, Tvrtko Ursulin wrote:
>> On 18/10/2016 09:17, Chris Wilson wrote:
>>> On Tue, Oct 18, 2016 at 09:01:30AM +0100, Tvrtko Ursulin wrote:
>>>> On 17/10/2016 15:10, Chris Wilson wrote:
>>>>> When handling execbuf relocations, we play a delicate dance with
>>>>> pagefault. We first try to access the user pages underneath our
>>>>> struct_mutex. However, if those pages were inside a GEM object, we may
>>>>> trigger a pagefault and deadlock as i915_gem_fault() tries to
>>>>> recursively acquire struct_mutex. Instead, we choose to disable
>>>>> pagefaulting around the copy_from_user whilst inside the struct_mutex
>>>>> and handle the EFAULT by falling back to a copy outside the
>>>>> struct_mutex.
>>>>>
>>>>> We however presumed that disabling pagefaults would be expensive. It is
>>>>> just an operation on the local current task. Cheap enough that we can
>>>>> restrict the disable/enable to the critical section around the copy, and
>>>>> so avoid having to handle the atomic sections within the relocation
>>>>> handling itself.
>>>>>
>>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>>>>> ---
>>>>>   drivers/gpu/drm/i915/i915_gem_execbuffer.c | 64 +++++++++++++-----------------
>>>>>   1 file changed, 28 insertions(+), 36 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>>>> index 1d02e74ce62d..22342ad0e07f 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>>>> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>>>> @@ -551,20 +551,6 @@ repeat:
>>>>>   	return 0;
>>>>>   }
>>>>> -static bool object_is_idle(struct drm_i915_gem_object *obj)
>>>>> -{
>>>>> -	unsigned long active = i915_gem_object_get_active(obj);
>>>>> -	int idx;
>>>>> -
>>>>> -	for_each_active(active, idx) {
>>>>> -		if (!i915_gem_active_is_idle(&obj->last_read[idx],
>>>>> -					     &obj->base.dev->struct_mutex))
>>>>> -			return false;
>>>>> -	}
>>>>> -
>>>>> -	return true;
>>>>> -}
>>>>> -
>>>>>   static int
>>>>>   i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
>>>>>   				   struct eb_vmas *eb,
>>>>> @@ -648,10 +634,6 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
>>>>>   		return -EINVAL;
>>>>>   	}
>>>>> -	/* We can't wait for rendering with pagefaults disabled */
>>>>> -	if (pagefault_disabled() && !object_is_idle(obj))
>>>>> -		return -EFAULT;
>>>>> -
>>>>>   	ret = relocate_entry(obj, reloc, cache, target_offset);
>>>>>   	if (ret)
>>>>>   		return ret;
>>>>> @@ -678,12 +660,23 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
>>>>>   	remain = entry->relocation_count;
>>>>>   	while (remain) {
>>>>>   		struct drm_i915_gem_relocation_entry *r = stack_reloc;
>>>>> -		int count = remain;
>>>>> -		if (count > ARRAY_SIZE(stack_reloc))
>>>>> -			count = ARRAY_SIZE(stack_reloc);
>>>>> +		unsigned long unwritten;
>>>>> +		unsigned int count;
>>>>> +
>>>>> +		count = min_t(unsigned int, remain, ARRAY_SIZE(stack_reloc));
>>>>>   		remain -= count;
>>>>> -		if (__copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]))) {
>>>>> +		/* This is the fast path and we cannot handle a pagefault
>>>>> +		 * whilst holding the struct mutex lest the user pass in the
>>>>> +		 * relocations contained within a mmaped bo. For in such a case
>>>>> +		 * we, the page fault handler would call i915_gem_fault() and
>>>>> +		 * we would try to acquire the struct mutex again. Obviously
>>>>> +		 * this is bad and so lockdep complains vehemently.
>>>>> +		 */
>>>>> +		pagefault_disable();
>>>>> +		unwritten = __copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]));
>>>>> +		pagefault_enable();
>>>>> +		if (unwritten) {
>>>>>   			ret = -EFAULT;
>>>>>   			goto out;
>>>>>   		}
>>>>> @@ -695,11 +688,19 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
>>>>>   			if (ret)
>>>>>   				goto out;
>>>>> -			if (r->presumed_offset != offset &&
>>>>> -			    __put_user(r->presumed_offset,
>>>>> -				       &user_relocs->presumed_offset)) {
>>>>> -				ret = -EFAULT;
>>>>> -				goto out;
>>>>> +			if (r->presumed_offset != offset) {
>>>>> +				/* Copying back to the user is allowed to fail.
>>>>> +				 * The information passed back is a hint as
>>>>> +				 * to the final location. If the copy_to_user
>>>>> +				 * fails after a successful copy_from_user,
>>>>> +				 * it must be a readonly location and so
>>>>> +				 * we presume the user knows what they are
>>>>> +				 * doing!
>>>>> +				 */
>>>>> +				pagefault_disable();
>>>>> +				__put_user(r->presumed_offset,
>>>>> +					   &user_relocs->presumed_offset);
>>>>> +				pagefault_enable();
>>>> Why is a good idea to ignore potential errors here?
>>> Wrong question: why did we think it a good idea to ignore success here?
>> How were we ignoring errors, I don't follow?
>>
>>> (a) it is safe to do so, and I can legitimately setup userspace to use
>>> this
>> How what where? :)
> igt, whereelse, has a test case to check we can execute from read only
> memory.

With relocations?

>   
>>> (b) reporting an error after we have committed the change is broken
>>> anyway.
>> You mean in the half way through cases?
> We've already made the user/gpu visible change. Failing here is broken.
> (User doesn't notice the change, presumed_offset doesn't match actual,
> kernel is oblivious to mismatch, GPU hangs.)

How is that? I see a error path propagating all the way up to failing 
the execbuf.

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] drm/i915: Restrict pagefault disabling to just around copy_from_user()
  2016-10-18  8:54         ` Tvrtko Ursulin
@ 2016-10-18  9:00           ` Chris Wilson
  2016-10-18  9:07             ` [PATCH v2] " Chris Wilson
  0 siblings, 1 reply; 12+ messages in thread
From: Chris Wilson @ 2016-10-18  9:00 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Daniel Vetter, intel-gfx

On Tue, Oct 18, 2016 at 09:54:23AM +0100, Tvrtko Ursulin wrote:
> On 18/10/2016 09:38, Chris Wilson wrote:
> >>>(a) it is safe to do so, and I can legitimately setup userspace to use
> >>>this
> >>How what where? :)
> >igt, whereelse, has a test case to check we can execute from read only
> >memory.
> 
> With relocations?

Yes. The layout is advisory.
 
> >>>(b) reporting an error after we have committed the change is broken
> >>>anyway.
> >>You mean in the half way through cases?
> >We've already made the user/gpu visible change. Failing here is broken.
> >(User doesn't notice the change, presumed_offset doesn't match actual,
> >kernel is oblivious to mismatch, GPU hangs.)
> 
> How is that? I see a error path propagating all the way up to
> failing the execbuf.

The change has already been committed to the object, but the user is not
informed and so on the next pass everyone is oblivious that the value in
the buffer no longer matches (and so can pass through unscathed resulting
in a hang). The error handling itself is fragile.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2] drm/i915: Restrict pagefault disabling to just around copy_from_user()
  2016-10-18  9:00           ` Chris Wilson
@ 2016-10-18  9:07             ` Chris Wilson
  2016-10-18  9:11               ` Tvrtko Ursulin
  0 siblings, 1 reply; 12+ messages in thread
From: Chris Wilson @ 2016-10-18  9:07 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

When handling execbuf relocations, we play a delicate dance with
pagefault. We first try to access the user pages underneath our
struct_mutex. However, if those pages were inside a GEM object, we may
trigger a pagefault and deadlock as i915_gem_fault() tries to
recursively acquire struct_mutex. Instead, we choose to disable
pagefaulting around the copy_from_user whilst inside the struct_mutex
and handle the EFAULT by falling back to a copy outside the
struct_mutex.

We however presumed that disabling pagefaults would be expensive. It is
just an operation on the local current task. Cheap enough that we can
restrict the disable/enable to the critical section around the copy, and
so avoid having to handle the atomic sections within the relocation
handling itself.

v2: Just illustrate the broken error handling rather than argue why it
is safer to ignore it, for now.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 71 +++++++++++++++---------------
 1 file changed, 35 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 204536a3087e..e52affdcc125 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -551,20 +551,6 @@ relocate_entry(struct drm_i915_gem_object *obj,
 	return 0;
 }
 
-static bool object_is_idle(struct drm_i915_gem_object *obj)
-{
-	unsigned long active = i915_gem_object_get_active(obj);
-	int idx;
-
-	for_each_active(active, idx) {
-		if (!i915_gem_active_is_idle(&obj->last_read[idx],
-					     &obj->base.dev->struct_mutex))
-			return false;
-	}
-
-	return true;
-}
-
 static int
 i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 				   struct eb_vmas *eb,
@@ -648,10 +634,6 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 		return -EINVAL;
 	}
 
-	/* We can't wait for rendering with pagefaults disabled */
-	if (pagefault_disabled() && !object_is_idle(obj))
-		return -EFAULT;
-
 	ret = relocate_entry(obj, reloc, cache, target_offset);
 	if (ret)
 		return ret;
@@ -678,12 +660,23 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
 	remain = entry->relocation_count;
 	while (remain) {
 		struct drm_i915_gem_relocation_entry *r = stack_reloc;
-		int count = remain;
-		if (count > ARRAY_SIZE(stack_reloc))
-			count = ARRAY_SIZE(stack_reloc);
+		unsigned long unwritten;
+		unsigned int count;
+
+		count = min_t(unsigned int, remain, ARRAY_SIZE(stack_reloc));
 		remain -= count;
 
-		if (__copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]))) {
+		/* This is the fast path and we cannot handle a pagefault
+		 * whilst holding the struct mutex lest the user pass in the
+		 * relocations contained within a mmaped bo. For in such a case
+		 * we, the page fault handler would call i915_gem_fault() and
+		 * we would try to acquire the struct mutex again. Obviously
+		 * this is bad and so lockdep complains vehemently.
+		 */
+		pagefault_disable();
+		unwritten = __copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]));
+		pagefault_enable();
+		if (unlikely(unwritten)) {
 			ret = -EFAULT;
 			goto out;
 		}
@@ -695,11 +688,26 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
 			if (ret)
 				goto out;
 
-			if (r->presumed_offset != offset &&
-			    __put_user(r->presumed_offset,
-				       &user_relocs->presumed_offset)) {
-				ret = -EFAULT;
-				goto out;
+			if (r->presumed_offset != offset) {
+				pagefault_disable();
+				unwritten = __put_user(r->presumed_offset,
+						       &user_relocs->presumed_offset);
+				pagefault_enable();
+				if (unlikely(unwritten)) {
+					/* Note that reporting an error now
+					 * leaves everything in an inconsistent
+					 * state as we have *already* changed
+					 * the relocation value inside the
+					 * object. As we have not changed the
+					 * reloc.presumed_offset or will not
+					 * change the execobject.offset, on the
+					 * call we may not rewrite the value
+					 * inside the object, leaving it
+					 * dangling and causing a GPU hang.
+					 */
+					ret = -EFAULT;
+					goto out;
+				}
 			}
 
 			user_relocs++;
@@ -739,20 +747,11 @@ i915_gem_execbuffer_relocate(struct eb_vmas *eb)
 	struct i915_vma *vma;
 	int ret = 0;
 
-	/* This is the fast path and we cannot handle a pagefault whilst
-	 * holding the struct mutex lest the user pass in the relocations
-	 * contained within a mmaped bo. For in such a case we, the page
-	 * fault handler would call i915_gem_fault() and we would try to
-	 * acquire the struct mutex again. Obviously this is bad and so
-	 * lockdep complains vehemently.
-	 */
-	pagefault_disable();
 	list_for_each_entry(vma, &eb->vmas, exec_list) {
 		ret = i915_gem_execbuffer_relocate_vma(vma, eb);
 		if (ret)
 			break;
 	}
-	pagefault_enable();
 
 	return ret;
 }
-- 
2.9.3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] drm/i915: Restrict pagefault disabling to just around copy_from_user()
  2016-10-18  9:07             ` [PATCH v2] " Chris Wilson
@ 2016-10-18  9:11               ` Tvrtko Ursulin
  0 siblings, 0 replies; 12+ messages in thread
From: Tvrtko Ursulin @ 2016-10-18  9:11 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Daniel Vetter


On 18/10/2016 10:07, Chris Wilson wrote:
> When handling execbuf relocations, we play a delicate dance with
> pagefault. We first try to access the user pages underneath our
> struct_mutex. However, if those pages were inside a GEM object, we may
> trigger a pagefault and deadlock as i915_gem_fault() tries to
> recursively acquire struct_mutex. Instead, we choose to disable
> pagefaulting around the copy_from_user whilst inside the struct_mutex
> and handle the EFAULT by falling back to a copy outside the
> struct_mutex.
>
> We however presumed that disabling pagefaults would be expensive. It is
> just an operation on the local current task. Cheap enough that we can
> restrict the disable/enable to the critical section around the copy, and
> so avoid having to handle the atomic sections within the relocation
> handling itself.
>
> v2: Just illustrate the broken error handling rather than argue why it
> is safer to ignore it, for now.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_execbuffer.c | 71 +++++++++++++++---------------
>   1 file changed, 35 insertions(+), 36 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 204536a3087e..e52affdcc125 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -551,20 +551,6 @@ relocate_entry(struct drm_i915_gem_object *obj,
>   	return 0;
>   }
>   
> -static bool object_is_idle(struct drm_i915_gem_object *obj)
> -{
> -	unsigned long active = i915_gem_object_get_active(obj);
> -	int idx;
> -
> -	for_each_active(active, idx) {
> -		if (!i915_gem_active_is_idle(&obj->last_read[idx],
> -					     &obj->base.dev->struct_mutex))
> -			return false;
> -	}
> -
> -	return true;
> -}
> -
>   static int
>   i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
>   				   struct eb_vmas *eb,
> @@ -648,10 +634,6 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
>   		return -EINVAL;
>   	}
>   
> -	/* We can't wait for rendering with pagefaults disabled */
> -	if (pagefault_disabled() && !object_is_idle(obj))
> -		return -EFAULT;
> -
>   	ret = relocate_entry(obj, reloc, cache, target_offset);
>   	if (ret)
>   		return ret;
> @@ -678,12 +660,23 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
>   	remain = entry->relocation_count;
>   	while (remain) {
>   		struct drm_i915_gem_relocation_entry *r = stack_reloc;
> -		int count = remain;
> -		if (count > ARRAY_SIZE(stack_reloc))
> -			count = ARRAY_SIZE(stack_reloc);
> +		unsigned long unwritten;
> +		unsigned int count;
> +
> +		count = min_t(unsigned int, remain, ARRAY_SIZE(stack_reloc));
>   		remain -= count;
>   
> -		if (__copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]))) {
> +		/* This is the fast path and we cannot handle a pagefault
> +		 * whilst holding the struct mutex lest the user pass in the
> +		 * relocations contained within a mmaped bo. For in such a case
> +		 * we, the page fault handler would call i915_gem_fault() and
> +		 * we would try to acquire the struct mutex again. Obviously
> +		 * this is bad and so lockdep complains vehemently.
> +		 */
> +		pagefault_disable();
> +		unwritten = __copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]));
> +		pagefault_enable();
> +		if (unlikely(unwritten)) {
>   			ret = -EFAULT;
>   			goto out;
>   		}
> @@ -695,11 +688,26 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
>   			if (ret)
>   				goto out;
>   
> -			if (r->presumed_offset != offset &&
> -			    __put_user(r->presumed_offset,
> -				       &user_relocs->presumed_offset)) {
> -				ret = -EFAULT;
> -				goto out;
> +			if (r->presumed_offset != offset) {
> +				pagefault_disable();
> +				unwritten = __put_user(r->presumed_offset,
> +						       &user_relocs->presumed_offset);
> +				pagefault_enable();
> +				if (unlikely(unwritten)) {
> +					/* Note that reporting an error now
> +					 * leaves everything in an inconsistent
> +					 * state as we have *already* changed
> +					 * the relocation value inside the
> +					 * object. As we have not changed the
> +					 * reloc.presumed_offset or will not
> +					 * change the execobject.offset, on the
> +					 * call we may not rewrite the value
> +					 * inside the object, leaving it
> +					 * dangling and causing a GPU hang.
> +					 */
> +					ret = -EFAULT;
> +					goto out;
> +				}
>   			}
>   
>   			user_relocs++;
> @@ -739,20 +747,11 @@ i915_gem_execbuffer_relocate(struct eb_vmas *eb)
>   	struct i915_vma *vma;
>   	int ret = 0;
>   
> -	/* This is the fast path and we cannot handle a pagefault whilst
> -	 * holding the struct mutex lest the user pass in the relocations
> -	 * contained within a mmaped bo. For in such a case we, the page
> -	 * fault handler would call i915_gem_fault() and we would try to
> -	 * acquire the struct mutex again. Obviously this is bad and so
> -	 * lockdep complains vehemently.
> -	 */
> -	pagefault_disable();
>   	list_for_each_entry(vma, &eb->vmas, exec_list) {
>   		ret = i915_gem_execbuffer_relocate_vma(vma, eb);
>   		if (ret)
>   			break;
>   	}
> -	pagefault_enable();
>   
>   	return ret;
>   }

Ok we can continue the educational discussion at some other time. :)

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* ✗ Fi.CI.BAT: failure for drm/i915: Restrict pagefault disabling to just around copy_from_user() (rev2)
  2016-10-17 14:10 [PATCH] drm/i915: Restrict pagefault disabling to just around copy_from_user() Chris Wilson
  2016-10-17 17:28 ` ✗ Fi.CI.BAT: failure for " Patchwork
  2016-10-18  8:01 ` [PATCH] " Tvrtko Ursulin
@ 2016-10-18  9:35 ` Patchwork
  2016-10-18  9:54   ` Chris Wilson
  2 siblings, 1 reply; 12+ messages in thread
From: Patchwork @ 2016-10-18  9:35 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Restrict pagefault disabling to just around copy_from_user() (rev2)
URL   : https://patchwork.freedesktop.org/series/13880/
State : failure

== Summary ==

Series 13880v2 drm/i915: Restrict pagefault disabling to just around copy_from_user()
https://patchwork.freedesktop.org/api/1.0/series/13880/revisions/2/mbox/

Test gem_exec_suspend:
        Subgroup basic-s3:
                incomplete -> DMESG-WARN (fi-byt-n2820)
                dmesg-warn -> INCOMPLETE (fi-byt-j1900)
Test gem_ringfill:
        Subgroup basic-default-hang:
                pass       -> TIMEOUT    (fi-ivb-3770)
Test kms_force_connector_basic:
        Subgroup force-load-detect:
                incomplete -> PASS       (fi-ivb-3520m)
Test kms_pipe_crc_basic:
        Subgroup hang-read-crc-pipe-b:
                pass       -> INCOMPLETE (fi-ivb-3770)
        Subgroup suspend-read-crc-pipe-c:
                pass       -> FAIL       (fi-skl-6700k)
Test vgem_basic:
        Subgroup unload:
                skip       -> PASS       (fi-kbl-7200u)
                pass       -> SKIP       (fi-hsw-4770)
                skip       -> PASS       (fi-hsw-4770r)

fi-bdw-5557u     total:246  pass:231  dwarn:0   dfail:0   fail:0   skip:15 
fi-bsw-n3050     total:246  pass:203  dwarn:1   dfail:0   fail:0   skip:42 
fi-bxt-t5700     total:246  pass:216  dwarn:0   dfail:0   fail:0   skip:30 
fi-byt-j1900     total:76   pass:62   dwarn:0   dfail:0   fail:1   skip:12 
fi-byt-n2820     total:246  pass:209  dwarn:1   dfail:0   fail:1   skip:35 
fi-hsw-4770      total:246  pass:223  dwarn:0   dfail:0   fail:0   skip:23 
fi-hsw-4770r     total:246  pass:223  dwarn:1   dfail:0   fail:0   skip:22 
fi-ilk-650       total:246  pass:181  dwarn:0   dfail:0   fail:5   skip:60 
fi-ivb-3520m     total:246  pass:221  dwarn:0   dfail:0   fail:0   skip:25 
fi-ivb-3770      total:194  pass:171  dwarn:0   dfail:0   fail:0   skip:21 
fi-kbl-7200u     total:246  pass:222  dwarn:0   dfail:0   fail:0   skip:24 
fi-skl-6700hq    total:246  pass:223  dwarn:0   dfail:0   fail:0   skip:23 
fi-skl-6700k     total:246  pass:220  dwarn:1   dfail:0   fail:1   skip:24 
fi-snb-2520m     total:246  pass:210  dwarn:0   dfail:0   fail:0   skip:36 
fi-snb-2600      total:246  pass:209  dwarn:0   dfail:0   fail:0   skip:37 
fi-skl-6260u failed to connect after reboot
fi-skl-6770hq failed to connect after reboot

Results at /archive/results/CI_IGT_test/Patchwork_2741/

7ec75289774dec48c46c37271fb334b7caed3d32 drm-intel-nightly: 2016y-10m-17d-14h-07m-32s UTC integration manifest
0dbfd20 drm/i915: Restrict pagefault disabling to just around copy_from_user()

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: ✗ Fi.CI.BAT: failure for drm/i915: Restrict pagefault disabling to just around copy_from_user() (rev2)
  2016-10-18  9:35 ` ✗ Fi.CI.BAT: failure for drm/i915: Restrict pagefault disabling to just around copy_from_user() (rev2) Patchwork
@ 2016-10-18  9:54   ` Chris Wilson
  0 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2016-10-18  9:54 UTC (permalink / raw)
  To: intel-gfx

On Tue, Oct 18, 2016 at 09:35:12AM -0000, Patchwork wrote:
> == Series Details ==
> 
> Series: drm/i915: Restrict pagefault disabling to just around copy_from_user() (rev2)
> URL   : https://patchwork.freedesktop.org/series/13880/
> State : failure
> 
> == Summary ==
> 
> Series 13880v2 drm/i915: Restrict pagefault disabling to just around copy_from_user()
> https://patchwork.freedesktop.org/api/1.0/series/13880/revisions/2/mbox/
> 
> Test gem_exec_suspend:
>         Subgroup basic-s3:
>                 incomplete -> DMESG-WARN (fi-byt-n2820)
>                 dmesg-warn -> INCOMPLETE (fi-byt-j1900)

These are fluctuating results because we now are seeing bugs from ACPI.

> Test gem_ringfill:
>         Subgroup basic-default-hang:
>                 pass       -> TIMEOUT    (fi-ivb-3770)

The unrelated case of ivb not being able to recover from a hang. :|
Still looking for that locally on an i7-3720QM
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-10-18  9:54 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-17 14:10 [PATCH] drm/i915: Restrict pagefault disabling to just around copy_from_user() Chris Wilson
2016-10-17 17:28 ` ✗ Fi.CI.BAT: failure for " Patchwork
2016-10-18  8:01 ` [PATCH] " Tvrtko Ursulin
2016-10-18  8:17   ` Chris Wilson
2016-10-18  8:22     ` Tvrtko Ursulin
2016-10-18  8:38       ` Chris Wilson
2016-10-18  8:54         ` Tvrtko Ursulin
2016-10-18  9:00           ` Chris Wilson
2016-10-18  9:07             ` [PATCH v2] " Chris Wilson
2016-10-18  9:11               ` Tvrtko Ursulin
2016-10-18  9:35 ` ✗ Fi.CI.BAT: failure for drm/i915: Restrict pagefault disabling to just around copy_from_user() (rev2) Patchwork
2016-10-18  9:54   ` Chris Wilson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.