* [PATCH] drm/i915: Restrict pagefault disabling to just around copy_from_user()
@ 2016-10-17 14:10 Chris Wilson
2016-10-17 17:28 ` ✗ Fi.CI.BAT: failure for " Patchwork
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Chris Wilson @ 2016-10-17 14:10 UTC (permalink / raw)
To: intel-gfx; +Cc: Daniel Vetter
When handling execbuf relocations, we play a delicate dance with
pagefault. We first try to access the user pages underneath our
struct_mutex. However, if those pages were inside a GEM object, we may
trigger a pagefault and deadlock as i915_gem_fault() tries to
recursively acquire struct_mutex. Instead, we choose to disable
pagefaulting around the copy_from_user whilst inside the struct_mutex
and handle the EFAULT by falling back to a copy outside the
struct_mutex.
We however presumed that disabling pagefaults would be expensive. It is
just an operation on the local current task. Cheap enough that we can
restrict the disable/enable to the critical section around the copy, and
so avoid having to handle the atomic sections within the relocation
handling itself.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
---
drivers/gpu/drm/i915/i915_gem_execbuffer.c | 64 +++++++++++++-----------------
1 file changed, 28 insertions(+), 36 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 1d02e74ce62d..22342ad0e07f 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -551,20 +551,6 @@ repeat:
return 0;
}
-static bool object_is_idle(struct drm_i915_gem_object *obj)
-{
- unsigned long active = i915_gem_object_get_active(obj);
- int idx;
-
- for_each_active(active, idx) {
- if (!i915_gem_active_is_idle(&obj->last_read[idx],
- &obj->base.dev->struct_mutex))
- return false;
- }
-
- return true;
-}
-
static int
i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
struct eb_vmas *eb,
@@ -648,10 +634,6 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
return -EINVAL;
}
- /* We can't wait for rendering with pagefaults disabled */
- if (pagefault_disabled() && !object_is_idle(obj))
- return -EFAULT;
-
ret = relocate_entry(obj, reloc, cache, target_offset);
if (ret)
return ret;
@@ -678,12 +660,23 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
remain = entry->relocation_count;
while (remain) {
struct drm_i915_gem_relocation_entry *r = stack_reloc;
- int count = remain;
- if (count > ARRAY_SIZE(stack_reloc))
- count = ARRAY_SIZE(stack_reloc);
+ unsigned long unwritten;
+ unsigned int count;
+
+ count = min_t(unsigned int, remain, ARRAY_SIZE(stack_reloc));
remain -= count;
- if (__copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]))) {
+ /* This is the fast path and we cannot handle a pagefault
+ * whilst holding the struct mutex lest the user pass in the
+ * relocations contained within a mmaped bo. For in such a case
+ * we, the page fault handler would call i915_gem_fault() and
+ * we would try to acquire the struct mutex again. Obviously
+ * this is bad and so lockdep complains vehemently.
+ */
+ pagefault_disable();
+ unwritten = __copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]));
+ pagefault_enable();
+ if (unwritten) {
ret = -EFAULT;
goto out;
}
@@ -695,11 +688,19 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
if (ret)
goto out;
- if (r->presumed_offset != offset &&
- __put_user(r->presumed_offset,
- &user_relocs->presumed_offset)) {
- ret = -EFAULT;
- goto out;
+ if (r->presumed_offset != offset) {
+ /* Copying back to the user is allowed to fail.
+ * The information passed back is a hint as
+ * to the final location. If the copy_to_user
+ * fails after a successful copy_from_user,
+ * it must be a readonly location and so
+ * we presume the user knows what they are
+ * doing!
+ */
+ pagefault_disable();
+ __put_user(r->presumed_offset,
+ &user_relocs->presumed_offset);
+ pagefault_enable();
}
user_relocs++;
@@ -739,20 +740,11 @@ i915_gem_execbuffer_relocate(struct eb_vmas *eb)
struct i915_vma *vma;
int ret = 0;
- /* This is the fast path and we cannot handle a pagefault whilst
- * holding the struct mutex lest the user pass in the relocations
- * contained within a mmaped bo. For in such a case we, the page
- * fault handler would call i915_gem_fault() and we would try to
- * acquire the struct mutex again. Obviously this is bad and so
- * lockdep complains vehemently.
- */
- pagefault_disable();
list_for_each_entry(vma, &eb->vmas, exec_list) {
ret = i915_gem_execbuffer_relocate_vma(vma, eb);
if (ret)
break;
}
- pagefault_enable();
return ret;
}
--
2.9.3
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 12+ messages in thread
* ✗ Fi.CI.BAT: failure for drm/i915: Restrict pagefault disabling to just around copy_from_user()
2016-10-17 14:10 [PATCH] drm/i915: Restrict pagefault disabling to just around copy_from_user() Chris Wilson
@ 2016-10-17 17:28 ` Patchwork
2016-10-18 8:01 ` [PATCH] " Tvrtko Ursulin
2016-10-18 9:35 ` ✗ Fi.CI.BAT: failure for drm/i915: Restrict pagefault disabling to just around copy_from_user() (rev2) Patchwork
2 siblings, 0 replies; 12+ messages in thread
From: Patchwork @ 2016-10-17 17:28 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: drm/i915: Restrict pagefault disabling to just around copy_from_user()
URL : https://patchwork.freedesktop.org/series/13880/
State : failure
== Summary ==
Series 13880v1 drm/i915: Restrict pagefault disabling to just around copy_from_user()
https://patchwork.freedesktop.org/api/1.0/series/13880/revisions/1/mbox/
Test gem_exec_suspend:
Subgroup basic-s3:
incomplete -> DMESG-WARN (fi-byt-n2820)
dmesg-warn -> INCOMPLETE (fi-byt-j1900)
Test kms_force_connector_basic:
Subgroup force-load-detect:
incomplete -> PASS (fi-ivb-3520m)
pass -> INCOMPLETE (fi-ivb-3770)
Test kms_pipe_crc_basic:
Subgroup suspend-read-crc-pipe-b:
pass -> INCOMPLETE (fi-snb-2600)
Test vgem_basic:
Subgroup unload:
pass -> SKIP (fi-bdw-5557u)
skip -> PASS (fi-hsw-4770r)
fi-bdw-5557u total:246 pass:230 dwarn:0 dfail:0 fail:0 skip:16
fi-bsw-n3050 total:246 pass:203 dwarn:1 dfail:0 fail:0 skip:42
fi-bxt-t5700 total:246 pass:216 dwarn:0 dfail:0 fail:0 skip:30
fi-byt-j1900 total:76 pass:62 dwarn:0 dfail:0 fail:1 skip:12
fi-byt-n2820 total:246 pass:209 dwarn:1 dfail:0 fail:1 skip:35
fi-hsw-4770 total:246 pass:224 dwarn:0 dfail:0 fail:0 skip:22
fi-hsw-4770r total:246 pass:223 dwarn:1 dfail:0 fail:0 skip:22
fi-ilk-650 total:246 pass:181 dwarn:0 dfail:0 fail:5 skip:60
fi-ivb-3520m total:246 pass:221 dwarn:0 dfail:0 fail:0 skip:25
fi-ivb-3770 total:186 pass:164 dwarn:0 dfail:0 fail:0 skip:21
fi-skl-6700hq total:246 pass:223 dwarn:0 dfail:0 fail:0 skip:23
fi-skl-6700k total:246 pass:221 dwarn:1 dfail:0 fail:0 skip:24
fi-snb-2600 total:209 pass:176 dwarn:0 dfail:0 fail:0 skip:32
fi-kbl-7200u failed to collect. IGT log at Patchwork_2738/fi-kbl-7200u/igt.log
fi-skl-6260u failed to connect after reboot
fi-skl-6770hq failed to connect after reboot
Results at /archive/results/CI_IGT_test/Patchwork_2738/
7ec75289774dec48c46c37271fb334b7caed3d32 drm-intel-nightly: 2016y-10m-17d-14h-07m-32s UTC integration manifest
bde1775 drm/i915: Restrict pagefault disabling to just around copy_from_user()
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] drm/i915: Restrict pagefault disabling to just around copy_from_user()
2016-10-17 14:10 [PATCH] drm/i915: Restrict pagefault disabling to just around copy_from_user() Chris Wilson
2016-10-17 17:28 ` ✗ Fi.CI.BAT: failure for " Patchwork
@ 2016-10-18 8:01 ` Tvrtko Ursulin
2016-10-18 8:17 ` Chris Wilson
2016-10-18 9:35 ` ✗ Fi.CI.BAT: failure for drm/i915: Restrict pagefault disabling to just around copy_from_user() (rev2) Patchwork
2 siblings, 1 reply; 12+ messages in thread
From: Tvrtko Ursulin @ 2016-10-18 8:01 UTC (permalink / raw)
To: Chris Wilson, intel-gfx; +Cc: Daniel Vetter
On 17/10/2016 15:10, Chris Wilson wrote:
> When handling execbuf relocations, we play a delicate dance with
> pagefault. We first try to access the user pages underneath our
> struct_mutex. However, if those pages were inside a GEM object, we may
> trigger a pagefault and deadlock as i915_gem_fault() tries to
> recursively acquire struct_mutex. Instead, we choose to disable
> pagefaulting around the copy_from_user whilst inside the struct_mutex
> and handle the EFAULT by falling back to a copy outside the
> struct_mutex.
>
> We however presumed that disabling pagefaults would be expensive. It is
> just an operation on the local current task. Cheap enough that we can
> restrict the disable/enable to the critical section around the copy, and
> so avoid having to handle the atomic sections within the relocation
> handling itself.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> ---
> drivers/gpu/drm/i915/i915_gem_execbuffer.c | 64 +++++++++++++-----------------
> 1 file changed, 28 insertions(+), 36 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 1d02e74ce62d..22342ad0e07f 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -551,20 +551,6 @@ repeat:
> return 0;
> }
>
> -static bool object_is_idle(struct drm_i915_gem_object *obj)
> -{
> - unsigned long active = i915_gem_object_get_active(obj);
> - int idx;
> -
> - for_each_active(active, idx) {
> - if (!i915_gem_active_is_idle(&obj->last_read[idx],
> - &obj->base.dev->struct_mutex))
> - return false;
> - }
> -
> - return true;
> -}
> -
> static int
> i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
> struct eb_vmas *eb,
> @@ -648,10 +634,6 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
> return -EINVAL;
> }
>
> - /* We can't wait for rendering with pagefaults disabled */
> - if (pagefault_disabled() && !object_is_idle(obj))
> - return -EFAULT;
> -
> ret = relocate_entry(obj, reloc, cache, target_offset);
> if (ret)
> return ret;
> @@ -678,12 +660,23 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
> remain = entry->relocation_count;
> while (remain) {
> struct drm_i915_gem_relocation_entry *r = stack_reloc;
> - int count = remain;
> - if (count > ARRAY_SIZE(stack_reloc))
> - count = ARRAY_SIZE(stack_reloc);
> + unsigned long unwritten;
> + unsigned int count;
> +
> + count = min_t(unsigned int, remain, ARRAY_SIZE(stack_reloc));
> remain -= count;
>
> - if (__copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]))) {
> + /* This is the fast path and we cannot handle a pagefault
> + * whilst holding the struct mutex lest the user pass in the
> + * relocations contained within a mmaped bo. For in such a case
> + * we, the page fault handler would call i915_gem_fault() and
> + * we would try to acquire the struct mutex again. Obviously
> + * this is bad and so lockdep complains vehemently.
> + */
> + pagefault_disable();
> + unwritten = __copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]));
> + pagefault_enable();
> + if (unwritten) {
> ret = -EFAULT;
> goto out;
> }
> @@ -695,11 +688,19 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
> if (ret)
> goto out;
>
> - if (r->presumed_offset != offset &&
> - __put_user(r->presumed_offset,
> - &user_relocs->presumed_offset)) {
> - ret = -EFAULT;
> - goto out;
> + if (r->presumed_offset != offset) {
> + /* Copying back to the user is allowed to fail.
> + * The information passed back is a hint as
> + * to the final location. If the copy_to_user
> + * fails after a successful copy_from_user,
> + * it must be a readonly location and so
> + * we presume the user knows what they are
> + * doing!
> + */
> + pagefault_disable();
> + __put_user(r->presumed_offset,
> + &user_relocs->presumed_offset);
> + pagefault_enable();
Why is a good idea to ignore potential errors here?
> }
>
> user_relocs++;
> @@ -739,20 +740,11 @@ i915_gem_execbuffer_relocate(struct eb_vmas *eb)
> struct i915_vma *vma;
> int ret = 0;
>
> - /* This is the fast path and we cannot handle a pagefault whilst
> - * holding the struct mutex lest the user pass in the relocations
> - * contained within a mmaped bo. For in such a case we, the page
> - * fault handler would call i915_gem_fault() and we would try to
> - * acquire the struct mutex again. Obviously this is bad and so
> - * lockdep complains vehemently.
> - */
> - pagefault_disable();
> list_for_each_entry(vma, &eb->vmas, exec_list) {
> ret = i915_gem_execbuffer_relocate_vma(vma, eb);
> if (ret)
> break;
> }
> - pagefault_enable();
>
> return ret;
> }
Otherwise LGTM.
Regards,
Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] drm/i915: Restrict pagefault disabling to just around copy_from_user()
2016-10-18 8:01 ` [PATCH] " Tvrtko Ursulin
@ 2016-10-18 8:17 ` Chris Wilson
2016-10-18 8:22 ` Tvrtko Ursulin
0 siblings, 1 reply; 12+ messages in thread
From: Chris Wilson @ 2016-10-18 8:17 UTC (permalink / raw)
To: Tvrtko Ursulin; +Cc: Daniel Vetter, intel-gfx
On Tue, Oct 18, 2016 at 09:01:30AM +0100, Tvrtko Ursulin wrote:
>
> On 17/10/2016 15:10, Chris Wilson wrote:
> >When handling execbuf relocations, we play a delicate dance with
> >pagefault. We first try to access the user pages underneath our
> >struct_mutex. However, if those pages were inside a GEM object, we may
> >trigger a pagefault and deadlock as i915_gem_fault() tries to
> >recursively acquire struct_mutex. Instead, we choose to disable
> >pagefaulting around the copy_from_user whilst inside the struct_mutex
> >and handle the EFAULT by falling back to a copy outside the
> >struct_mutex.
> >
> >We however presumed that disabling pagefaults would be expensive. It is
> >just an operation on the local current task. Cheap enough that we can
> >restrict the disable/enable to the critical section around the copy, and
> >so avoid having to handle the atomic sections within the relocation
> >handling itself.
> >
> >Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> >Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> >---
> > drivers/gpu/drm/i915/i915_gem_execbuffer.c | 64 +++++++++++++-----------------
> > 1 file changed, 28 insertions(+), 36 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >index 1d02e74ce62d..22342ad0e07f 100644
> >--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >@@ -551,20 +551,6 @@ repeat:
> > return 0;
> > }
> >-static bool object_is_idle(struct drm_i915_gem_object *obj)
> >-{
> >- unsigned long active = i915_gem_object_get_active(obj);
> >- int idx;
> >-
> >- for_each_active(active, idx) {
> >- if (!i915_gem_active_is_idle(&obj->last_read[idx],
> >- &obj->base.dev->struct_mutex))
> >- return false;
> >- }
> >-
> >- return true;
> >-}
> >-
> > static int
> > i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
> > struct eb_vmas *eb,
> >@@ -648,10 +634,6 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
> > return -EINVAL;
> > }
> >- /* We can't wait for rendering with pagefaults disabled */
> >- if (pagefault_disabled() && !object_is_idle(obj))
> >- return -EFAULT;
> >-
> > ret = relocate_entry(obj, reloc, cache, target_offset);
> > if (ret)
> > return ret;
> >@@ -678,12 +660,23 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
> > remain = entry->relocation_count;
> > while (remain) {
> > struct drm_i915_gem_relocation_entry *r = stack_reloc;
> >- int count = remain;
> >- if (count > ARRAY_SIZE(stack_reloc))
> >- count = ARRAY_SIZE(stack_reloc);
> >+ unsigned long unwritten;
> >+ unsigned int count;
> >+
> >+ count = min_t(unsigned int, remain, ARRAY_SIZE(stack_reloc));
> > remain -= count;
> >- if (__copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]))) {
> >+ /* This is the fast path and we cannot handle a pagefault
> >+ * whilst holding the struct mutex lest the user pass in the
> >+ * relocations contained within a mmaped bo. For in such a case
> >+ * we, the page fault handler would call i915_gem_fault() and
> >+ * we would try to acquire the struct mutex again. Obviously
> >+ * this is bad and so lockdep complains vehemently.
> >+ */
> >+ pagefault_disable();
> >+ unwritten = __copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]));
> >+ pagefault_enable();
> >+ if (unwritten) {
> > ret = -EFAULT;
> > goto out;
> > }
> >@@ -695,11 +688,19 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
> > if (ret)
> > goto out;
> >- if (r->presumed_offset != offset &&
> >- __put_user(r->presumed_offset,
> >- &user_relocs->presumed_offset)) {
> >- ret = -EFAULT;
> >- goto out;
> >+ if (r->presumed_offset != offset) {
> >+ /* Copying back to the user is allowed to fail.
> >+ * The information passed back is a hint as
> >+ * to the final location. If the copy_to_user
> >+ * fails after a successful copy_from_user,
> >+ * it must be a readonly location and so
> >+ * we presume the user knows what they are
> >+ * doing!
> >+ */
> >+ pagefault_disable();
> >+ __put_user(r->presumed_offset,
> >+ &user_relocs->presumed_offset);
> >+ pagefault_enable();
>
> Why is a good idea to ignore potential errors here?
Wrong question: why did we think it a good idea to ignore success here?
(a) it is safe to do so, and I can legitimately setup userspace to use
this
(b) reporting an error after we have committed the change is broken
anyway.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] drm/i915: Restrict pagefault disabling to just around copy_from_user()
2016-10-18 8:17 ` Chris Wilson
@ 2016-10-18 8:22 ` Tvrtko Ursulin
2016-10-18 8:38 ` Chris Wilson
0 siblings, 1 reply; 12+ messages in thread
From: Tvrtko Ursulin @ 2016-10-18 8:22 UTC (permalink / raw)
To: Chris Wilson, intel-gfx, Daniel Vetter
On 18/10/2016 09:17, Chris Wilson wrote:
> On Tue, Oct 18, 2016 at 09:01:30AM +0100, Tvrtko Ursulin wrote:
>> On 17/10/2016 15:10, Chris Wilson wrote:
>>> When handling execbuf relocations, we play a delicate dance with
>>> pagefault. We first try to access the user pages underneath our
>>> struct_mutex. However, if those pages were inside a GEM object, we may
>>> trigger a pagefault and deadlock as i915_gem_fault() tries to
>>> recursively acquire struct_mutex. Instead, we choose to disable
>>> pagefaulting around the copy_from_user whilst inside the struct_mutex
>>> and handle the EFAULT by falling back to a copy outside the
>>> struct_mutex.
>>>
>>> We however presumed that disabling pagefaults would be expensive. It is
>>> just an operation on the local current task. Cheap enough that we can
>>> restrict the disable/enable to the critical section around the copy, and
>>> so avoid having to handle the atomic sections within the relocation
>>> handling itself.
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>>> ---
>>> drivers/gpu/drm/i915/i915_gem_execbuffer.c | 64 +++++++++++++-----------------
>>> 1 file changed, 28 insertions(+), 36 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>> index 1d02e74ce62d..22342ad0e07f 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>> @@ -551,20 +551,6 @@ repeat:
>>> return 0;
>>> }
>>> -static bool object_is_idle(struct drm_i915_gem_object *obj)
>>> -{
>>> - unsigned long active = i915_gem_object_get_active(obj);
>>> - int idx;
>>> -
>>> - for_each_active(active, idx) {
>>> - if (!i915_gem_active_is_idle(&obj->last_read[idx],
>>> - &obj->base.dev->struct_mutex))
>>> - return false;
>>> - }
>>> -
>>> - return true;
>>> -}
>>> -
>>> static int
>>> i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
>>> struct eb_vmas *eb,
>>> @@ -648,10 +634,6 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
>>> return -EINVAL;
>>> }
>>> - /* We can't wait for rendering with pagefaults disabled */
>>> - if (pagefault_disabled() && !object_is_idle(obj))
>>> - return -EFAULT;
>>> -
>>> ret = relocate_entry(obj, reloc, cache, target_offset);
>>> if (ret)
>>> return ret;
>>> @@ -678,12 +660,23 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
>>> remain = entry->relocation_count;
>>> while (remain) {
>>> struct drm_i915_gem_relocation_entry *r = stack_reloc;
>>> - int count = remain;
>>> - if (count > ARRAY_SIZE(stack_reloc))
>>> - count = ARRAY_SIZE(stack_reloc);
>>> + unsigned long unwritten;
>>> + unsigned int count;
>>> +
>>> + count = min_t(unsigned int, remain, ARRAY_SIZE(stack_reloc));
>>> remain -= count;
>>> - if (__copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]))) {
>>> + /* This is the fast path and we cannot handle a pagefault
>>> + * whilst holding the struct mutex lest the user pass in the
>>> + * relocations contained within a mmaped bo. For in such a case
>>> + * we, the page fault handler would call i915_gem_fault() and
>>> + * we would try to acquire the struct mutex again. Obviously
>>> + * this is bad and so lockdep complains vehemently.
>>> + */
>>> + pagefault_disable();
>>> + unwritten = __copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]));
>>> + pagefault_enable();
>>> + if (unwritten) {
>>> ret = -EFAULT;
>>> goto out;
>>> }
>>> @@ -695,11 +688,19 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
>>> if (ret)
>>> goto out;
>>> - if (r->presumed_offset != offset &&
>>> - __put_user(r->presumed_offset,
>>> - &user_relocs->presumed_offset)) {
>>> - ret = -EFAULT;
>>> - goto out;
>>> + if (r->presumed_offset != offset) {
>>> + /* Copying back to the user is allowed to fail.
>>> + * The information passed back is a hint as
>>> + * to the final location. If the copy_to_user
>>> + * fails after a successful copy_from_user,
>>> + * it must be a readonly location and so
>>> + * we presume the user knows what they are
>>> + * doing!
>>> + */
>>> + pagefault_disable();
>>> + __put_user(r->presumed_offset,
>>> + &user_relocs->presumed_offset);
>>> + pagefault_enable();
>> Why is a good idea to ignore potential errors here?
> Wrong question: why did we think it a good idea to ignore success here?
How were we ignoring errors, I don't follow?
> (a) it is safe to do so, and I can legitimately setup userspace to use
> this
How what where? :)
> (b) reporting an error after we have committed the change is broken
> anyway.
You mean in the half way through cases?
Regards,
Tvrkto
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] drm/i915: Restrict pagefault disabling to just around copy_from_user()
2016-10-18 8:22 ` Tvrtko Ursulin
@ 2016-10-18 8:38 ` Chris Wilson
2016-10-18 8:54 ` Tvrtko Ursulin
0 siblings, 1 reply; 12+ messages in thread
From: Chris Wilson @ 2016-10-18 8:38 UTC (permalink / raw)
To: Tvrtko Ursulin; +Cc: Daniel Vetter, intel-gfx
On Tue, Oct 18, 2016 at 09:22:56AM +0100, Tvrtko Ursulin wrote:
>
> On 18/10/2016 09:17, Chris Wilson wrote:
> >On Tue, Oct 18, 2016 at 09:01:30AM +0100, Tvrtko Ursulin wrote:
> >>On 17/10/2016 15:10, Chris Wilson wrote:
> >>>When handling execbuf relocations, we play a delicate dance with
> >>>pagefault. We first try to access the user pages underneath our
> >>>struct_mutex. However, if those pages were inside a GEM object, we may
> >>>trigger a pagefault and deadlock as i915_gem_fault() tries to
> >>>recursively acquire struct_mutex. Instead, we choose to disable
> >>>pagefaulting around the copy_from_user whilst inside the struct_mutex
> >>>and handle the EFAULT by falling back to a copy outside the
> >>>struct_mutex.
> >>>
> >>>We however presumed that disabling pagefaults would be expensive. It is
> >>>just an operation on the local current task. Cheap enough that we can
> >>>restrict the disable/enable to the critical section around the copy, and
> >>>so avoid having to handle the atomic sections within the relocation
> >>>handling itself.
> >>>
> >>>Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>>Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> >>>Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> >>>---
> >>> drivers/gpu/drm/i915/i915_gem_execbuffer.c | 64 +++++++++++++-----------------
> >>> 1 file changed, 28 insertions(+), 36 deletions(-)
> >>>
> >>>diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >>>index 1d02e74ce62d..22342ad0e07f 100644
> >>>--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >>>+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >>>@@ -551,20 +551,6 @@ repeat:
> >>> return 0;
> >>> }
> >>>-static bool object_is_idle(struct drm_i915_gem_object *obj)
> >>>-{
> >>>- unsigned long active = i915_gem_object_get_active(obj);
> >>>- int idx;
> >>>-
> >>>- for_each_active(active, idx) {
> >>>- if (!i915_gem_active_is_idle(&obj->last_read[idx],
> >>>- &obj->base.dev->struct_mutex))
> >>>- return false;
> >>>- }
> >>>-
> >>>- return true;
> >>>-}
> >>>-
> >>> static int
> >>> i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
> >>> struct eb_vmas *eb,
> >>>@@ -648,10 +634,6 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
> >>> return -EINVAL;
> >>> }
> >>>- /* We can't wait for rendering with pagefaults disabled */
> >>>- if (pagefault_disabled() && !object_is_idle(obj))
> >>>- return -EFAULT;
> >>>-
> >>> ret = relocate_entry(obj, reloc, cache, target_offset);
> >>> if (ret)
> >>> return ret;
> >>>@@ -678,12 +660,23 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
> >>> remain = entry->relocation_count;
> >>> while (remain) {
> >>> struct drm_i915_gem_relocation_entry *r = stack_reloc;
> >>>- int count = remain;
> >>>- if (count > ARRAY_SIZE(stack_reloc))
> >>>- count = ARRAY_SIZE(stack_reloc);
> >>>+ unsigned long unwritten;
> >>>+ unsigned int count;
> >>>+
> >>>+ count = min_t(unsigned int, remain, ARRAY_SIZE(stack_reloc));
> >>> remain -= count;
> >>>- if (__copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]))) {
> >>>+ /* This is the fast path and we cannot handle a pagefault
> >>>+ * whilst holding the struct mutex lest the user pass in the
> >>>+ * relocations contained within a mmaped bo. For in such a case
> >>>+ * we, the page fault handler would call i915_gem_fault() and
> >>>+ * we would try to acquire the struct mutex again. Obviously
> >>>+ * this is bad and so lockdep complains vehemently.
> >>>+ */
> >>>+ pagefault_disable();
> >>>+ unwritten = __copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]));
> >>>+ pagefault_enable();
> >>>+ if (unwritten) {
> >>> ret = -EFAULT;
> >>> goto out;
> >>> }
> >>>@@ -695,11 +688,19 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
> >>> if (ret)
> >>> goto out;
> >>>- if (r->presumed_offset != offset &&
> >>>- __put_user(r->presumed_offset,
> >>>- &user_relocs->presumed_offset)) {
> >>>- ret = -EFAULT;
> >>>- goto out;
> >>>+ if (r->presumed_offset != offset) {
> >>>+ /* Copying back to the user is allowed to fail.
> >>>+ * The information passed back is a hint as
> >>>+ * to the final location. If the copy_to_user
> >>>+ * fails after a successful copy_from_user,
> >>>+ * it must be a readonly location and so
> >>>+ * we presume the user knows what they are
> >>>+ * doing!
> >>>+ */
> >>>+ pagefault_disable();
> >>>+ __put_user(r->presumed_offset,
> >>>+ &user_relocs->presumed_offset);
> >>>+ pagefault_enable();
> >>Why is a good idea to ignore potential errors here?
> >Wrong question: why did we think it a good idea to ignore success here?
>
> How were we ignoring errors, I don't follow?
>
> >(a) it is safe to do so, and I can legitimately setup userspace to use
> >this
>
> How what where? :)
igt, whereelse, has a test case to check we can execute from read only
memory.
> >(b) reporting an error after we have committed the change is broken
> >anyway.
>
> You mean in the half way through cases?
We've already made the user/gpu visible change. Failing here is broken.
(User doesn't notice the change, presumed_offset doesn't match actual,
kernel is oblivious to mismatch, GPU hangs.)
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] drm/i915: Restrict pagefault disabling to just around copy_from_user()
2016-10-18 8:38 ` Chris Wilson
@ 2016-10-18 8:54 ` Tvrtko Ursulin
2016-10-18 9:00 ` Chris Wilson
0 siblings, 1 reply; 12+ messages in thread
From: Tvrtko Ursulin @ 2016-10-18 8:54 UTC (permalink / raw)
To: Chris Wilson, intel-gfx, Daniel Vetter
On 18/10/2016 09:38, Chris Wilson wrote:
> On Tue, Oct 18, 2016 at 09:22:56AM +0100, Tvrtko Ursulin wrote:
>> On 18/10/2016 09:17, Chris Wilson wrote:
>>> On Tue, Oct 18, 2016 at 09:01:30AM +0100, Tvrtko Ursulin wrote:
>>>> On 17/10/2016 15:10, Chris Wilson wrote:
>>>>> When handling execbuf relocations, we play a delicate dance with
>>>>> pagefault. We first try to access the user pages underneath our
>>>>> struct_mutex. However, if those pages were inside a GEM object, we may
>>>>> trigger a pagefault and deadlock as i915_gem_fault() tries to
>>>>> recursively acquire struct_mutex. Instead, we choose to disable
>>>>> pagefaulting around the copy_from_user whilst inside the struct_mutex
>>>>> and handle the EFAULT by falling back to a copy outside the
>>>>> struct_mutex.
>>>>>
>>>>> We however presumed that disabling pagefaults would be expensive. It is
>>>>> just an operation on the local current task. Cheap enough that we can
>>>>> restrict the disable/enable to the critical section around the copy, and
>>>>> so avoid having to handle the atomic sections within the relocation
>>>>> handling itself.
>>>>>
>>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>>>>> ---
>>>>> drivers/gpu/drm/i915/i915_gem_execbuffer.c | 64 +++++++++++++-----------------
>>>>> 1 file changed, 28 insertions(+), 36 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>>>> index 1d02e74ce62d..22342ad0e07f 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>>>> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>>>>> @@ -551,20 +551,6 @@ repeat:
>>>>> return 0;
>>>>> }
>>>>> -static bool object_is_idle(struct drm_i915_gem_object *obj)
>>>>> -{
>>>>> - unsigned long active = i915_gem_object_get_active(obj);
>>>>> - int idx;
>>>>> -
>>>>> - for_each_active(active, idx) {
>>>>> - if (!i915_gem_active_is_idle(&obj->last_read[idx],
>>>>> - &obj->base.dev->struct_mutex))
>>>>> - return false;
>>>>> - }
>>>>> -
>>>>> - return true;
>>>>> -}
>>>>> -
>>>>> static int
>>>>> i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
>>>>> struct eb_vmas *eb,
>>>>> @@ -648,10 +634,6 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
>>>>> return -EINVAL;
>>>>> }
>>>>> - /* We can't wait for rendering with pagefaults disabled */
>>>>> - if (pagefault_disabled() && !object_is_idle(obj))
>>>>> - return -EFAULT;
>>>>> -
>>>>> ret = relocate_entry(obj, reloc, cache, target_offset);
>>>>> if (ret)
>>>>> return ret;
>>>>> @@ -678,12 +660,23 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
>>>>> remain = entry->relocation_count;
>>>>> while (remain) {
>>>>> struct drm_i915_gem_relocation_entry *r = stack_reloc;
>>>>> - int count = remain;
>>>>> - if (count > ARRAY_SIZE(stack_reloc))
>>>>> - count = ARRAY_SIZE(stack_reloc);
>>>>> + unsigned long unwritten;
>>>>> + unsigned int count;
>>>>> +
>>>>> + count = min_t(unsigned int, remain, ARRAY_SIZE(stack_reloc));
>>>>> remain -= count;
>>>>> - if (__copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]))) {
>>>>> + /* This is the fast path and we cannot handle a pagefault
>>>>> + * whilst holding the struct mutex lest the user pass in the
>>>>> + * relocations contained within a mmaped bo. For in such a case
>>>>> + * we, the page fault handler would call i915_gem_fault() and
>>>>> + * we would try to acquire the struct mutex again. Obviously
>>>>> + * this is bad and so lockdep complains vehemently.
>>>>> + */
>>>>> + pagefault_disable();
>>>>> + unwritten = __copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]));
>>>>> + pagefault_enable();
>>>>> + if (unwritten) {
>>>>> ret = -EFAULT;
>>>>> goto out;
>>>>> }
>>>>> @@ -695,11 +688,19 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
>>>>> if (ret)
>>>>> goto out;
>>>>> - if (r->presumed_offset != offset &&
>>>>> - __put_user(r->presumed_offset,
>>>>> - &user_relocs->presumed_offset)) {
>>>>> - ret = -EFAULT;
>>>>> - goto out;
>>>>> + if (r->presumed_offset != offset) {
>>>>> + /* Copying back to the user is allowed to fail.
>>>>> + * The information passed back is a hint as
>>>>> + * to the final location. If the copy_to_user
>>>>> + * fails after a successful copy_from_user,
>>>>> + * it must be a readonly location and so
>>>>> + * we presume the user knows what they are
>>>>> + * doing!
>>>>> + */
>>>>> + pagefault_disable();
>>>>> + __put_user(r->presumed_offset,
>>>>> + &user_relocs->presumed_offset);
>>>>> + pagefault_enable();
>>>> Why is a good idea to ignore potential errors here?
>>> Wrong question: why did we think it a good idea to ignore success here?
>> How were we ignoring errors, I don't follow?
>>
>>> (a) it is safe to do so, and I can legitimately setup userspace to use
>>> this
>> How what where? :)
> igt, whereelse, has a test case to check we can execute from read only
> memory.
With relocations?
>
>>> (b) reporting an error after we have committed the change is broken
>>> anyway.
>> You mean in the half way through cases?
> We've already made the user/gpu visible change. Failing here is broken.
> (User doesn't notice the change, presumed_offset doesn't match actual,
> kernel is oblivious to mismatch, GPU hangs.)
How is that? I see a error path propagating all the way up to failing
the execbuf.
Regards,
Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] drm/i915: Restrict pagefault disabling to just around copy_from_user()
2016-10-18 8:54 ` Tvrtko Ursulin
@ 2016-10-18 9:00 ` Chris Wilson
2016-10-18 9:07 ` [PATCH v2] " Chris Wilson
0 siblings, 1 reply; 12+ messages in thread
From: Chris Wilson @ 2016-10-18 9:00 UTC (permalink / raw)
To: Tvrtko Ursulin; +Cc: Daniel Vetter, intel-gfx
On Tue, Oct 18, 2016 at 09:54:23AM +0100, Tvrtko Ursulin wrote:
> On 18/10/2016 09:38, Chris Wilson wrote:
> >>>(a) it is safe to do so, and I can legitimately setup userspace to use
> >>>this
> >>How what where? :)
> >igt, whereelse, has a test case to check we can execute from read only
> >memory.
>
> With relocations?
Yes. The layout is advisory.
> >>>(b) reporting an error after we have committed the change is broken
> >>>anyway.
> >>You mean in the half way through cases?
> >We've already made the user/gpu visible change. Failing here is broken.
> >(User doesn't notice the change, presumed_offset doesn't match actual,
> >kernel is oblivious to mismatch, GPU hangs.)
>
> How is that? I see a error path propagating all the way up to
> failing the execbuf.
The change has already been committed to the object, but the user is not
informed and so on the next pass everyone is oblivious that the value in
the buffer no longer matches (and so can pass through unscathed resulting
in a hang). The error handling itself is fragile.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v2] drm/i915: Restrict pagefault disabling to just around copy_from_user()
2016-10-18 9:00 ` Chris Wilson
@ 2016-10-18 9:07 ` Chris Wilson
2016-10-18 9:11 ` Tvrtko Ursulin
0 siblings, 1 reply; 12+ messages in thread
From: Chris Wilson @ 2016-10-18 9:07 UTC (permalink / raw)
To: intel-gfx; +Cc: Daniel Vetter
When handling execbuf relocations, we play a delicate dance with
pagefault. We first try to access the user pages underneath our
struct_mutex. However, if those pages were inside a GEM object, we may
trigger a pagefault and deadlock as i915_gem_fault() tries to
recursively acquire struct_mutex. Instead, we choose to disable
pagefaulting around the copy_from_user whilst inside the struct_mutex
and handle the EFAULT by falling back to a copy outside the
struct_mutex.
We however presumed that disabling pagefaults would be expensive. It is
just an operation on the local current task. Cheap enough that we can
restrict the disable/enable to the critical section around the copy, and
so avoid having to handle the atomic sections within the relocation
handling itself.
v2: Just illustrate the broken error handling rather than argue why it
is safer to ignore it, for now.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
---
drivers/gpu/drm/i915/i915_gem_execbuffer.c | 71 +++++++++++++++---------------
1 file changed, 35 insertions(+), 36 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 204536a3087e..e52affdcc125 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -551,20 +551,6 @@ relocate_entry(struct drm_i915_gem_object *obj,
return 0;
}
-static bool object_is_idle(struct drm_i915_gem_object *obj)
-{
- unsigned long active = i915_gem_object_get_active(obj);
- int idx;
-
- for_each_active(active, idx) {
- if (!i915_gem_active_is_idle(&obj->last_read[idx],
- &obj->base.dev->struct_mutex))
- return false;
- }
-
- return true;
-}
-
static int
i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
struct eb_vmas *eb,
@@ -648,10 +634,6 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
return -EINVAL;
}
- /* We can't wait for rendering with pagefaults disabled */
- if (pagefault_disabled() && !object_is_idle(obj))
- return -EFAULT;
-
ret = relocate_entry(obj, reloc, cache, target_offset);
if (ret)
return ret;
@@ -678,12 +660,23 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
remain = entry->relocation_count;
while (remain) {
struct drm_i915_gem_relocation_entry *r = stack_reloc;
- int count = remain;
- if (count > ARRAY_SIZE(stack_reloc))
- count = ARRAY_SIZE(stack_reloc);
+ unsigned long unwritten;
+ unsigned int count;
+
+ count = min_t(unsigned int, remain, ARRAY_SIZE(stack_reloc));
remain -= count;
- if (__copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]))) {
+ /* This is the fast path and we cannot handle a pagefault
+ * whilst holding the struct mutex lest the user pass in the
+ * relocations contained within a mmaped bo. For in such a case
+ * we, the page fault handler would call i915_gem_fault() and
+ * we would try to acquire the struct mutex again. Obviously
+ * this is bad and so lockdep complains vehemently.
+ */
+ pagefault_disable();
+ unwritten = __copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]));
+ pagefault_enable();
+ if (unlikely(unwritten)) {
ret = -EFAULT;
goto out;
}
@@ -695,11 +688,26 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
if (ret)
goto out;
- if (r->presumed_offset != offset &&
- __put_user(r->presumed_offset,
- &user_relocs->presumed_offset)) {
- ret = -EFAULT;
- goto out;
+ if (r->presumed_offset != offset) {
+ pagefault_disable();
+ unwritten = __put_user(r->presumed_offset,
+ &user_relocs->presumed_offset);
+ pagefault_enable();
+ if (unlikely(unwritten)) {
+ /* Note that reporting an error now
+ * leaves everything in an inconsistent
+ * state as we have *already* changed
+ * the relocation value inside the
+ * object. As we have not changed the
+ * reloc.presumed_offset or will not
+ * change the execobject.offset, on the
+ * call we may not rewrite the value
+ * inside the object, leaving it
+ * dangling and causing a GPU hang.
+ */
+ ret = -EFAULT;
+ goto out;
+ }
}
user_relocs++;
@@ -739,20 +747,11 @@ i915_gem_execbuffer_relocate(struct eb_vmas *eb)
struct i915_vma *vma;
int ret = 0;
- /* This is the fast path and we cannot handle a pagefault whilst
- * holding the struct mutex lest the user pass in the relocations
- * contained within a mmaped bo. For in such a case we, the page
- * fault handler would call i915_gem_fault() and we would try to
- * acquire the struct mutex again. Obviously this is bad and so
- * lockdep complains vehemently.
- */
- pagefault_disable();
list_for_each_entry(vma, &eb->vmas, exec_list) {
ret = i915_gem_execbuffer_relocate_vma(vma, eb);
if (ret)
break;
}
- pagefault_enable();
return ret;
}
--
2.9.3
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v2] drm/i915: Restrict pagefault disabling to just around copy_from_user()
2016-10-18 9:07 ` [PATCH v2] " Chris Wilson
@ 2016-10-18 9:11 ` Tvrtko Ursulin
0 siblings, 0 replies; 12+ messages in thread
From: Tvrtko Ursulin @ 2016-10-18 9:11 UTC (permalink / raw)
To: Chris Wilson, intel-gfx; +Cc: Daniel Vetter
On 18/10/2016 10:07, Chris Wilson wrote:
> When handling execbuf relocations, we play a delicate dance with
> pagefault. We first try to access the user pages underneath our
> struct_mutex. However, if those pages were inside a GEM object, we may
> trigger a pagefault and deadlock as i915_gem_fault() tries to
> recursively acquire struct_mutex. Instead, we choose to disable
> pagefaulting around the copy_from_user whilst inside the struct_mutex
> and handle the EFAULT by falling back to a copy outside the
> struct_mutex.
>
> We however presumed that disabling pagefaults would be expensive. It is
> just an operation on the local current task. Cheap enough that we can
> restrict the disable/enable to the critical section around the copy, and
> so avoid having to handle the atomic sections within the relocation
> handling itself.
>
> v2: Just illustrate the broken error handling rather than argue why it
> is safer to ignore it, for now.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> ---
> drivers/gpu/drm/i915/i915_gem_execbuffer.c | 71 +++++++++++++++---------------
> 1 file changed, 35 insertions(+), 36 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 204536a3087e..e52affdcc125 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -551,20 +551,6 @@ relocate_entry(struct drm_i915_gem_object *obj,
> return 0;
> }
>
> -static bool object_is_idle(struct drm_i915_gem_object *obj)
> -{
> - unsigned long active = i915_gem_object_get_active(obj);
> - int idx;
> -
> - for_each_active(active, idx) {
> - if (!i915_gem_active_is_idle(&obj->last_read[idx],
> - &obj->base.dev->struct_mutex))
> - return false;
> - }
> -
> - return true;
> -}
> -
> static int
> i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
> struct eb_vmas *eb,
> @@ -648,10 +634,6 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
> return -EINVAL;
> }
>
> - /* We can't wait for rendering with pagefaults disabled */
> - if (pagefault_disabled() && !object_is_idle(obj))
> - return -EFAULT;
> -
> ret = relocate_entry(obj, reloc, cache, target_offset);
> if (ret)
> return ret;
> @@ -678,12 +660,23 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
> remain = entry->relocation_count;
> while (remain) {
> struct drm_i915_gem_relocation_entry *r = stack_reloc;
> - int count = remain;
> - if (count > ARRAY_SIZE(stack_reloc))
> - count = ARRAY_SIZE(stack_reloc);
> + unsigned long unwritten;
> + unsigned int count;
> +
> + count = min_t(unsigned int, remain, ARRAY_SIZE(stack_reloc));
> remain -= count;
>
> - if (__copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]))) {
> + /* This is the fast path and we cannot handle a pagefault
> + * whilst holding the struct mutex lest the user pass in the
> + * relocations contained within a mmaped bo. For in such a case
> + * we, the page fault handler would call i915_gem_fault() and
> + * we would try to acquire the struct mutex again. Obviously
> + * this is bad and so lockdep complains vehemently.
> + */
> + pagefault_disable();
> + unwritten = __copy_from_user_inatomic(r, user_relocs, count*sizeof(r[0]));
> + pagefault_enable();
> + if (unlikely(unwritten)) {
> ret = -EFAULT;
> goto out;
> }
> @@ -695,11 +688,26 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
> if (ret)
> goto out;
>
> - if (r->presumed_offset != offset &&
> - __put_user(r->presumed_offset,
> - &user_relocs->presumed_offset)) {
> - ret = -EFAULT;
> - goto out;
> + if (r->presumed_offset != offset) {
> + pagefault_disable();
> + unwritten = __put_user(r->presumed_offset,
> + &user_relocs->presumed_offset);
> + pagefault_enable();
> + if (unlikely(unwritten)) {
> + /* Note that reporting an error now
> + * leaves everything in an inconsistent
> + * state as we have *already* changed
> + * the relocation value inside the
> + * object. As we have not changed the
> + * reloc.presumed_offset or will not
> + * change the execobject.offset, on the
> + * call we may not rewrite the value
> + * inside the object, leaving it
> + * dangling and causing a GPU hang.
> + */
> + ret = -EFAULT;
> + goto out;
> + }
> }
>
> user_relocs++;
> @@ -739,20 +747,11 @@ i915_gem_execbuffer_relocate(struct eb_vmas *eb)
> struct i915_vma *vma;
> int ret = 0;
>
> - /* This is the fast path and we cannot handle a pagefault whilst
> - * holding the struct mutex lest the user pass in the relocations
> - * contained within a mmaped bo. For in such a case we, the page
> - * fault handler would call i915_gem_fault() and we would try to
> - * acquire the struct mutex again. Obviously this is bad and so
> - * lockdep complains vehemently.
> - */
> - pagefault_disable();
> list_for_each_entry(vma, &eb->vmas, exec_list) {
> ret = i915_gem_execbuffer_relocate_vma(vma, eb);
> if (ret)
> break;
> }
> - pagefault_enable();
>
> return ret;
> }
Ok we can continue the educational discussion at some other time. :)
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Regards,
Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 12+ messages in thread
* ✗ Fi.CI.BAT: failure for drm/i915: Restrict pagefault disabling to just around copy_from_user() (rev2)
2016-10-17 14:10 [PATCH] drm/i915: Restrict pagefault disabling to just around copy_from_user() Chris Wilson
2016-10-17 17:28 ` ✗ Fi.CI.BAT: failure for " Patchwork
2016-10-18 8:01 ` [PATCH] " Tvrtko Ursulin
@ 2016-10-18 9:35 ` Patchwork
2016-10-18 9:54 ` Chris Wilson
2 siblings, 1 reply; 12+ messages in thread
From: Patchwork @ 2016-10-18 9:35 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: drm/i915: Restrict pagefault disabling to just around copy_from_user() (rev2)
URL : https://patchwork.freedesktop.org/series/13880/
State : failure
== Summary ==
Series 13880v2 drm/i915: Restrict pagefault disabling to just around copy_from_user()
https://patchwork.freedesktop.org/api/1.0/series/13880/revisions/2/mbox/
Test gem_exec_suspend:
Subgroup basic-s3:
incomplete -> DMESG-WARN (fi-byt-n2820)
dmesg-warn -> INCOMPLETE (fi-byt-j1900)
Test gem_ringfill:
Subgroup basic-default-hang:
pass -> TIMEOUT (fi-ivb-3770)
Test kms_force_connector_basic:
Subgroup force-load-detect:
incomplete -> PASS (fi-ivb-3520m)
Test kms_pipe_crc_basic:
Subgroup hang-read-crc-pipe-b:
pass -> INCOMPLETE (fi-ivb-3770)
Subgroup suspend-read-crc-pipe-c:
pass -> FAIL (fi-skl-6700k)
Test vgem_basic:
Subgroup unload:
skip -> PASS (fi-kbl-7200u)
pass -> SKIP (fi-hsw-4770)
skip -> PASS (fi-hsw-4770r)
fi-bdw-5557u total:246 pass:231 dwarn:0 dfail:0 fail:0 skip:15
fi-bsw-n3050 total:246 pass:203 dwarn:1 dfail:0 fail:0 skip:42
fi-bxt-t5700 total:246 pass:216 dwarn:0 dfail:0 fail:0 skip:30
fi-byt-j1900 total:76 pass:62 dwarn:0 dfail:0 fail:1 skip:12
fi-byt-n2820 total:246 pass:209 dwarn:1 dfail:0 fail:1 skip:35
fi-hsw-4770 total:246 pass:223 dwarn:0 dfail:0 fail:0 skip:23
fi-hsw-4770r total:246 pass:223 dwarn:1 dfail:0 fail:0 skip:22
fi-ilk-650 total:246 pass:181 dwarn:0 dfail:0 fail:5 skip:60
fi-ivb-3520m total:246 pass:221 dwarn:0 dfail:0 fail:0 skip:25
fi-ivb-3770 total:194 pass:171 dwarn:0 dfail:0 fail:0 skip:21
fi-kbl-7200u total:246 pass:222 dwarn:0 dfail:0 fail:0 skip:24
fi-skl-6700hq total:246 pass:223 dwarn:0 dfail:0 fail:0 skip:23
fi-skl-6700k total:246 pass:220 dwarn:1 dfail:0 fail:1 skip:24
fi-snb-2520m total:246 pass:210 dwarn:0 dfail:0 fail:0 skip:36
fi-snb-2600 total:246 pass:209 dwarn:0 dfail:0 fail:0 skip:37
fi-skl-6260u failed to connect after reboot
fi-skl-6770hq failed to connect after reboot
Results at /archive/results/CI_IGT_test/Patchwork_2741/
7ec75289774dec48c46c37271fb334b7caed3d32 drm-intel-nightly: 2016y-10m-17d-14h-07m-32s UTC integration manifest
0dbfd20 drm/i915: Restrict pagefault disabling to just around copy_from_user()
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: ✗ Fi.CI.BAT: failure for drm/i915: Restrict pagefault disabling to just around copy_from_user() (rev2)
2016-10-18 9:35 ` ✗ Fi.CI.BAT: failure for drm/i915: Restrict pagefault disabling to just around copy_from_user() (rev2) Patchwork
@ 2016-10-18 9:54 ` Chris Wilson
0 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2016-10-18 9:54 UTC (permalink / raw)
To: intel-gfx
On Tue, Oct 18, 2016 at 09:35:12AM -0000, Patchwork wrote:
> == Series Details ==
>
> Series: drm/i915: Restrict pagefault disabling to just around copy_from_user() (rev2)
> URL : https://patchwork.freedesktop.org/series/13880/
> State : failure
>
> == Summary ==
>
> Series 13880v2 drm/i915: Restrict pagefault disabling to just around copy_from_user()
> https://patchwork.freedesktop.org/api/1.0/series/13880/revisions/2/mbox/
>
> Test gem_exec_suspend:
> Subgroup basic-s3:
> incomplete -> DMESG-WARN (fi-byt-n2820)
> dmesg-warn -> INCOMPLETE (fi-byt-j1900)
These are fluctuating results because we now are seeing bugs from ACPI.
> Test gem_ringfill:
> Subgroup basic-default-hang:
> pass -> TIMEOUT (fi-ivb-3770)
The unrelated case of ivb not being able to recover from a hang. :|
Still looking for that locally on an i7-3720QM
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2016-10-18 9:54 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-17 14:10 [PATCH] drm/i915: Restrict pagefault disabling to just around copy_from_user() Chris Wilson
2016-10-17 17:28 ` ✗ Fi.CI.BAT: failure for " Patchwork
2016-10-18 8:01 ` [PATCH] " Tvrtko Ursulin
2016-10-18 8:17 ` Chris Wilson
2016-10-18 8:22 ` Tvrtko Ursulin
2016-10-18 8:38 ` Chris Wilson
2016-10-18 8:54 ` Tvrtko Ursulin
2016-10-18 9:00 ` Chris Wilson
2016-10-18 9:07 ` [PATCH v2] " Chris Wilson
2016-10-18 9:11 ` Tvrtko Ursulin
2016-10-18 9:35 ` ✗ Fi.CI.BAT: failure for drm/i915: Restrict pagefault disabling to just around copy_from_user() (rev2) Patchwork
2016-10-18 9:54 ` Chris Wilson
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.