All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Daniel Vetter <daniel@ffwll.ch>,
	Chris Wilson <chris@chris-wilson.co.uk>,
	intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/i915: Convert WARNs during userptr revoke to SIGBUS
Date: Thu, 8 Oct 2015 10:45:47 +0100	[thread overview]
Message-ID: <56163B4B.6010604@linux.intel.com> (raw)
In-Reply-To: <20150928141457.GY3383@phenom.ffwll.local>


On 28/09/15 15:14, Daniel Vetter wrote:
> On Mon, Sep 28, 2015 at 02:52:30PM +0100, Chris Wilson wrote:
>> On Mon, Sep 28, 2015 at 03:42:22PM +0200, Daniel Vetter wrote:
>>> On Wed, Sep 23, 2015 at 09:07:24PM +0100, Chris Wilson wrote:
>>>> If the client revokes the virtual address it asked to be mapped into GPU
>>>> space via userptr (by using anything along the lines of mmap, mprotect,
>>>> madvise, munmap, ftruncate etc) the mmu notifier sends a range
>>>> invalidate command to userptr. Upon receiving the invalidation signal
>>>> for the revoked range, we try to release the struct pages we pinned into
>>>> the GTT. However, this can fail if any of the GPU's VMA are pinned for
>>>> use by the hardware (i.e. despite the user's intention we cannot
>>>> relinquish the client's address range and keep uptodate with whatever is
>>>> placed in there). Currently we emit a few WARN so that we would notice
>>>> if this every occurred in the wild; it has. Sadly this means we need to
>>>> replace those WARNs with the proper SIGBUS to the offending clients
>>>> instead.
>>>>
>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>> Cc: Michał Winiarski <michal.winiarski@intel.com>
>>>> ---
>>>>   drivers/gpu/drm/i915/i915_gem_userptr.c | 41 +++++++++++++++++++++++++++++----
>>>>   1 file changed, 37 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
>>>> index f75d90118888..efb404b9fe69 100644
>>>> --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
>>>> +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
>>>> @@ -81,11 +81,44 @@ static void __cancel_userptr__worker(struct work_struct *work)
>>>>   		was_interruptible = dev_priv->mm.interruptible;
>>>>   		dev_priv->mm.interruptible = false;
>>>>
>>>> -		list_for_each_entry_safe(vma, tmp, &obj->vma_list, obj_link) {
>>>> -			int ret = i915_vma_unbind(vma);
>>>> -			WARN_ON(ret && ret != -EIO);
>>>> +		list_for_each_entry_safe(vma, tmp, &obj->vma_list, obj_link)
>>>> +			i915_vma_unbind(vma);
>>>> +		if (i915_gem_object_put_pages(obj)) {
>>>> +			struct task_struct *p;
>>>> +
>>>> +			DRM_ERROR("Unable to revoke ownership by userptr of"
>>>> +				  " invalidated address range, sending SIGBUS"
>>>> +				  " to attached clients.\n");
>>>> +
>>>> +			rcu_read_lock();
>>>> +			for_each_process(p) {
>>>> +				siginfo_t info;
>>>> +
>>>> +				/* This doesn't capture everyone who has
>>>> +				 * the pages pinned behind a VMA as we
>>>> +				 * do not have that tracking information
>>>> +				 * available, it does however kill the
>>>> +				 * original process (and siblings) who
>>>> +				 * created the userptr and presumably tried
>>>> +				 * to reuse the address space despite having
>>>> +				 * pinned it (possibly indirectly) to the hw.
>>>> +				 * Arguably, we don't even want to kill the
>>>> +				 * other processes as they are not at fault,
>>>> +				 * likely to be a display server, and hopefully
>>>> +				 * will release the pages in due course once
>>>> +				 * the client is dead.
>>>> +				 */
>>>> +				if (p->mm != obj->userptr.mm->mm)
>>>> +					continue;
>>>> +
>>>> +				info.si_signo = SIGBUS;
>>>> +				info.si_errno = 0;
>>>> +				info.si_code = BUS_ADRERR;
>>>> +				info.si_addr = (void __user *)obj->userptr.ptr;
>>>> +				force_sig_info(SIGBUS, &info, p);
>>>> +			}
>>>> +			rcu_read_unlock();
>>>
>>> Why do we need to send a SIGBUS? It won't tear down the offending gem bo,
>>> any new users will hopefully get it, and abusing SIGBUS without the thread
>>> actually doing a memory access is a bit surprising. DRM_DEBUG seems to be
>>> the most we can do here I think - I think userspace being able to do this
>>> is just a fundamental property of userptr.
>>
>> It is not the bo that is at fault but the *client's* *address* *space*
>> that is being changed. It is equivalent to mmap on a truncated file i.e.
>> if the client tries to use its mmapping after it has truncated the file
>> it is scolded via SIGBUS.
>
> But existing SIGBUS is thread-bound and comes with the fault address
> attached. This is just the gpu being a bit unhappy, so the SIGBUS comes
> out of complete nowhere to smack the userspace thread. Any kind of SIGBUS
> catcher userspace has for other reasons might be supremely surprised by
> this and do stupid things. Hence I don't think throwing SIGBUS here is
> correct behaviour. And there doesn't seem to be anything else suitable
> really.

Te offending address is provided with the signal as far as I can see.

I think it is fine to do this, even required since the alternative is 
for GPU to keep using random memory indefinitely and userspace never 
gets to know.

And I don't see any reason to keep the process running who did such an 
elementary and serious mistake.

Is the only concern that the process can catch it and not exit?

I am just not sure about the locking requirement for for_each_process 
since existing call sites give conflicting examples. I don't see how 
turning the preemption off can be safe without the tasklist lock but 
perhaps I am wrong, don't know.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2015-10-08  9:45 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-23 20:07 [PATCH] drm/i915: Convert WARNs during userptr revoke to SIGBUS Chris Wilson
2015-09-24 10:23 ` Tvrtko Ursulin
2015-09-24 10:31   ` Chris Wilson
2015-09-24 10:55     ` Tvrtko Ursulin
2015-09-28 13:42 ` Daniel Vetter
2015-09-28 13:52   ` Chris Wilson
2015-09-28 14:14     ` Daniel Vetter
2015-10-08  9:45       ` Tvrtko Ursulin [this message]
2015-10-09  7:48         ` Daniel Vetter
2015-10-09  8:40           ` Chris Wilson
2015-10-09  8:55             ` Daniel Vetter
2015-10-09  8:59               ` Chris Wilson
2015-10-09  9:03               ` Tvrtko Ursulin
2015-10-09 17:14                 ` Daniel Vetter
2015-10-09 17:26                   ` Chris Wilson
2015-10-09 18:33                     ` Dave Gordon
2015-10-12  9:06                     ` Tvrtko Ursulin
2015-10-12  9:31                       ` Chris Wilson
2015-10-12 10:10                         ` Chris Wilson
2015-10-12 12:59                           ` Tvrtko Ursulin
2015-10-13 11:26                         ` Daniel Vetter
2015-10-13 11:44                           ` Chris Wilson
2015-10-13 12:23                             ` Daniel Vetter
2015-10-13 13:09                               ` Chris Wilson
2015-10-13 13:39                                 ` Daniel Vetter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56163B4B.6010604@linux.intel.com \
    --to=tvrtko.ursulin@linux.intel.com \
    --cc=chris@chris-wilson.co.uk \
    --cc=daniel@ffwll.ch \
    --cc=intel-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.