All of lore.kernel.org
 help / color / mirror / Atom feed
From: Matthew Auld <matthew.william.auld@gmail.com>
To: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Intel Graphics Development <intel-gfx@lists.freedesktop.org>,
	ML dri-devel <dri-devel@lists.freedesktop.org>
Subject: Re: [PATCH v2 12/16] drm/i915: Add i915_vma_unbind_unlocked, and take obj lock for i915_vma_unbind
Date: Thu, 9 Dec 2021 14:27:21 +0000	[thread overview]
Message-ID: <CAM0jSHPdWQ3_ee0dO5gwPK4rOzbLiiKJ-yujwXYEcuPc6C1HTw@mail.gmail.com> (raw)
In-Reply-To: <1188bcc7-9415-adbb-1ec2-7016392d2923@linux.intel.com>

On Thu, 9 Dec 2021 at 13:46, Maarten Lankhorst
<maarten.lankhorst@linux.intel.com> wrote:
>
> On 09-12-2021 14:40, Matthew Auld wrote:
> > On Thu, 9 Dec 2021 at 13:25, Maarten Lankhorst
> > <maarten.lankhorst@linux.intel.com> wrote:
> >> On 09-12-2021 14:05, Matthew Auld wrote:
> >>> On Mon, 29 Nov 2021 at 13:58, Maarten Lankhorst
> >>> <maarten.lankhorst@linux.intel.com> wrote:
> >>>> We want to remove more members of i915_vma, which requires the locking to be
> >>>> held more often.
> >>>>
> >>>> Start requiring gem object lock for i915_vma_unbind, as it's one of the
> >>>> callers that may unpin pages.
> >>>>
> >>>> Some special care is needed when evicting, because the last reference to the
> >>>> object may be held by the VMA, so after __i915_vma_unbind, vma may be garbage,
> >>>> and we need to cache vma->obj before unlocking.
> >>>>
> >>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> >>>> ---
> >>> <snip>
> >>>
> >>>> @@ -129,22 +129,47 @@ void i915_ggtt_suspend_vm(struct i915_address_space *vm)
> >>>>
> >>>>         drm_WARN_ON(&vm->i915->drm, !vm->is_ggtt && !vm->is_dpt);
> >>>>
> >>>> +retry:
> >>>> +       i915_gem_drain_freed_objects(vm->i915);
> >>>> +
> >>>>         mutex_lock(&vm->mutex);
> >>>>
> >>>>         /* Skip rewriting PTE on VMA unbind. */
> >>>>         open = atomic_xchg(&vm->open, 0);
> >>>>
> >>>>         list_for_each_entry_safe(vma, vn, &vm->bound_list, vm_link) {
> >>>> +               struct drm_i915_gem_object *obj = vma->obj;
> >>>> +
> >>>>                 GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
> >>>> +
> >>>>                 i915_vma_wait_for_bind(vma);
> >>>>
> >>>> -               if (i915_vma_is_pinned(vma))
> >>>> +               if (i915_vma_is_pinned(vma) || !i915_vma_is_bound(vma, I915_VMA_GLOBAL_BIND))
> >>>>                         continue;
> >>>>
> >>>> -               if (!i915_vma_is_bound(vma, I915_VMA_GLOBAL_BIND)) {
> >>>> -                       __i915_vma_evict(vma);
> >>>> -                       drm_mm_remove_node(&vma->node);
> >>>> +               /* unlikely to race when GPU is idle, so no worry about slowpath.. */
> >>>> +               if (!i915_gem_object_trylock(obj, NULL)) {
> >>>> +                       atomic_set(&vm->open, open);
> >>> Does this need a comment about barriers?
> >> Not sure, it's guarded by vm->mutex.
> >>>> +
> >>>> +                       i915_gem_object_get(obj);
> >>> Should this not be kref_get_unless_zero? Assuming the vm->mutex is the
> >>> only thing keeping the object alive here, won't this lead to potential
> >>> uaf/double-free or something? Also should we not plonk this before the
> >>> trylock? Or maybe I'm missing something here?
> >> Normally you're correct, this is normally the case, but we drain freed objects and this path should only be run during s/r, at which point userspace should be dead, GPU idle, and we just drained all freed objects above.
> >>
> >> It would be a bug if we still found a dead object, as nothing should be running.
> > Hmm, Ok. So why do we expect the trylock to ever fail here? Who else
> > can grab it at this stage?
> It probably shouldn't, should probably be a WARN if it happens.

r-b with that then.

> >>>> +                       mutex_unlock(&vm->mutex);
> >>>> +
> >>>> +                       i915_gem_object_lock(obj, NULL);
> >>>> +                       open = i915_vma_unbind(vma);
> >>>> +                       i915_gem_object_unlock(obj);
> >>>> +
> >>>> +                       GEM_WARN_ON(open);
> >>>> +
> >>>> +                       i915_gem_object_put(obj);
> >>>> +                       goto retry;
> >>>>                 }
> >>>> +
> >>>> +               i915_vma_wait_for_bind(vma);
> >>> We also call wait_for_bind above, is that intentional?
> >> Should be harmless, but first one should probably be removed. :)
> >>
>

WARNING: multiple messages have this Message-ID (diff)
From: Matthew Auld <matthew.william.auld@gmail.com>
To: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Intel Graphics Development <intel-gfx@lists.freedesktop.org>,
	ML dri-devel <dri-devel@lists.freedesktop.org>
Subject: Re: [Intel-gfx] [PATCH v2 12/16] drm/i915: Add i915_vma_unbind_unlocked, and take obj lock for i915_vma_unbind
Date: Thu, 9 Dec 2021 14:27:21 +0000	[thread overview]
Message-ID: <CAM0jSHPdWQ3_ee0dO5gwPK4rOzbLiiKJ-yujwXYEcuPc6C1HTw@mail.gmail.com> (raw)
In-Reply-To: <1188bcc7-9415-adbb-1ec2-7016392d2923@linux.intel.com>

On Thu, 9 Dec 2021 at 13:46, Maarten Lankhorst
<maarten.lankhorst@linux.intel.com> wrote:
>
> On 09-12-2021 14:40, Matthew Auld wrote:
> > On Thu, 9 Dec 2021 at 13:25, Maarten Lankhorst
> > <maarten.lankhorst@linux.intel.com> wrote:
> >> On 09-12-2021 14:05, Matthew Auld wrote:
> >>> On Mon, 29 Nov 2021 at 13:58, Maarten Lankhorst
> >>> <maarten.lankhorst@linux.intel.com> wrote:
> >>>> We want to remove more members of i915_vma, which requires the locking to be
> >>>> held more often.
> >>>>
> >>>> Start requiring gem object lock for i915_vma_unbind, as it's one of the
> >>>> callers that may unpin pages.
> >>>>
> >>>> Some special care is needed when evicting, because the last reference to the
> >>>> object may be held by the VMA, so after __i915_vma_unbind, vma may be garbage,
> >>>> and we need to cache vma->obj before unlocking.
> >>>>
> >>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> >>>> ---
> >>> <snip>
> >>>
> >>>> @@ -129,22 +129,47 @@ void i915_ggtt_suspend_vm(struct i915_address_space *vm)
> >>>>
> >>>>         drm_WARN_ON(&vm->i915->drm, !vm->is_ggtt && !vm->is_dpt);
> >>>>
> >>>> +retry:
> >>>> +       i915_gem_drain_freed_objects(vm->i915);
> >>>> +
> >>>>         mutex_lock(&vm->mutex);
> >>>>
> >>>>         /* Skip rewriting PTE on VMA unbind. */
> >>>>         open = atomic_xchg(&vm->open, 0);
> >>>>
> >>>>         list_for_each_entry_safe(vma, vn, &vm->bound_list, vm_link) {
> >>>> +               struct drm_i915_gem_object *obj = vma->obj;
> >>>> +
> >>>>                 GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
> >>>> +
> >>>>                 i915_vma_wait_for_bind(vma);
> >>>>
> >>>> -               if (i915_vma_is_pinned(vma))
> >>>> +               if (i915_vma_is_pinned(vma) || !i915_vma_is_bound(vma, I915_VMA_GLOBAL_BIND))
> >>>>                         continue;
> >>>>
> >>>> -               if (!i915_vma_is_bound(vma, I915_VMA_GLOBAL_BIND)) {
> >>>> -                       __i915_vma_evict(vma);
> >>>> -                       drm_mm_remove_node(&vma->node);
> >>>> +               /* unlikely to race when GPU is idle, so no worry about slowpath.. */
> >>>> +               if (!i915_gem_object_trylock(obj, NULL)) {
> >>>> +                       atomic_set(&vm->open, open);
> >>> Does this need a comment about barriers?
> >> Not sure, it's guarded by vm->mutex.
> >>>> +
> >>>> +                       i915_gem_object_get(obj);
> >>> Should this not be kref_get_unless_zero? Assuming the vm->mutex is the
> >>> only thing keeping the object alive here, won't this lead to potential
> >>> uaf/double-free or something? Also should we not plonk this before the
> >>> trylock? Or maybe I'm missing something here?
> >> Normally you're correct, this is normally the case, but we drain freed objects and this path should only be run during s/r, at which point userspace should be dead, GPU idle, and we just drained all freed objects above.
> >>
> >> It would be a bug if we still found a dead object, as nothing should be running.
> > Hmm, Ok. So why do we expect the trylock to ever fail here? Who else
> > can grab it at this stage?
> It probably shouldn't, should probably be a WARN if it happens.

r-b with that then.

> >>>> +                       mutex_unlock(&vm->mutex);
> >>>> +
> >>>> +                       i915_gem_object_lock(obj, NULL);
> >>>> +                       open = i915_vma_unbind(vma);
> >>>> +                       i915_gem_object_unlock(obj);
> >>>> +
> >>>> +                       GEM_WARN_ON(open);
> >>>> +
> >>>> +                       i915_gem_object_put(obj);
> >>>> +                       goto retry;
> >>>>                 }
> >>>> +
> >>>> +               i915_vma_wait_for_bind(vma);
> >>> We also call wait_for_bind above, is that intentional?
> >> Should be harmless, but first one should probably be removed. :)
> >>
>

  reply	other threads:[~2021-12-09 16:58 UTC|newest]

Thread overview: 89+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-29 13:47 [PATCH v2 00/16] drm/i915: Remove short term pins from execbuf Maarten Lankhorst
2021-11-29 13:47 ` [Intel-gfx] " Maarten Lankhorst
2021-11-29 13:47 ` [PATCH v2 01/16] drm/i915: Remove unused bits of i915_vma/active api Maarten Lankhorst
2021-11-29 13:47   ` [Intel-gfx] " Maarten Lankhorst
2021-11-29 13:47 ` [PATCH v2 02/16] drm/i915: Change shrink ordering to use locking around unbinding Maarten Lankhorst
2021-11-29 13:47   ` [Intel-gfx] " Maarten Lankhorst
2021-11-29 13:47 ` [PATCH v2 03/16] drm/i915: Remove pages_mutex and intel_gtt->vma_ops.set/clear_pages members, v2 Maarten Lankhorst
2021-11-29 13:47   ` [Intel-gfx] " Maarten Lankhorst
2021-12-06 13:13   ` Matthew Auld
2021-12-06 15:18     ` Maarten Lankhorst
2021-12-06 17:00       ` Matthew Auld
2021-12-07 18:15         ` Daniel Vetter
2021-12-06 17:10   ` Matthew Auld
2021-12-07 10:06     ` Maarten Lankhorst
2021-12-07 10:45       ` Matthew Auld
2021-11-29 13:47 ` [PATCH v2 04/16] drm/i915: Take object lock in i915_ggtt_pin if ww is not set Maarten Lankhorst
2021-11-29 13:47   ` [Intel-gfx] " Maarten Lankhorst
2021-12-06 13:18   ` Matthew Auld
2021-12-06 13:18     ` [Intel-gfx] " Matthew Auld
2021-11-29 13:47 ` [PATCH v2 05/16] drm/i915: Force ww lock for i915_gem_object_ggtt_pin_ww Maarten Lankhorst
2021-11-29 13:47   ` [Intel-gfx] " Maarten Lankhorst
2021-11-30  9:20   ` [PATCH] drm/i915: Force ww lock for i915_gem_object_ggtt_pin_ww, v2 Maarten Lankhorst
2021-11-30  9:20     ` [Intel-gfx] " Maarten Lankhorst
2021-12-01 15:07     ` Matthew Auld
2021-12-01 15:07       ` [Intel-gfx] " Matthew Auld
2021-11-29 13:47 ` [PATCH v2 06/16] drm/i915: Ensure gem_contexts selftests work with unbind changes Maarten Lankhorst
2021-11-29 13:47   ` [Intel-gfx] " Maarten Lankhorst
2021-12-07 10:44   ` Matthew Auld
2021-12-07 10:44     ` [Intel-gfx] " Matthew Auld
2021-12-08 13:20     ` Maarten Lankhorst
2021-12-08 13:20       ` [Intel-gfx] " Maarten Lankhorst
2021-11-29 13:47 ` [PATCH v2 07/16] drm/i915: Take trylock during eviction, v2 Maarten Lankhorst
2021-11-29 13:47   ` [Intel-gfx] " Maarten Lankhorst
2021-12-07 11:01   ` Matthew Auld
2021-12-08 13:28     ` Maarten Lankhorst
2021-11-29 13:47 ` [PATCH v2 08/16] drm/i915: Pass trylock context to callers Maarten Lankhorst
2021-11-29 13:47   ` [Intel-gfx] " Maarten Lankhorst
2021-12-07 14:26   ` Matthew Auld
2021-12-07 14:26     ` [Intel-gfx] " Matthew Auld
2021-11-29 13:47 ` [PATCH v2 09/16] drm/i915: Ensure i915_vma tests do not get -ENOSPC with the locking changes Maarten Lankhorst
2021-11-29 13:47   ` [Intel-gfx] " Maarten Lankhorst
2021-12-08 11:49   ` Matthew Auld
2021-12-08 12:01     ` Matthew Auld
2021-11-29 13:47 ` [PATCH v2 10/16] drm/i915: Make i915_gem_evict_vm work correctly for already locked objects Maarten Lankhorst
2021-11-29 13:47   ` [Intel-gfx] " Maarten Lankhorst
2021-12-08 12:07   ` Matthew Auld
2021-12-08 13:34     ` Maarten Lankhorst
2021-11-29 13:47 ` [PATCH v2 11/16] drm/i915: Call i915_gem_evict_vm in vm_fault_gtt to prevent new ENOSPC errors Maarten Lankhorst
2021-11-29 13:47   ` [Intel-gfx] " Maarten Lankhorst
2021-12-09 12:17   ` Matthew Auld
2021-12-09 12:59     ` Maarten Lankhorst
2021-11-29 13:47 ` [PATCH v2 12/16] drm/i915: Add i915_vma_unbind_unlocked, and take obj lock for i915_vma_unbind Maarten Lankhorst
2021-11-29 13:47   ` [Intel-gfx] " Maarten Lankhorst
2021-12-09 13:05   ` Matthew Auld
2021-12-09 13:05     ` [Intel-gfx] " Matthew Auld
2021-12-09 13:25     ` Maarten Lankhorst
2021-12-09 13:25       ` Maarten Lankhorst
2021-12-09 13:40       ` [Intel-gfx] " Matthew Auld
2021-12-09 13:40         ` Matthew Auld
2021-12-09 13:45         ` [Intel-gfx] " Maarten Lankhorst
2021-12-09 13:45           ` Maarten Lankhorst
2021-12-09 14:27           ` Matthew Auld [this message]
2021-12-09 14:27             ` [Intel-gfx] " Matthew Auld
2021-11-29 13:47 ` [PATCH v2 13/16] drm/i915: Require object lock when freeing pages during destruction Maarten Lankhorst
2021-11-29 13:47   ` [Intel-gfx] " Maarten Lankhorst
2021-11-29 13:47 ` [PATCH v2 14/16] drm/i915: Remove assert_object_held_shared Maarten Lankhorst
2021-11-29 13:47   ` [Intel-gfx] " Maarten Lankhorst
2021-12-09 13:07   ` Matthew Auld
2021-12-09 13:07     ` [Intel-gfx] " Matthew Auld
2021-11-29 13:47 ` [PATCH v2 15/16] drm/i915: Remove support for unlocked i915_vma unbind Maarten Lankhorst
2021-11-29 13:47   ` [Intel-gfx] " Maarten Lankhorst
2021-11-29 13:47 ` [PATCH v2 16/16] drm/i915: Remove short-term pins from execbuf, v5 Maarten Lankhorst
2021-11-29 13:47   ` [Intel-gfx] " Maarten Lankhorst
2021-12-09 16:22   ` Matthew Auld
2021-12-09 16:22     ` [Intel-gfx] " Matthew Auld
2021-11-29 15:32 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915: Remove short term pins from execbuf Patchwork
2021-11-29 15:33 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2021-11-29 15:37 ` [Intel-gfx] ✗ Fi.CI.DOCS: " Patchwork
2021-11-29 16:11 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
2021-11-30  8:54 ` [Intel-gfx] [PATCH v2 00/16] " Tvrtko Ursulin
2021-11-30 11:17   ` Maarten Lankhorst
2021-11-30 18:38     ` Tvrtko Ursulin
2021-12-01 11:15       ` Maarten Lankhorst
2021-12-01 13:11         ` Tvrtko Ursulin
2021-11-30 11:18 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915: Remove short term pins from execbuf. (rev2) Patchwork
2021-11-30 11:19 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2021-11-30 11:23 ` [Intel-gfx] ✗ Fi.CI.DOCS: " Patchwork
2021-11-30 11:49 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2021-11-30 14:51 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAM0jSHPdWQ3_ee0dO5gwPK4rOzbLiiKJ-yujwXYEcuPc6C1HTw@mail.gmail.com \
    --to=matthew.william.auld@gmail.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=maarten.lankhorst@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.