All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: Robert Beckett <bob.beckett@collabora.com>,
	 intel-gfx@lists.freedesktop.org,
	dri-devel@lists.freedesktop.org
Cc: matthew.auld@intel.com
Subject: Re: [Intel-gfx] [PATCH v7 4/6] drm/i915: Use vma resources for async unbinding
Date: Tue, 18 Jan 2022 10:06:26 +0100	[thread overview]
Message-ID: <563dd04b0f29f9e30b47e230d40619c65732c67b.camel@linux.intel.com> (raw)
In-Reply-To: <28186e6b-513b-d380-d72f-0c05862e4a73@collabora.com>

Hi,

On Mon, 2022-01-17 at 21:07 +0000, Robert Beckett wrote:
> 
> 
> On 10/01/2022 17:22, Thomas Hellström wrote:
> > Implement async (non-blocking) unbinding by not syncing the vma
> > before
> > calling unbind on the vma_resource.
> > Add the resulting unbind fence to the object's dma_resv from where
> > it is
> > picked up by the ttm migration code.
> > Ideally these unbind fences should be coalesced with the migration
> > blit
> > fence to avoid stalling the migration blit waiting for unbind, as
> > they
> > can certainly go on in parallel, but since we don't yet have a
> > reasonable data structure to use to coalesce fences and attach the
> > resulting fence to a timeline, we defer that for now.
> > 
> > Note that with async unbinding, even while the unbind waits for the
> > preceding bind to complete before unbinding, the vma itself might
> > have been
> > destroyed in the process, clearing the vma pages. Therefore we can
> > only allow async unbinding if we have a refcounted sg-list and keep
> > a
> > refcount on that for the vma resource pages to stay intact until
> > binding occurs. If this condition is not met, a request for an
> > async
> > unbind is diverted to a sync unbind.
> > 
> > v2:
> > - Use a separate kmem_cache for vma resources for now to isolate
> > their
> >    memory allocation and aid debugging.
> > - Move the check for vm closed to the actual unbinding thread.
> > Regardless
> >    of whether the vm is closed, we need the unbind fence to
> > properly wait
> >    for capture.
> > - Clear vma_res::vm on unbind and update its documentation.
> > v4:
> > - Take cache coloring into account when searching for vma resources
> >    pending unbind. (Matthew Auld)
> > v5:
> > - Fix timeout and error check in
> > i915_vma_resource_bind_dep_await().
> > - Avoid taking a reference on the object for async binding if
> >    async unbind capable.
> > - Fix braces around a single-line if statement.
> > v6:
> > - Fix up the cache coloring adjustment. (Kernel test robot
> > <lkp@intel.com>)
> > - Don't allow async unbinding if the vma_res pages are not the same
> > as
> >    the object pages. (Matthew Auld)
> > v7:
> > - s/unsigned long/u64/ in a number of places (Matthew Auld)
> > 
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > Reviewed-by: Matthew Auld <matthew.auld@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c |  11 +-
> >   drivers/gpu/drm/i915/gt/intel_ggtt.c         |   2 +-
> >   drivers/gpu/drm/i915/gt/intel_gtt.c          |   4 +
> >   drivers/gpu/drm/i915/gt/intel_gtt.h          |   3 +
> >   drivers/gpu/drm/i915/i915_drv.h              |   1 +
> >   drivers/gpu/drm/i915/i915_gem.c              |  12 +-
> >   drivers/gpu/drm/i915/i915_module.c           |   3 +
> >   drivers/gpu/drm/i915/i915_vma.c              | 205 +++++++++--
> >   drivers/gpu/drm/i915/i915_vma.h              |   3 +-
> >   drivers/gpu/drm/i915/i915_vma_resource.c     | 354
> > +++++++++++++++++--
> >   drivers/gpu/drm/i915/i915_vma_resource.h     |  48 +++
> >   11 files changed, 579 insertions(+), 67 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> > b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> > index 8653855d808b..1de306c03aaf 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
> > @@ -142,7 +142,16 @@ int i915_ttm_move_notify(struct
> > ttm_buffer_object *bo)
> >         struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
> >         int ret;
> >   
> > -       ret = i915_gem_object_unbind(obj,
> > I915_GEM_OBJECT_UNBIND_ACTIVE);
> > +       /*
> > +        * Note: The async unbinding here will actually transform
> > the
> > +        * blocking wait for unbind into a wait before finally
> > submitting
> > +        * evict / migration blit and thus stall the migration
> > timeline
> > +        * which may not be good for overall throughput. We should
> > make
> > +        * sure we await the unbind fences *after* the migration
> > blit
> > +        * instead of *before* as we currently do.
> > +        */
> > +       ret = i915_gem_object_unbind(obj,
> > I915_GEM_OBJECT_UNBIND_ACTIVE |
> > +                                    I915_GEM_OBJECT_UNBIND_ASYNC);
> >         if (ret)
> >                 return ret;
> >   
> > diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > index e49b6250c4b7..a1b2761bc16e 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > @@ -142,7 +142,7 @@ void i915_ggtt_suspend_vm(struct
> > i915_address_space *vm)
> >                         continue;
> >   
> >                 if (!i915_vma_is_bound(vma, I915_VMA_GLOBAL_BIND))
> > {
> > -                       __i915_vma_evict(vma);
> > +                       __i915_vma_evict(vma, false);
> >                         drm_mm_remove_node(&vma->node);
> >                 }
> >         }
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c
> > b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > index a94be0306464..46be4197b93f 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gtt.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
> > @@ -161,6 +161,9 @@ static void __i915_vm_release(struct
> > work_struct *work)
> >         struct i915_address_space *vm =
> >                 container_of(work, struct i915_address_space,
> > release_work);
> >   
> > +       /* Synchronize async unbinds. */
> > +       i915_vma_resource_bind_dep_sync_all(vm);
> > +
> >         vm->cleanup(vm);
> >         i915_address_space_fini(vm);
> >   
> > @@ -189,6 +192,7 @@ void i915_address_space_init(struct
> > i915_address_space *vm, int subclass)
> >         if (!kref_read(&vm->resv_ref))
> >                 kref_init(&vm->resv_ref);
> >   
> > +       vm->pending_unbind = RB_ROOT_CACHED;
> >         INIT_WORK(&vm->release_work, __i915_vm_release);
> >         atomic_set(&vm->open, 1);
> >   
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h
> > b/drivers/gpu/drm/i915/gt/intel_gtt.h
> > index 676b839d1a34..8073438b67c8 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> > @@ -265,6 +265,9 @@ struct i915_address_space {
> >         /* Flags used when creating page-table objects for this vm
> > */
> >         unsigned long lmem_pt_obj_flags;
> >   
> > +       /* Interval tree for pending unbind vma resources */
> > +       struct rb_root_cached pending_unbind;
> > +
> >         struct drm_i915_gem_object *
> >                 (*alloc_pt_dma)(struct i915_address_space *vm, int
> > sz);
> >         struct drm_i915_gem_object *
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h
> > b/drivers/gpu/drm/i915/i915_drv.h
> > index a44e0c3298fc..6caec2eca8cd 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -1661,6 +1661,7 @@ int i915_gem_object_unbind(struct
> > drm_i915_gem_object *obj,
> >   #define I915_GEM_OBJECT_UNBIND_BARRIER BIT(1)
> >   #define I915_GEM_OBJECT_UNBIND_TEST BIT(2)
> >   #define I915_GEM_OBJECT_UNBIND_VM_TRYLOCK BIT(3)
> > +#define I915_GEM_OBJECT_UNBIND_ASYNC BIT(4)
> >   
> >   void i915_gem_runtime_suspend(struct drm_i915_private *dev_priv);
> >   
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c
> > b/drivers/gpu/drm/i915/i915_gem.c
> > index 3729ec01b5bc..7188db42c85b 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -157,10 +157,16 @@ int i915_gem_object_unbind(struct
> > drm_i915_gem_object *obj,
> >                 spin_unlock(&obj->vma.lock);
> >   
> >                 if (vma) {
> > +                       bool vm_trylock = !!(flags &
> > I915_GEM_OBJECT_UNBIND_VM_TRYLOCK);
> >                         ret = -EBUSY;
> > -                       if (flags & I915_GEM_OBJECT_UNBIND_ACTIVE
> > ||
> > -                           !i915_vma_is_active(vma)) {
> > -                               if (flags &
> > I915_GEM_OBJECT_UNBIND_VM_TRYLOCK) {
> > +                       if (flags & I915_GEM_OBJECT_UNBIND_ASYNC) {
> > +                               assert_object_held(vma->obj);
> > +                               ret = i915_vma_unbind_async(vma,
> > vm_trylock);
> > +                       }
> > +
> > +                       if (ret == -EBUSY && (flags &
> > I915_GEM_OBJECT_UNBIND_ACTIVE ||
> > +                                            
> > !i915_vma_is_active(vma))) {
> > +                               if (vm_trylock) {
> >                                         if (mutex_trylock(&vma->vm-
> > >mutex)) {
> >                                                 ret =
> > __i915_vma_unbind(vma);
> >                                                 mutex_unlock(&vma-
> > >vm->mutex);
> > diff --git a/drivers/gpu/drm/i915/i915_module.c
> > b/drivers/gpu/drm/i915/i915_module.c
> > index f6bcd2f89257..a8f175960b34 100644
> > --- a/drivers/gpu/drm/i915/i915_module.c
> > +++ b/drivers/gpu/drm/i915/i915_module.c
> > @@ -17,6 +17,7 @@
> >   #include "i915_scheduler.h"
> >   #include "i915_selftest.h"
> >   #include "i915_vma.h"
> > +#include "i915_vma_resource.h"
> >   
> >   static int i915_check_nomodeset(void)
> >   {
> > @@ -64,6 +65,8 @@ static const struct {
> >           .exit = i915_scheduler_module_exit },
> >         { .init = i915_vma_module_init,
> >           .exit = i915_vma_module_exit },
> > +       { .init = i915_vma_resource_module_init,
> > +         .exit = i915_vma_resource_module_exit },
> >         { .init = i915_mock_selftests },
> >         { .init = i915_pmu_init,
> >           .exit = i915_pmu_exit },
> > diff --git a/drivers/gpu/drm/i915/i915_vma.c
> > b/drivers/gpu/drm/i915/i915_vma.c
> > index 29c770a764aa..5881b0713b1a 100644
> > --- a/drivers/gpu/drm/i915/i915_vma.c
> > +++ b/drivers/gpu/drm/i915/i915_vma.c
> > @@ -286,9 +286,10 @@ struct i915_vma_work {
> >         struct dma_fence_work base;
> >         struct i915_address_space *vm;
> >         struct i915_vm_pt_stash stash;
> > -       struct i915_vma *vma;
> > +       struct i915_vma_resource *vma_res;
> >         struct drm_i915_gem_object *pinned;
> >         struct i915_sw_dma_fence_cb cb;
> > +       struct i915_refct_sgt *rsgt;
> >         enum i915_cache_level cache_level;
> >         unsigned int flags;
> >   };
> > @@ -296,10 +297,11 @@ struct i915_vma_work {
> >   static void __vma_bind(struct dma_fence_work *work)
> >   {
> >         struct i915_vma_work *vw = container_of(work, typeof(*vw),
> > base);
> > -       struct i915_vma *vma = vw->vma;
> > +       struct i915_vma_resource *vma_res = vw->vma_res;
> > +
> > +       vma_res->ops->bind_vma(vma_res->vm, &vw->stash,
> > +                              vma_res, vw->cache_level, vw-
> > >flags);
> >   
> > -       vma->ops->bind_vma(vw->vm, &vw->stash,
> > -                          vma->resource, vw->cache_level, vw-
> > >flags);
> >   }
> >   
> >   static void __vma_release(struct dma_fence_work *work)
> > @@ -311,6 +313,10 @@ static void __vma_release(struct
> > dma_fence_work *work)
> >   
> >         i915_vm_free_pt_stash(vw->vm, &vw->stash);
> >         i915_vm_put(vw->vm);
> > +       if (vw->vma_res)
> > +               i915_vma_resource_put(vw->vma_res);
> > +       if (vw->rsgt)
> > +               i915_refct_sgt_put(vw->rsgt);
> >   }
> >   
> >   static const struct dma_fence_work_ops bind_ops = {
> > @@ -380,13 +386,11 @@ i915_vma_resource_init_from_vma(struct
> > i915_vma_resource *vma_res,
> >   {
> >         struct drm_i915_gem_object *obj = vma->obj;
> >   
> > -       i915_vma_resource_init(vma_res, vma->pages, &vma-
> > >page_sizes,
> > +       i915_vma_resource_init(vma_res, vma->vm, vma->pages, &vma-
> > >page_sizes,
> >                                i915_gem_object_is_readonly(obj),
> >                                i915_gem_object_is_lmem(obj),
> > -                              vma->private,
> > -                              vma->node.start,
> > -                              vma->node.size,
> > -                              vma->size);
> > +                              vma->ops, vma->private, vma-
> > >node.start,
> > +                              vma->node.size, vma->size);
> >   }
> >   
> >   /**
> > @@ -410,6 +414,7 @@ int i915_vma_bind(struct i915_vma *vma,
> >   {
> >         u32 bind_flags;
> >         u32 vma_flags;
> > +       int ret;
> >   
> >         lockdep_assert_held(&vma->vm->mutex);
> >         GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
> > @@ -418,12 +423,12 @@ int i915_vma_bind(struct i915_vma *vma,
> >         if (GEM_DEBUG_WARN_ON(range_overflows(vma->node.start,
> >                                               vma->node.size,
> >                                               vma->vm->total))) {
> > -               kfree(vma_res);
> > +               i915_vma_resource_free(vma_res);
> >                 return -ENODEV;
> >         }
> >   
> >         if (GEM_DEBUG_WARN_ON(!flags)) {
> > -               kfree(vma_res);
> > +               i915_vma_resource_free(vma_res);
> >                 return -EINVAL;
> >         }
> >   
> > @@ -435,12 +440,30 @@ int i915_vma_bind(struct i915_vma *vma,
> >   
> >         bind_flags &= ~vma_flags;
> >         if (bind_flags == 0) {
> > -               kfree(vma_res);
> > +               i915_vma_resource_free(vma_res);
> >                 return 0;
> >         }
> >   
> >         GEM_BUG_ON(!atomic_read(&vma->pages_count));
> >   
> > +       /* Wait for or await async unbinds touching our range */
> > +       if (work && bind_flags & vma->vm->bind_async_flags)
> > +               ret = i915_vma_resource_bind_dep_await(vma->vm,
> > +                                                      &work-
> > >base.chain,
> > +                                                      vma-
> > >node.start,
> > +                                                      vma-
> > >node.size,
> > +                                                      true,
> > +                                                      GFP_NOWAIT |
> > +                                                     
> > __GFP_RETRY_MAYFAIL |
> > +                                                     
> > __GFP_NOWARN);
> > +       else
> > +               ret = i915_vma_resource_bind_dep_sync(vma->vm, vma-
> > >node.start,
> > +                                                     vma-
> > >node.size, true);
> > +       if (ret) {
> > +               i915_vma_resource_free(vma_res);
> > +               return ret;
> > +       }
> > +
> >         if (vma->resource || !vma_res) {
> >                 /* Rebinding with an additional I915_VMA_*_BIND */
> >                 GEM_WARN_ON(!vma_flags);
> 
> You missed an extra `kfree(vma_res)` on the next line from here (not
> in 
> patch context), which would presumably need to be 
> `i915_vma_resource_free(vma_res)`

It should, thanks. I see I've missed that in an error path as well, as
you mention below.

> 
> Further to that, you are freeing it where it can already be known to
> be 
> null, which, while relying on known zero side effect of double free
> will 
> work, looks very unintuitive.

This stems from NULL checking before kfree() is typically not done.
IIRC even checkpatch.pl warns if doing that, but with transition to
kmem_cache_free() that doesn't hold anymore so I need to fix this up.

> 
> Firther to that, once this series is applied, `i915_vma_bind` only 
> appears to be called from `eb_relocate_entry()` with `vma_res` always
> null, and from `i915_vma_pin_ww()` where it allocates `vma_res`.
> 
> It would be much cleaner to just allocate the `vma_res` in 
> `i915_vma_bind` if required. That way it will avoid the null
> asignment 
> on return to `i915_vma_pin_ww` with the equally unintuitive 
> `kfree(vma_res)` during the error fallthrough path at the end of 
> `i915_vma_pin_ww` on a knowingly null pointer (also that would need
> to 
> be a `i915_vma_resource_free` if you keep it the way it is).

We can't allocate (GFP_KERNEL) under the vm mutex, hence need to pass
it as an argument.

Thanks,
Thomas



  reply	other threads:[~2022-01-18  9:06 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-10 17:22 [PATCH v7 0/6] drm/i915: Asynchronous vma unbinding Thomas Hellström
2022-01-10 17:22 ` [Intel-gfx] " Thomas Hellström
2022-01-10 17:22 ` [PATCH v7 1/6] drm/i915: Initial introduction of vma resources Thomas Hellström
2022-01-10 17:22   ` [Intel-gfx] " Thomas Hellström
2022-01-10 17:22 ` [PATCH v7 2/6] drm/i915: Use the vma resource as argument for gtt binding / unbinding Thomas Hellström
2022-01-10 17:22   ` [Intel-gfx] " Thomas Hellström
2022-01-10 17:22 ` [PATCH v7 3/6] drm/i915: Don't pin the object pages during pending vma binds Thomas Hellström
2022-01-10 17:22   ` [Intel-gfx] " Thomas Hellström
2022-01-10 17:22 ` [PATCH v7 4/6] drm/i915: Use vma resources for async unbinding Thomas Hellström
2022-01-10 17:22   ` [Intel-gfx] " Thomas Hellström
2022-01-17 21:07   ` Robert Beckett
2022-01-18  9:06     ` Thomas Hellström [this message]
2022-01-10 17:22 ` [PATCH v7 5/6] drm/i915: Asynchronous migration selftest Thomas Hellström
2022-01-10 17:22   ` [Intel-gfx] " Thomas Hellström
2022-01-10 17:22 ` [PATCH v7 6/6] drm/i915: Use struct vma_resource instead of struct vma_snapshot Thomas Hellström
2022-01-10 17:22   ` [Intel-gfx] " Thomas Hellström
2022-01-10 21:10 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915: Asynchronous vma unbinding (rev7) Patchwork
2022-01-10 21:12 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2022-01-10 21:42 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2022-01-11  4:07 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=563dd04b0f29f9e30b47e230d40619c65732c67b.camel@linux.intel.com \
    --to=thomas.hellstrom@linux.intel.com \
    --cc=bob.beckett@collabora.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=matthew.auld@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.