From: Daniel Vetter <daniel@ffwll.ch>
To: "Christian König" <ckoenig.leichtzumerken@gmail.com>
Cc: amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Subject: Re: [PATCH 7/7] drm/amdgpu: rework dma_resv handling
Date: Fri, 11 Jun 2021 17:21:20 +0200 [thread overview]
Message-ID: <YMN/cPvzpRRB3z5B@phenom.ffwll.local> (raw)
In-Reply-To: <3a1eef23-7a9a-9a08-cb41-ce042d0bd36c@gmail.com>
On Fri, Jun 11, 2021 at 12:12:45PM +0200, Christian König wrote:
> Am 11.06.21 um 11:17 schrieb Daniel Vetter:
> > On Thu, Jun 10, 2021 at 11:18:00AM +0200, Christian König wrote:
> > > Drop the workaround and instead implement a better solution.
> > >
> > > Basically we are now chaining all submissions using a dma_fence_chain
> > > container and adding them as exclusive fence to the dma_resv object.
> > >
> > > This way other drivers can still sync to the single exclusive fence
> > > while amdgpu only sync to fences from different processes.
> > >
> > > Signed-off-by: Christian König <christian.koenig@amd.com>
> > > ---
> > > drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h | 1 +
> > > drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 54 +++++++++++++----
> > > drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 65 ---------------------
> > > drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 3 +-
> > > drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +-
> > > drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 1 -
> > > 6 files changed, 47 insertions(+), 79 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
> > > index a130e766cbdb..c905a4cfc173 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h
> > > @@ -34,6 +34,7 @@ struct amdgpu_fpriv;
> > > struct amdgpu_bo_list_entry {
> > > struct ttm_validate_buffer tv;
> > > struct amdgpu_bo_va *bo_va;
> > > + struct dma_fence_chain *chain;
> > > uint32_t priority;
> > > struct page **user_pages;
> > > bool user_invalidated;
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > > index 325e82621467..f6f3029f958d 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > > @@ -587,6 +587,20 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
> > > goto out;
> > > }
> > > + amdgpu_bo_list_for_each_entry(e, p->bo_list) {
> > > + struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
> > > +
> > > + e->bo_va = amdgpu_vm_bo_find(vm, bo);
> > > +
> > > + if (bo->tbo.base.dma_buf && !amdgpu_bo_explicit_sync(bo)) {
> > > + e->chain = dma_fence_chain_alloc();
> > > + if (!e->chain) {
> > > + r = -ENOMEM;
> > > + goto error_validate;
> > > + }
> > > + }
> > > + }
> > > +
> > > amdgpu_cs_get_threshold_for_moves(p->adev, &p->bytes_moved_threshold,
> > > &p->bytes_moved_vis_threshold);
> > > p->bytes_moved = 0;
> > > @@ -614,15 +628,6 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
> > > gws = p->bo_list->gws_obj;
> > > oa = p->bo_list->oa_obj;
> > > - amdgpu_bo_list_for_each_entry(e, p->bo_list) {
> > > - struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
> > > -
> > > - /* Make sure we use the exclusive slot for shared BOs */
> > > - if (bo->prime_shared_count)
> > > - e->tv.num_shared = 0;
> > > - e->bo_va = amdgpu_vm_bo_find(vm, bo);
> > > - }
> > > -
> > > if (gds) {
> > > p->job->gds_base = amdgpu_bo_gpu_offset(gds) >> PAGE_SHIFT;
> > > p->job->gds_size = amdgpu_bo_size(gds) >> PAGE_SHIFT;
> > > @@ -644,8 +649,13 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
> > > }
> > > error_validate:
> > > - if (r)
> > > + if (r) {
> > > + amdgpu_bo_list_for_each_entry(e, p->bo_list) {
> > > + dma_fence_chain_free(e->chain);
> > > + e->chain = NULL;
> > > + }
> > > ttm_eu_backoff_reservation(&p->ticket, &p->validated);
> > > + }
> > > out:
> > > return r;
> > > }
> > > @@ -685,9 +695,17 @@ static void amdgpu_cs_parser_fini(struct amdgpu_cs_parser *parser, int error,
> > > {
> > > unsigned i;
> > > - if (error && backoff)
> > > + if (error && backoff) {
> > > + struct amdgpu_bo_list_entry *e;
> > > +
> > > + amdgpu_bo_list_for_each_entry(e, parser->bo_list) {
> > > + dma_fence_chain_free(e->chain);
> > > + e->chain = NULL;
> > > + }
> > > +
> > > ttm_eu_backoff_reservation(&parser->ticket,
> > > &parser->validated);
> > > + }
> > > for (i = 0; i < parser->num_post_deps; i++) {
> > > drm_syncobj_put(parser->post_deps[i].syncobj);
> > > @@ -1260,6 +1278,20 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
> > > amdgpu_vm_move_to_lru_tail(p->adev, &fpriv->vm);
> > > + amdgpu_bo_list_for_each_entry(e, p->bo_list) {
> > > + struct dma_resv *resv = e->tv.bo->base.resv;
> > > + struct dma_fence_chain *chain = e->chain;
> > > +
> > > + if (!chain)
> > > + continue;
> > > +
> > > + dma_fence_chain_init(chain, dma_resv_excl_fence(resv),
> > > + dma_fence_get(p->fence), 1);
> > > +
> > > + rcu_assign_pointer(resv->fence_excl, &chain->base);
> > So for safety since this is now driver interface I was thinking off a
> > helper which does this entire dance and _also_ adds the new fence to the
> > shared slots. This way we don't let drivers open-code this rather tricky
> > operation.
>
> Well I only see this as a workaround for amdgpu and don't want to leak it
> into other drivers.
>
> If somebody else wants to adopt it we should probably consider fixing the
> dma_resv object instead.
Well semantically this is exactly what what the sync_file import from
Jason does. And it's kinda also what a lot of the drivers with opt-out
implicit sync should be doing.
So I really think we do need this as a dma_resv primitive in form of a
function drivers can use. Once we have that rolled out and tested some,
then we can fix up the internals of dma_resv to make this case more
efficient.
> > You probably also want to then combine this with
> > ttm_eu_fence_buffer_objects below I think so you're not walking this list
> > 2x.
> >
> > Iow I'd put the fence_chain parameter into struct ttm_validate_buffer and
> > then let ttm_eu_fence_buffer_objects() do this, by calling a new
> > dma_resv_add_shared_excl_fence. Ideally the same helper that also Jason's
> > sync_file import will use. We want to hide the inner workings of dma_resv
> > as much as possible ofc.
> >
> > The other thing I'm wondering is whether this needs to be wrapped in a
> > single seqlock or whether we don't have ordering issues if we split up the
> > update here? Setting the exclusive fence before we also added it to the
> > shared slot can at least violate the "shared fences signal after exclusive
> > if both are present"
>
> Uff, good point.
Plus it's actually fairly tricky thing :-)
-Daniel
>
> > Finally I guess need to sprinkle the manual garbage collector somehwere
> > here too.
>
> That is done automatically when somebody iterates the chain node.
>
> Christian.
>
> >
> > But aside from the interface polish and correctness against races I think
> > this looks solid in the big picture.
> >
> > Cheers, Daniel
> >
> > > + e->chain = NULL;
> > > + }
> > > +
> > > ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence);
> > > mutex_unlock(&p->adev->notifier_lock);
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> > > index c3053b83b80c..23219fc3b28c 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> > > @@ -42,48 +42,6 @@
> > > #include <linux/pci-p2pdma.h>
> > > #include <linux/pm_runtime.h>
> > > -static int
> > > -__dma_resv_make_exclusive(struct dma_resv *obj)
> > > -{
> > > - struct dma_fence **fences;
> > > - unsigned int count;
> > > - int r;
> > > -
> > > - if (!dma_resv_shared_list(obj)) /* no shared fences to convert */
> > > - return 0;
> > > -
> > > - r = dma_resv_get_fences(obj, NULL, &count, &fences);
> > > - if (r)
> > > - return r;
> > > -
> > > - if (count == 0) {
> > > - /* Now that was unexpected. */
> > > - } else if (count == 1) {
> > > - dma_resv_add_excl_fence(obj, fences[0]);
> > > - dma_fence_put(fences[0]);
> > > - kfree(fences);
> > > - } else {
> > > - struct dma_fence_array *array;
> > > -
> > > - array = dma_fence_array_create(count, fences,
> > > - dma_fence_context_alloc(1), 0,
> > > - false);
> > > - if (!array)
> > > - goto err_fences_put;
> > > -
> > > - dma_resv_add_excl_fence(obj, &array->base);
> > > - dma_fence_put(&array->base);
> > > - }
> > > -
> > > - return 0;
> > > -
> > > -err_fences_put:
> > > - while (count--)
> > > - dma_fence_put(fences[count]);
> > > - kfree(fences);
> > > - return -ENOMEM;
> > > -}
> > > -
> > > /**
> > > * amdgpu_dma_buf_attach - &dma_buf_ops.attach implementation
> > > *
> > > @@ -110,24 +68,6 @@ static int amdgpu_dma_buf_attach(struct dma_buf *dmabuf,
> > > if (r < 0)
> > > goto out;
> > > - r = amdgpu_bo_reserve(bo, false);
> > > - if (unlikely(r != 0))
> > > - goto out;
> > > -
> > > - /*
> > > - * We only create shared fences for internal use, but importers
> > > - * of the dmabuf rely on exclusive fences for implicitly
> > > - * tracking write hazards. As any of the current fences may
> > > - * correspond to a write, we need to convert all existing
> > > - * fences on the reservation object into a single exclusive
> > > - * fence.
> > > - */
> > > - r = __dma_resv_make_exclusive(bo->tbo.base.resv);
> > > - if (r)
> > > - goto out;
> > > -
> > > - bo->prime_shared_count++;
> > > - amdgpu_bo_unreserve(bo);
> > > return 0;
> > > out:
> > > @@ -150,9 +90,6 @@ static void amdgpu_dma_buf_detach(struct dma_buf *dmabuf,
> > > struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
> > > struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
> > > - if (attach->dev->driver != adev->dev->driver && bo->prime_shared_count)
> > > - bo->prime_shared_count--;
> > > -
> > > pm_runtime_mark_last_busy(adev_to_drm(adev)->dev);
> > > pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);
> > > }
> > > @@ -406,8 +343,6 @@ amdgpu_dma_buf_create_obj(struct drm_device *dev, struct dma_buf *dma_buf)
> > > bo = gem_to_amdgpu_bo(gobj);
> > > bo->allowed_domains = AMDGPU_GEM_DOMAIN_GTT;
> > > bo->preferred_domains = AMDGPU_GEM_DOMAIN_GTT;
> > > - if (dma_buf->ops != &amdgpu_dmabuf_ops)
> > > - bo->prime_shared_count = 1;
> > > dma_resv_unlock(resv);
> > > return gobj;
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> > > index 1c3e3b608332..5d82120fc160 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> > > @@ -829,7 +829,8 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev, void *data,
> > > break;
> > > }
> > > case AMDGPU_GEM_OP_SET_PLACEMENT:
> > > - if (robj->prime_shared_count && (args->value & AMDGPU_GEM_DOMAIN_VRAM)) {
> > > + if (robj->tbo.base.import_attach &&
> > > + args->value & AMDGPU_GEM_DOMAIN_VRAM) {
> > > r = -EINVAL;
> > > amdgpu_bo_unreserve(robj);
> > > break;
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> > > index 96447e1d4c9c..30951b593809 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> > > @@ -871,7 +871,7 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 domain,
> > > return -EINVAL;
> > > /* A shared bo cannot be migrated to VRAM */
> > > - if (bo->prime_shared_count || bo->tbo.base.import_attach) {
> > > + if (bo->tbo.base.import_attach) {
> > > if (domain & AMDGPU_GEM_DOMAIN_GTT)
> > > domain = AMDGPU_GEM_DOMAIN_GTT;
> > > else
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> > > index b35962702278..55d7e90eaa72 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> > > @@ -98,7 +98,6 @@ struct amdgpu_bo {
> > > struct ttm_buffer_object tbo;
> > > struct ttm_bo_kmap_obj kmap;
> > > u64 flags;
> > > - unsigned prime_shared_count;
> > > /* per VM structure for page tables and with virtual addresses */
> > > struct amdgpu_vm_bo_base *vm_bo;
> > > /* Constant after initialization */
> > > --
> > > 2.25.1
> > >
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
next prev parent reply other threads:[~2021-06-11 15:21 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-06-10 9:17 Change how amdgpu stores fences in dma_resv objects Christian König
2021-06-10 9:17 ` [PATCH 1/7] dma-buf: some dma_fence_chain improvements Christian König
2021-06-11 7:49 ` Daniel Vetter
2021-06-10 9:17 ` [PATCH 2/7] dma-buf: add dma_fence_chain_alloc/free Christian König
2021-06-11 7:54 ` Daniel Vetter
2021-06-11 11:48 ` Christian König
2021-06-11 14:40 ` Daniel Vetter
2021-06-10 9:17 ` [PATCH 3/7] dma-buf: add dma_fence_chain_alloc/free self tests Christian König
2021-06-11 7:58 ` Daniel Vetter
2021-06-11 10:04 ` Christian König
2021-06-10 9:17 ` [PATCH 4/7] dma-buf: add dma_fence_chain_garbage_collect Christian König
2021-06-11 8:58 ` Daniel Vetter
2021-06-11 10:07 ` Christian König
2021-06-11 15:11 ` Daniel Vetter
2021-06-16 17:29 ` kernel test robot
2021-06-16 18:50 ` kernel test robot
2021-06-10 9:17 ` [PATCH 5/7] drm/syncobj: drop the manual garbage collection Christian König
2021-06-10 9:17 ` [PATCH 6/7] drm/amdgpu: unwrap fence chains in the explicit sync fence Christian König
2021-06-11 9:07 ` Daniel Vetter
2021-06-11 10:09 ` Christian König
2021-06-11 15:18 ` Daniel Vetter
2021-06-14 7:25 ` Christian König
2021-06-17 16:51 ` Daniel Vetter
2021-06-10 9:18 ` [PATCH 7/7] drm/amdgpu: rework dma_resv handling Christian König
2021-06-11 9:17 ` Daniel Vetter
2021-06-11 10:12 ` Christian König
2021-06-11 15:21 ` Daniel Vetter [this message]
2021-06-10 16:34 ` Change how amdgpu stores fences in dma_resv objects Michel Dänzer
2021-06-10 16:44 ` Christian König
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YMN/cPvzpRRB3z5B@phenom.ffwll.local \
--to=daniel@ffwll.ch \
--cc=amd-gfx@lists.freedesktop.org \
--cc=ckoenig.leichtzumerken@gmail.com \
--cc=dri-devel@lists.freedesktop.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).