dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
From: Daniel Vetter <daniel@ffwll.ch>
To: "Christian König" <ckoenig.leichtzumerken@gmail.com>
Cc: "Rob Clark" <robdclark@chromium.org>,
	"Daniel Stone" <daniels@collabora.com>,
	"Michel Dänzer" <michel@daenzer.net>,
	"Intel Graphics Development" <intel-gfx@lists.freedesktop.org>,
	"Kevin Wang" <kevin1.wang@amd.com>,
	"DRI Development" <dri-devel@lists.freedesktop.org>,
	"moderated list:DMA BUFFER SHARING FRAMEWORK"
	<linaro-mm-sig@lists.linaro.org>,
	"Luben Tuikov" <luben.tuikov@amd.com>,
	"Kristian H . Kristensen" <hoegsberg@google.com>,
	"Chen Li" <chenli@uniontech.com>,
	"Alex Deucher" <alexander.deucher@amd.com>,
	"Daniel Vetter" <daniel.vetter@intel.com>,
	mesa-dev <mesa-dev@lists.freedesktop.org>,
	"Christian König" <christian.koenig@amd.com>,
	"Dennis Li" <Dennis.Li@amd.com>,
	"Deepak R Varma" <mh12gx2825@gmail.com>
Subject: Re: [Mesa-dev] [PATCH 01/11] drm/amdgpu: Comply with implicit fencing rules
Date: Wed, 26 May 2021 15:51:45 +0200	[thread overview]
Message-ID: <CAKMK7uFQczzpkdSLB1H-dVySGTiap2ONETZCaz5ErE1sca8YWQ@mail.gmail.com> (raw)
In-Reply-To: <fe633922-53a4-2409-8697-d815650c65ac@gmail.com>

On Wed, May 26, 2021 at 3:32 PM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> Am 25.05.21 um 17:23 schrieb Daniel Vetter:
> > On Tue, May 25, 2021 at 5:05 PM Christian König
> > <ckoenig.leichtzumerken@gmail.com> wrote:
> >> Hi Daniel,
> >>
> >> Am 25.05.21 um 15:05 schrieb Daniel Vetter:
> >>> Hi Christian,
> >>>
> >>> On Sat, May 22, 2021 at 10:30:19AM +0200, Christian König wrote:
> >>>> Am 21.05.21 um 20:31 schrieb Daniel Vetter:
> >>>> This works by adding the fence of the last eviction DMA operation to BOs
> >>>> when their backing store is newly allocated. That's what the
> >>>> ttm_bo_add_move_fence() function you stumbled over is good for: https://elixir.bootlin.com/linux/v5.13-rc2/source/drivers/gpu/drm/ttm/ttm_bo.c#L692
> >>>>
> >>>> Now the problem is it is possible that the application is terminated before
> >>>> it can complete it's command submission. But since resource management only
> >>>> waits for the shared fences when there are some there is a chance that we
> >>>> free up memory while it is still in use.
> >>> Hm where is this code? Would help in my audit that I wanted to do this
> >>> week? If you look at most other places like
> >>> drm_gem_fence_array_add_implicit() I mentioned earlier, then we don't
> >>> treat the shared fences special and always also include the exclusive one.
> >> See amdgpu_gem_object_close():
> >>
> >> ...
> >>           fence = dma_resv_get_excl(bo->tbo.base.resv);
> >>           if (fence) {
> >>                   amdgpu_bo_fence(bo, fence, true);
> >>                   fence = NULL;
> >>           }
> >> ...
> >>
> >> We explicitly added that because resource management of some other
> >> driver was going totally bananas without that.
> >>
> >> But I'm not sure which one that was. Maybe dig a bit in the git and
> >> mailing history of that.
> > Hm I looked and it's
> >
> > commit 82c416b13cb7d22b96ec0888b296a48dff8a09eb
> > Author: Christian König <christian.koenig@amd.com>
> > Date:   Thu Mar 12 12:03:34 2020 +0100
> >
> >     drm/amdgpu: fix and cleanup amdgpu_gem_object_close v4
> >
> > That sounded more like amdgpu itself needing this, not another driver?
>
> No, that patch was just a follow up moving the functionality around.

That patch added the "add exclusive fence to shared slots before
amdgpu_vm_clear_freed() call", which I thought was at least part of
your fix.

> > But looking at amdgpu_vm_bo_update_mapping() it seems to pick the
> > right fencing mode for gpu pte clearing, so I'm really not sure what
> > the bug was that you worked around here?The implementation boils down
> > to amdgpu_sync_resv() which syncs for the exclusive fence, always. And
> > there's nothing else that I could find in public history at least, no
> > references to bug reports or anything. I think you need to dig
> > internally, because as-is I'm not seeing the problem here.
> >
> > Or am I missing something here?
>
> See the code here for example:
> https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/nouveau/nouveau_fence.c#L361
>
> Nouveau assumes that when a shared fence is present it doesn't need to
> wait for the exclusive one because the shared are always supposed to
> finish after the exclusive one.
>
> But for page table unmap fences that isn't true and we ran into a really
> nasty and hard to reproduce bug because of this.
>
> I think it would be much more defensive if we could say that we always
> wait for the exclusive fence and fix the use case in nouveau and double
> check if somebody else does stuff like that as well.

Yeah most other drivers do the defensive thing here. I think it would
be good to standardize on that. I'll add that to my list and do more
auditing.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

  reply	other threads:[~2021-05-26 13:52 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-21  9:09 [PATCH 01/11] drm/amdgpu: Comply with implicit fencing rules Daniel Vetter
2021-05-21  9:09 ` [PATCH 02/11] drm/panfrost: Remove sched_lock Daniel Vetter
2021-05-21  9:32   ` Lucas Stach
2021-05-21 14:49     ` Daniel Vetter
2021-05-21  9:09 ` [PATCH 03/11] drm/panfrost: Use xarray and helpers for depedency tracking Daniel Vetter
2021-06-02 14:06   ` Steven Price
2021-06-02 18:51     ` Daniel Vetter
2021-06-03  7:48       ` Steven Price
2021-05-21  9:09 ` [PATCH 04/11] drm/panfrost: Fix implicit sync Daniel Vetter
2021-05-21 12:22   ` Daniel Stone
2021-05-21 12:28     ` [Linaro-mm-sig] " Christian König
2021-05-21 12:54       ` Daniel Stone
2021-05-21 13:09         ` Christian König
2021-05-21 13:23           ` Daniel Stone
2021-05-21  9:09 ` [PATCH 05/11] drm/atomic-helper: make drm_gem_plane_helper_prepare_fb the default Daniel Vetter
2021-05-21  9:09 ` [PATCH 06/11] drm/<driver>: drm_gem_plane_helper_prepare_fb is now " Daniel Vetter
2021-05-21  9:38   ` Lucas Stach
2021-05-21 12:20   ` Heiko Stübner
2021-05-21 12:22   ` Paul Cercueil
2021-05-21 15:53   ` Jernej Škrabec
2021-05-21 23:18   ` Chun-Kuang Hu
2021-05-23 12:17   ` Martin Blumenstingl
2021-05-24  7:54   ` Tomi Valkeinen
2021-05-28  9:55   ` Philippe CORNU
2021-05-21  9:09 ` [PATCH 07/11] drm/armada: Remove prepare/cleanup_fb hooks Daniel Vetter
2021-05-21  9:09 ` [PATCH 08/11] drm/vram-helpers: Create DRM_GEM_VRAM_PLANE_HELPER_FUNCS Daniel Vetter
2021-05-21  9:33   ` tiantao (H)
2021-05-21  9:09 ` [PATCH 09/11] drm/omap: Follow implicit fencing in prepare_fb Daniel Vetter
2021-05-24  7:53   ` Tomi Valkeinen
2021-05-21  9:09 ` [PATCH 10/11] drm/simple-helper: drm_gem_simple_display_pipe_prepare_fb as default Daniel Vetter
2021-05-25 17:48   ` Noralf Trønnes
2021-05-25 17:53     ` Daniel Vetter
2021-05-21  9:09 ` [PATCH 11/11] drm/tiny: drm_gem_simple_display_pipe_prepare_fb is the default Daniel Vetter
2021-05-21 13:41   ` David Lechner
2021-05-21 14:09   ` Noralf Trønnes
2021-05-25 16:05     ` Daniel Vetter
2021-05-21 14:13   ` Oleksandr Andrushchenko
2021-05-28  0:38   ` Linus Walleij
2021-05-21  9:46 ` [PATCH 01/11] drm/amdgpu: Comply with implicit fencing rules Bas Nieuwenhuizen
2021-05-21 14:37   ` Daniel Vetter
2021-05-21 15:00     ` Bas Nieuwenhuizen
2021-05-21 15:16       ` Daniel Vetter
2021-05-21 18:08         ` [Mesa-dev] " Christian König
2021-05-21 18:31           ` Daniel Vetter
2021-05-22  8:30             ` Christian König
2021-05-25 13:05               ` Daniel Vetter
2021-05-25 15:05                 ` Christian König
2021-05-25 15:23                   ` Daniel Vetter
2021-05-26 13:32                     ` Christian König
2021-05-26 13:51                       ` Daniel Vetter [this message]
2021-05-21 11:22 ` Christian König
2021-05-21 14:58 ` [Mesa-dev] " Rob Clark
2021-05-21 14:58   ` Daniel Vetter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAKMK7uFQczzpkdSLB1H-dVySGTiap2ONETZCaz5ErE1sca8YWQ@mail.gmail.com \
    --to=daniel@ffwll.ch \
    --cc=Dennis.Li@amd.com \
    --cc=alexander.deucher@amd.com \
    --cc=chenli@uniontech.com \
    --cc=christian.koenig@amd.com \
    --cc=ckoenig.leichtzumerken@gmail.com \
    --cc=daniel.vetter@intel.com \
    --cc=daniels@collabora.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=hoegsberg@google.com \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=kevin1.wang@amd.com \
    --cc=linaro-mm-sig@lists.linaro.org \
    --cc=luben.tuikov@amd.com \
    --cc=mesa-dev@lists.freedesktop.org \
    --cc=mh12gx2825@gmail.com \
    --cc=michel@daenzer.net \
    --cc=robdclark@chromium.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).