From: Daniel Vetter <daniel@ffwll.ch> To: "Christian König" <ckoenig.leichtzumerken@gmail.com> Cc: "Rob Clark" <robdclark@chromium.org>, "Daniel Stone" <daniels@collabora.com>, "Michel Dänzer" <michel@daenzer.net>, "Intel Graphics Development" <intel-gfx@lists.freedesktop.org>, "Kevin Wang" <kevin1.wang@amd.com>, "DRI Development" <dri-devel@lists.freedesktop.org>, "moderated list:DMA BUFFER SHARING FRAMEWORK" <linaro-mm-sig@lists.linaro.org>, "Luben Tuikov" <luben.tuikov@amd.com>, "Kristian H . Kristensen" <hoegsberg@google.com>, "Chen Li" <chenli@uniontech.com>, "Alex Deucher" <alexander.deucher@amd.com>, "Daniel Vetter" <daniel.vetter@intel.com>, mesa-dev <mesa-dev@lists.freedesktop.org>, "Christian König" <christian.koenig@amd.com>, "Dennis Li" <Dennis.Li@amd.com>, "Deepak R Varma" <mh12gx2825@gmail.com> Subject: Re: [Mesa-dev] [PATCH 01/11] drm/amdgpu: Comply with implicit fencing rules Date: Wed, 26 May 2021 15:51:45 +0200 [thread overview] Message-ID: <CAKMK7uFQczzpkdSLB1H-dVySGTiap2ONETZCaz5ErE1sca8YWQ@mail.gmail.com> (raw) In-Reply-To: <fe633922-53a4-2409-8697-d815650c65ac@gmail.com> On Wed, May 26, 2021 at 3:32 PM Christian König <ckoenig.leichtzumerken@gmail.com> wrote: > > Am 25.05.21 um 17:23 schrieb Daniel Vetter: > > On Tue, May 25, 2021 at 5:05 PM Christian König > > <ckoenig.leichtzumerken@gmail.com> wrote: > >> Hi Daniel, > >> > >> Am 25.05.21 um 15:05 schrieb Daniel Vetter: > >>> Hi Christian, > >>> > >>> On Sat, May 22, 2021 at 10:30:19AM +0200, Christian König wrote: > >>>> Am 21.05.21 um 20:31 schrieb Daniel Vetter: > >>>> This works by adding the fence of the last eviction DMA operation to BOs > >>>> when their backing store is newly allocated. That's what the > >>>> ttm_bo_add_move_fence() function you stumbled over is good for: https://elixir.bootlin.com/linux/v5.13-rc2/source/drivers/gpu/drm/ttm/ttm_bo.c#L692 > >>>> > >>>> Now the problem is it is possible that the application is terminated before > >>>> it can complete it's command submission. But since resource management only > >>>> waits for the shared fences when there are some there is a chance that we > >>>> free up memory while it is still in use. > >>> Hm where is this code? Would help in my audit that I wanted to do this > >>> week? If you look at most other places like > >>> drm_gem_fence_array_add_implicit() I mentioned earlier, then we don't > >>> treat the shared fences special and always also include the exclusive one. > >> See amdgpu_gem_object_close(): > >> > >> ... > >> fence = dma_resv_get_excl(bo->tbo.base.resv); > >> if (fence) { > >> amdgpu_bo_fence(bo, fence, true); > >> fence = NULL; > >> } > >> ... > >> > >> We explicitly added that because resource management of some other > >> driver was going totally bananas without that. > >> > >> But I'm not sure which one that was. Maybe dig a bit in the git and > >> mailing history of that. > > Hm I looked and it's > > > > commit 82c416b13cb7d22b96ec0888b296a48dff8a09eb > > Author: Christian König <christian.koenig@amd.com> > > Date: Thu Mar 12 12:03:34 2020 +0100 > > > > drm/amdgpu: fix and cleanup amdgpu_gem_object_close v4 > > > > That sounded more like amdgpu itself needing this, not another driver? > > No, that patch was just a follow up moving the functionality around. That patch added the "add exclusive fence to shared slots before amdgpu_vm_clear_freed() call", which I thought was at least part of your fix. > > But looking at amdgpu_vm_bo_update_mapping() it seems to pick the > > right fencing mode for gpu pte clearing, so I'm really not sure what > > the bug was that you worked around here?The implementation boils down > > to amdgpu_sync_resv() which syncs for the exclusive fence, always. And > > there's nothing else that I could find in public history at least, no > > references to bug reports or anything. I think you need to dig > > internally, because as-is I'm not seeing the problem here. > > > > Or am I missing something here? > > See the code here for example: > https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/nouveau/nouveau_fence.c#L361 > > Nouveau assumes that when a shared fence is present it doesn't need to > wait for the exclusive one because the shared are always supposed to > finish after the exclusive one. > > But for page table unmap fences that isn't true and we ran into a really > nasty and hard to reproduce bug because of this. > > I think it would be much more defensive if we could say that we always > wait for the exclusive fence and fix the use case in nouveau and double > check if somebody else does stuff like that as well. Yeah most other drivers do the defensive thing here. I think it would be good to standardize on that. I'll add that to my list and do more auditing. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
WARNING: multiple messages have this Message-ID (diff)
From: Daniel Vetter <daniel@ffwll.ch> To: "Christian König" <ckoenig.leichtzumerken@gmail.com> Cc: "Rob Clark" <robdclark@chromium.org>, "Daniel Stone" <daniels@collabora.com>, "Michel Dänzer" <michel@daenzer.net>, "Intel Graphics Development" <intel-gfx@lists.freedesktop.org>, "Kevin Wang" <kevin1.wang@amd.com>, "DRI Development" <dri-devel@lists.freedesktop.org>, "Sumit Semwal" <sumit.semwal@linaro.org>, "moderated list:DMA BUFFER SHARING FRAMEWORK" <linaro-mm-sig@lists.linaro.org>, "Luben Tuikov" <luben.tuikov@amd.com>, "Kristian H . Kristensen" <hoegsberg@google.com>, "Chen Li" <chenli@uniontech.com>, "Bas Nieuwenhuizen" <bas@basnieuwenhuizen.nl>, "Alex Deucher" <alexander.deucher@amd.com>, "Daniel Vetter" <daniel.vetter@intel.com>, mesa-dev <mesa-dev@lists.freedesktop.org>, "Christian König" <christian.koenig@amd.com>, "Dennis Li" <Dennis.Li@amd.com>, "Deepak R Varma" <mh12gx2825@gmail.com> Subject: Re: [Intel-gfx] [Mesa-dev] [PATCH 01/11] drm/amdgpu: Comply with implicit fencing rules Date: Wed, 26 May 2021 15:51:45 +0200 [thread overview] Message-ID: <CAKMK7uFQczzpkdSLB1H-dVySGTiap2ONETZCaz5ErE1sca8YWQ@mail.gmail.com> (raw) In-Reply-To: <fe633922-53a4-2409-8697-d815650c65ac@gmail.com> On Wed, May 26, 2021 at 3:32 PM Christian König <ckoenig.leichtzumerken@gmail.com> wrote: > > Am 25.05.21 um 17:23 schrieb Daniel Vetter: > > On Tue, May 25, 2021 at 5:05 PM Christian König > > <ckoenig.leichtzumerken@gmail.com> wrote: > >> Hi Daniel, > >> > >> Am 25.05.21 um 15:05 schrieb Daniel Vetter: > >>> Hi Christian, > >>> > >>> On Sat, May 22, 2021 at 10:30:19AM +0200, Christian König wrote: > >>>> Am 21.05.21 um 20:31 schrieb Daniel Vetter: > >>>> This works by adding the fence of the last eviction DMA operation to BOs > >>>> when their backing store is newly allocated. That's what the > >>>> ttm_bo_add_move_fence() function you stumbled over is good for: https://elixir.bootlin.com/linux/v5.13-rc2/source/drivers/gpu/drm/ttm/ttm_bo.c#L692 > >>>> > >>>> Now the problem is it is possible that the application is terminated before > >>>> it can complete it's command submission. But since resource management only > >>>> waits for the shared fences when there are some there is a chance that we > >>>> free up memory while it is still in use. > >>> Hm where is this code? Would help in my audit that I wanted to do this > >>> week? If you look at most other places like > >>> drm_gem_fence_array_add_implicit() I mentioned earlier, then we don't > >>> treat the shared fences special and always also include the exclusive one. > >> See amdgpu_gem_object_close(): > >> > >> ... > >> fence = dma_resv_get_excl(bo->tbo.base.resv); > >> if (fence) { > >> amdgpu_bo_fence(bo, fence, true); > >> fence = NULL; > >> } > >> ... > >> > >> We explicitly added that because resource management of some other > >> driver was going totally bananas without that. > >> > >> But I'm not sure which one that was. Maybe dig a bit in the git and > >> mailing history of that. > > Hm I looked and it's > > > > commit 82c416b13cb7d22b96ec0888b296a48dff8a09eb > > Author: Christian König <christian.koenig@amd.com> > > Date: Thu Mar 12 12:03:34 2020 +0100 > > > > drm/amdgpu: fix and cleanup amdgpu_gem_object_close v4 > > > > That sounded more like amdgpu itself needing this, not another driver? > > No, that patch was just a follow up moving the functionality around. That patch added the "add exclusive fence to shared slots before amdgpu_vm_clear_freed() call", which I thought was at least part of your fix. > > But looking at amdgpu_vm_bo_update_mapping() it seems to pick the > > right fencing mode for gpu pte clearing, so I'm really not sure what > > the bug was that you worked around here?The implementation boils down > > to amdgpu_sync_resv() which syncs for the exclusive fence, always. And > > there's nothing else that I could find in public history at least, no > > references to bug reports or anything. I think you need to dig > > internally, because as-is I'm not seeing the problem here. > > > > Or am I missing something here? > > See the code here for example: > https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/nouveau/nouveau_fence.c#L361 > > Nouveau assumes that when a shared fence is present it doesn't need to > wait for the exclusive one because the shared are always supposed to > finish after the exclusive one. > > But for page table unmap fences that isn't true and we ran into a really > nasty and hard to reproduce bug because of this. > > I think it would be much more defensive if we could say that we always > wait for the exclusive fence and fix the use case in nouveau and double > check if somebody else does stuff like that as well. Yeah most other drivers do the defensive thing here. I think it would be good to standardize on that. I'll add that to my list and do more auditing. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
next prev parent reply other threads:[~2021-05-26 13:52 UTC|newest] Thread overview: 175+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-05-21 9:09 [PATCH 01/11] drm/amdgpu: Comply with implicit fencing rules Daniel Vetter 2021-05-21 9:09 ` [Intel-gfx] " Daniel Vetter 2021-05-21 9:09 ` [PATCH 02/11] drm/panfrost: Remove sched_lock Daniel Vetter 2021-05-21 9:09 ` [Intel-gfx] " Daniel Vetter 2021-05-21 9:32 ` Lucas Stach 2021-05-21 9:32 ` [Intel-gfx] " Lucas Stach 2021-05-21 14:49 ` Daniel Vetter 2021-05-21 14:49 ` [Intel-gfx] " Daniel Vetter 2021-05-21 9:09 ` [PATCH 03/11] drm/panfrost: Use xarray and helpers for depedency tracking Daniel Vetter 2021-05-21 9:09 ` [Intel-gfx] " Daniel Vetter 2021-05-21 9:09 ` Daniel Vetter 2021-06-02 14:06 ` Steven Price 2021-06-02 14:06 ` [Intel-gfx] " Steven Price 2021-06-02 14:06 ` Steven Price 2021-06-02 18:51 ` Daniel Vetter 2021-06-02 18:51 ` [Intel-gfx] " Daniel Vetter 2021-06-02 18:51 ` Daniel Vetter 2021-06-03 7:48 ` Steven Price 2021-06-03 7:48 ` [Intel-gfx] " Steven Price 2021-06-03 7:48 ` Steven Price 2021-05-21 9:09 ` [PATCH 04/11] drm/panfrost: Fix implicit sync Daniel Vetter 2021-05-21 9:09 ` [Intel-gfx] " Daniel Vetter 2021-05-21 9:09 ` Daniel Vetter 2021-05-21 12:22 ` Daniel Stone 2021-05-21 12:22 ` [Intel-gfx] " Daniel Stone 2021-05-21 12:22 ` Daniel Stone 2021-05-21 12:28 ` [Linaro-mm-sig] " Christian König 2021-05-21 12:28 ` [Intel-gfx] " Christian König 2021-05-21 12:54 ` Daniel Stone 2021-05-21 12:54 ` [Intel-gfx] " Daniel Stone 2021-05-21 12:54 ` Daniel Stone 2021-05-21 13:09 ` Christian König 2021-05-21 13:09 ` [Intel-gfx] " Christian König 2021-05-21 13:09 ` Christian König 2021-05-21 13:23 ` Daniel Stone 2021-05-21 13:23 ` [Intel-gfx] " Daniel Stone 2021-05-21 13:23 ` Daniel Stone 2021-05-21 9:09 ` [PATCH 05/11] drm/atomic-helper: make drm_gem_plane_helper_prepare_fb the default Daniel Vetter 2021-05-21 9:09 ` [Intel-gfx] " Daniel Vetter 2021-05-21 9:09 ` [PATCH 06/11] drm/<driver>: drm_gem_plane_helper_prepare_fb is now " Daniel Vetter 2021-05-21 9:09 ` Daniel Vetter 2021-05-21 9:09 ` [Intel-gfx] " Daniel Vetter 2021-05-21 9:09 ` Daniel Vetter 2021-05-21 9:09 ` Daniel Vetter 2021-05-21 9:09 ` Daniel Vetter 2021-05-21 9:09 ` Daniel Vetter 2021-05-21 9:38 ` Lucas Stach 2021-05-21 9:38 ` Lucas Stach 2021-05-21 9:38 ` [Intel-gfx] " Lucas Stach 2021-05-21 9:38 ` Lucas Stach 2021-05-21 9:38 ` Lucas Stach 2021-05-21 9:38 ` Lucas Stach 2021-05-21 9:38 ` Lucas Stach 2021-05-21 12:20 ` Heiko Stübner 2021-05-21 12:20 ` Heiko Stübner 2021-05-21 12:20 ` [Intel-gfx] " Heiko Stübner 2021-05-21 12:20 ` Heiko Stübner 2021-05-21 12:20 ` Heiko Stübner 2021-05-21 12:20 ` Heiko Stübner 2021-05-21 12:20 ` Heiko Stübner 2021-05-21 12:22 ` Paul Cercueil 2021-05-21 12:22 ` Paul Cercueil 2021-05-21 12:22 ` [Intel-gfx] " Paul Cercueil 2021-05-21 12:22 ` Paul Cercueil 2021-05-21 12:22 ` Paul Cercueil 2021-05-21 12:22 ` Paul Cercueil 2021-05-21 12:22 ` Paul Cercueil 2021-05-21 15:53 ` Jernej Škrabec 2021-05-21 15:53 ` Jernej Škrabec 2021-05-21 15:53 ` [Intel-gfx] " Jernej Škrabec 2021-05-21 15:53 ` Jernej Škrabec 2021-05-21 15:53 ` Jernej Škrabec 2021-05-21 15:53 ` Jernej Škrabec 2021-05-21 15:53 ` Jernej Škrabec 2021-05-21 23:18 ` Chun-Kuang Hu 2021-05-21 23:18 ` Chun-Kuang Hu 2021-05-21 23:18 ` [Intel-gfx] " Chun-Kuang Hu 2021-05-21 23:18 ` Chun-Kuang Hu 2021-05-21 23:18 ` Chun-Kuang Hu 2021-05-21 23:18 ` Chun-Kuang Hu 2021-05-21 23:18 ` Chun-Kuang Hu 2021-05-23 12:17 ` Martin Blumenstingl 2021-05-23 12:17 ` Martin Blumenstingl 2021-05-23 12:17 ` [Intel-gfx] " Martin Blumenstingl 2021-05-23 12:17 ` Martin Blumenstingl 2021-05-23 12:17 ` Martin Blumenstingl 2021-05-23 12:17 ` Martin Blumenstingl 2021-05-23 12:17 ` Martin Blumenstingl 2021-05-24 7:54 ` Tomi Valkeinen 2021-05-24 7:54 ` Tomi Valkeinen 2021-05-24 7:54 ` [Intel-gfx] " Tomi Valkeinen 2021-05-24 7:54 ` Tomi Valkeinen 2021-05-24 7:54 ` Tomi Valkeinen 2021-05-24 7:54 ` Tomi Valkeinen 2021-05-24 7:54 ` Tomi Valkeinen 2021-05-28 9:55 ` Philippe CORNU 2021-05-28 9:55 ` Philippe CORNU 2021-05-28 9:55 ` [Intel-gfx] " Philippe CORNU 2021-05-28 9:55 ` Philippe CORNU 2021-05-28 9:55 ` Philippe CORNU 2021-05-28 9:55 ` Philippe CORNU 2021-05-28 9:55 ` Philippe CORNU 2021-05-21 9:09 ` [PATCH 07/11] drm/armada: Remove prepare/cleanup_fb hooks Daniel Vetter 2021-05-21 9:09 ` [Intel-gfx] " Daniel Vetter 2021-05-21 9:09 ` [PATCH 08/11] drm/vram-helpers: Create DRM_GEM_VRAM_PLANE_HELPER_FUNCS Daniel Vetter 2021-05-21 9:09 ` [Intel-gfx] " Daniel Vetter 2021-05-21 9:33 ` tiantao (H) 2021-05-21 9:33 ` [Intel-gfx] " tiantao (H) 2021-05-21 9:09 ` [PATCH 09/11] drm/omap: Follow implicit fencing in prepare_fb Daniel Vetter 2021-05-21 9:09 ` [Intel-gfx] " Daniel Vetter 2021-05-24 7:53 ` Tomi Valkeinen 2021-05-24 7:53 ` [Intel-gfx] " Tomi Valkeinen 2021-05-21 9:09 ` [PATCH 10/11] drm/simple-helper: drm_gem_simple_display_pipe_prepare_fb as default Daniel Vetter 2021-05-21 9:09 ` [Intel-gfx] " Daniel Vetter 2021-05-25 17:48 ` Noralf Trønnes 2021-05-25 17:48 ` [Intel-gfx] " Noralf Trønnes 2021-05-25 17:53 ` Daniel Vetter 2021-05-25 17:53 ` [Intel-gfx] " Daniel Vetter 2021-05-21 9:09 ` [PATCH 11/11] drm/tiny: drm_gem_simple_display_pipe_prepare_fb is the default Daniel Vetter 2021-05-21 9:09 ` Daniel Vetter 2021-05-21 9:09 ` [Intel-gfx] " Daniel Vetter 2021-05-21 9:09 ` Daniel Vetter 2021-05-21 13:41 ` David Lechner 2021-05-21 13:41 ` David Lechner 2021-05-21 13:41 ` [Intel-gfx] " David Lechner 2021-05-21 13:41 ` David Lechner 2021-05-21 14:09 ` Noralf Trønnes 2021-05-21 14:09 ` Noralf Trønnes 2021-05-21 14:09 ` [Intel-gfx] " Noralf Trønnes 2021-05-21 14:09 ` Noralf Trønnes 2021-05-25 16:05 ` Daniel Vetter 2021-05-25 16:05 ` Daniel Vetter 2021-05-25 16:05 ` [Intel-gfx] " Daniel Vetter 2021-05-25 16:05 ` Daniel Vetter 2021-05-21 14:13 ` Oleksandr Andrushchenko 2021-05-21 14:13 ` Oleksandr Andrushchenko 2021-05-21 14:13 ` [Intel-gfx] " Oleksandr Andrushchenko 2021-05-21 14:13 ` Oleksandr Andrushchenko 2021-05-28 0:38 ` Linus Walleij 2021-05-28 0:38 ` Linus Walleij 2021-05-28 0:38 ` [Intel-gfx] " Linus Walleij 2021-05-28 0:38 ` Linus Walleij 2021-05-21 9:18 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/11] drm/amdgpu: Comply with implicit fencing rules Patchwork 2021-05-21 9:20 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork 2021-05-21 9:45 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork 2021-05-21 9:46 ` [PATCH 01/11] " Bas Nieuwenhuizen 2021-05-21 9:46 ` [Intel-gfx] " Bas Nieuwenhuizen 2021-05-21 14:37 ` Daniel Vetter 2021-05-21 14:37 ` [Intel-gfx] " Daniel Vetter 2021-05-21 15:00 ` Bas Nieuwenhuizen 2021-05-21 15:00 ` [Intel-gfx] " Bas Nieuwenhuizen 2021-05-21 15:16 ` Daniel Vetter 2021-05-21 15:16 ` [Intel-gfx] " Daniel Vetter 2021-05-21 18:08 ` [Mesa-dev] " Christian König 2021-05-21 18:08 ` [Intel-gfx] " Christian König 2021-05-21 18:31 ` Daniel Vetter 2021-05-21 18:31 ` [Intel-gfx] " Daniel Vetter 2021-05-22 8:30 ` Christian König 2021-05-22 8:30 ` [Intel-gfx] " Christian König 2021-05-25 13:05 ` Daniel Vetter 2021-05-25 13:05 ` [Intel-gfx] " Daniel Vetter 2021-05-25 15:05 ` Christian König 2021-05-25 15:23 ` Daniel Vetter 2021-05-25 15:23 ` [Intel-gfx] " Daniel Vetter 2021-05-26 13:32 ` Christian König 2021-05-26 13:32 ` [Intel-gfx] " Christian König 2021-05-26 13:51 ` Daniel Vetter [this message] 2021-05-26 13:51 ` Daniel Vetter 2021-05-21 11:22 ` Christian König 2021-05-21 11:22 ` [Intel-gfx] " Christian König 2021-05-21 14:58 ` [Mesa-dev] " Rob Clark 2021-05-21 14:58 ` [Intel-gfx] " Rob Clark 2021-05-21 14:58 ` Daniel Vetter 2021-05-21 14:58 ` [Intel-gfx] " Daniel Vetter 2021-05-23 5:00 ` [Intel-gfx] ✗ Fi.CI.IGT: failure for series starting with [01/11] " Patchwork
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=CAKMK7uFQczzpkdSLB1H-dVySGTiap2ONETZCaz5ErE1sca8YWQ@mail.gmail.com \ --to=daniel@ffwll.ch \ --cc=Dennis.Li@amd.com \ --cc=alexander.deucher@amd.com \ --cc=chenli@uniontech.com \ --cc=christian.koenig@amd.com \ --cc=ckoenig.leichtzumerken@gmail.com \ --cc=daniel.vetter@intel.com \ --cc=daniels@collabora.com \ --cc=dri-devel@lists.freedesktop.org \ --cc=hoegsberg@google.com \ --cc=intel-gfx@lists.freedesktop.org \ --cc=kevin1.wang@amd.com \ --cc=linaro-mm-sig@lists.linaro.org \ --cc=luben.tuikov@amd.com \ --cc=mesa-dev@lists.freedesktop.org \ --cc=mh12gx2825@gmail.com \ --cc=michel@daenzer.net \ --cc=robdclark@chromium.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.