From: Daniel Vetter <daniel@ffwll.ch>
To: "Christian König" <christian.koenig@amd.com>
Cc: linux-rdma@vger.kernel.org,
Daniel Vetter <daniel.vetter@ffwll.ch>,
Intel Graphics Development <intel-gfx@lists.freedesktop.org>,
DRI Development <dri-devel@lists.freedesktop.org>,
Chris Wilson <chris@chris-wilson.co.uk>,
linaro-mm-sig@lists.linaro.org, amd-gfx@lists.freedesktop.org,
Daniel Vetter <daniel.vetter@intel.com>,
linux-media@vger.kernel.org
Subject: Re: [PATCH 19/25] drm/amdgpu: s/GFP_KERNEL/GFP_ATOMIC in scheduler code
Date: Tue, 14 Jul 2020 16:31:24 +0200
Message-ID: <20200714143124.GG3278063@phenom.ffwll.local>
In-Reply-To: <d3e85f62-e427-7f1c-0ff4-842ffe57172e@amd.com>
On Tue, Jul 14, 2020 at 01:40:11PM +0200, Christian König wrote:
> Am 14.07.20 um 12:49 schrieb Daniel Vetter:
> > On Tue, Jul 07, 2020 at 10:12:23PM +0200, Daniel Vetter wrote:
> > > My dma-fence lockdep annotations caught an inversion because we
> > > allocate memory where we really shouldn't:
> > >
> > > kmem_cache_alloc+0x2b/0x6d0
> > > amdgpu_fence_emit+0x30/0x330 [amdgpu]
> > > amdgpu_ib_schedule+0x306/0x550 [amdgpu]
> > > amdgpu_job_run+0x10f/0x260 [amdgpu]
> > > drm_sched_main+0x1b9/0x490 [gpu_sched]
> > > kthread+0x12e/0x150
> > >
> > > Trouble right now is that lockdep only validates against GFP_FS, which
> > > would be good enough for shrinkers. But for mmu_notifiers we actually
> > > need !GFP_ATOMIC, since they can be called from any page laundering,
> > > even if GFP_NOFS or GFP_NOIO are set.
> > >
> > > I guess we should improve the lockdep annotations for
> > > fs_reclaim_acquire/release.
> > >
> > > Ofc real fix is to properly preallocate this fence and stuff it into
> > > the amdgpu job structure. But GFP_ATOMIC gets the lockdep splat out of
> > > the way.
> > >
> > > v2: Two more allocations in scheduler paths.
> > >
> > > First one:
> > >
> > > __kmalloc+0x58/0x720
> > > amdgpu_vmid_grab+0x100/0xca0 [amdgpu]
> > > amdgpu_job_dependency+0xf9/0x120 [amdgpu]
> > > drm_sched_entity_pop_job+0x3f/0x440 [gpu_sched]
> > > drm_sched_main+0xf9/0x490 [gpu_sched]
> > >
> > > Second one:
> > >
> > > kmem_cache_alloc+0x2b/0x6d0
> > > amdgpu_sync_fence+0x7e/0x110 [amdgpu]
> > > amdgpu_vmid_grab+0x86b/0xca0 [amdgpu]
> > > amdgpu_job_dependency+0xf9/0x120 [amdgpu]
> > > drm_sched_entity_pop_job+0x3f/0x440 [gpu_sched]
> > > drm_sched_main+0xf9/0x490 [gpu_sched]
> > >
> > > Cc: linux-media@vger.kernel.org
> > > Cc: linaro-mm-sig@lists.linaro.org
> > > Cc: linux-rdma@vger.kernel.org
> > > Cc: amd-gfx@lists.freedesktop.org
> > > Cc: intel-gfx@lists.freedesktop.org
> > > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > > Cc: Christian König <christian.koenig@amd.com>
> > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > Has anyone from amd side started looking into how to fix this properly?
>
> Yeah I checked both and neither are any real problem.
I'm confused ... do you mean "no real problem fixing them" or "not
actually a real problem"?
> > I looked a bit into fixing this with mempool, and the big guarantee we
> > need is that
> > - there's a hard upper limit on how many allocations we minimally need to
> > guarantee forward progress. And the entire vmid allocation and
> > amdgpu_sync_fence stuff kinda makes me question that's a valid
> > assumption.
>
> We do have hard upper limits for those.
>
> The VMID allocation could as well just return the fence instead of putting
> it into the sync object IIRC. So that just needs some cleanup and can avoid
> the allocation entirely.
Yeah, embedding should be the simplest solution of all.
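Rough sketch of what I mean by embedding, written as plain userspace C rather than actual amdgpu code. All struct and function names below are made up for illustration, and it only shows the general idea from the commit message (fence lives inside the job structure, so the scheduler path has nothing left to allocate):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a hardware fence, not the real dma_fence. */
struct demo_fence {
	unsigned int seqno;
	bool signaled;
};

/* The fence is embedded, so it is allocated together with the job. */
struct demo_job {
	struct demo_fence hw_fence;
	int payload;
};

/* Submission path: may allocate (and fail) before anything is published. */
struct demo_job *demo_job_create(int payload)
{
	struct demo_job *job = calloc(1, sizeof(*job));

	if (!job)
		return NULL;
	job->payload = payload;
	return job;
}

/* Scheduler path: only initializes the embedded fence, never allocates. */
struct demo_fence *demo_job_run(struct demo_job *job, unsigned int seqno)
{
	assert(job != NULL);
	job->hw_fence.seqno = seqno;
	job->hw_fence.signaled = false;
	return &job->hw_fence;
}

void demo_job_destroy(struct demo_job *job)
{
	free(job);
}
```

The only allocation left is in demo_job_create(), which would sit on the ioctl side where failing with -ENOMEM is still fine.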
> The hardware fence is limited by the number of submissions we can have
> concurrently on the ring buffers, so also not a problem at all.
Ok that sounds good. Wrt releasing the memory again, is that also done
without any of the allocation-side locks held? I've seen some vmid manager
somewhere ...
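
To make the question concrete, here is a toy userspace sketch of the property I'm after: a pool with a hard upper limit where giving a slot back never needs a lock that the allocation side might hold. This is not how the amdgpu vmid manager or mempool.c work, the names and the slot count are invented, and it uses C11 atomics purely for illustration:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define DEMO_POOL_SLOTS 4	/* stand-in for the hard upper limit */

struct demo_slot {
	int data;
};

struct demo_pool {
	struct demo_slot slots[DEMO_POOL_SLOTS];
	atomic_uint in_use;	/* bit i set means slots[i] is taken */
};

/* Returns a free slot, or NULL once the hard limit is reached. */
struct demo_slot *demo_pool_get(struct demo_pool *pool)
{
	unsigned int old = atomic_load(&pool->in_use);

	for (;;) {
		unsigned int i;

		for (i = 0; i < DEMO_POOL_SLOTS; i++)
			if (!(old & (1u << i)))
				break;
		if (i == DEMO_POOL_SLOTS)
			return NULL;
		/* On failure 'old' is refreshed and the scan is repeated. */
		if (atomic_compare_exchange_weak(&pool->in_use, &old,
						 old | (1u << i)))
			return &pool->slots[i];
	}
}

/* Release is a single atomic bit clear and takes no lock at all. */
void demo_pool_put(struct demo_pool *pool, struct demo_slot *slot)
{
	ptrdiff_t i = slot - pool->slots;

	assert(i >= 0 && i < DEMO_POOL_SLOTS);
	atomic_fetch_and(&pool->in_use, ~(1u << i));
}
```

If the release side looks like demo_pool_put() then forward progress is easy to argue. If it instead has to take a manager lock that is also held around the allocation, we're back to the deadlock.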
-Daniel
>
> Regards,
> Christian.
>
> >
> > - mempool_free must be called without any locks in the way which are held
> > while we call mempool_alloc. Otherwise we again have a nice deadlock
> > with no forward progress. I tried auditing that, but got lost in amdgpu
> > and scheduler code. Some lockdep annotations for mempool.c might help,
> > but they're not going to catch everything. Plus it would be again manual
> > annotations because this is yet another cross-release issue. So not sure
> > that helps at all.
> >
> > iow, not sure what to do here. Ideas?
> >
> > Cheers, Daniel
> >
> > > ---
> > > drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +-
> > > drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c | 2 +-
> > > drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 2 +-
> > > 3 files changed, 3 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > > index 8d84975885cd..a089a827fdfe 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > > @@ -143,7 +143,7 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f,
> > > uint32_t seq;
> > > int r;
> > > - fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_KERNEL);
> > > + fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_ATOMIC);
> > > if (fence == NULL)
> > > return -ENOMEM;
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> > > index 267fa45ddb66..a333ca2d4ddd 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> > > @@ -208,7 +208,7 @@ static int amdgpu_vmid_grab_idle(struct amdgpu_vm *vm,
> > > if (ring->vmid_wait && !dma_fence_is_signaled(ring->vmid_wait))
> > > return amdgpu_sync_fence(sync, ring->vmid_wait);
> > > - fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_KERNEL);
> > > + fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_ATOMIC);
> > > if (!fences)
> > > return -ENOMEM;
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > > index 8ea6c49529e7..af22b526cec9 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > > @@ -160,7 +160,7 @@ int amdgpu_sync_fence(struct amdgpu_sync *sync, struct dma_fence *f)
> > > if (amdgpu_sync_add_later(sync, f))
> > > return 0;
> > > - e = kmem_cache_alloc(amdgpu_sync_slab, GFP_KERNEL);
> > > + e = kmem_cache_alloc(amdgpu_sync_slab, GFP_ATOMIC);
> > > if (!e)
> > > return -ENOMEM;
> > > --
> > > 2.27.0
> > >
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch