From: Daniel Vetter <daniel.vetter@ffwll.ch>
To: "Christian König" <christian.koenig@amd.com>
Cc: linux-rdma <linux-rdma@vger.kernel.org>,
	intel-gfx <intel-gfx@lists.freedesktop.org>,
	Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	amd-gfx list <amd-gfx@lists.freedesktop.org>,
	Chris Wilson <chris@chris-wilson.co.uk>,
	"moderated list:DMA BUFFER SHARING FRAMEWORK"
	<linaro-mm-sig@lists.linaro.org>,
	DRI Development <dri-devel@lists.freedesktop.org>,
	Daniel Vetter <daniel.vetter@intel.com>,
	"open list:DMA BUFFER SHARING FRAMEWORK"
	<linux-media@vger.kernel.org>
Subject: Re: [RFC 10/17] drm/amdgpu: s/GFP_KERNEL/GFP_ATOMIC in scheduler code
Date: Tue, 12 May 2020 18:20:18 +0200
Message-ID: <CAKMK7uF1c3R7DTsvRaBfzRVAx03Z+AiUnqdAzP=mt4d=KsoEgg@mail.gmail.com>
In-Reply-To: <879b127e-2180-bc59-f522-252416a7ac01@amd.com>

On Tue, May 12, 2020 at 5:56 PM Christian König
<christian.koenig@amd.com> wrote:
>
> Hui what? Offhand that doesn't look correct to me.

It's not GFP_ATOMIC, it's just that GFP_ATOMIC is the only shotgun we
have to avoid direct reclaim. And direct reclaim might need to call
into your mmu notifier, which might need to wait on a fence, which is
never going to happen because your scheduler is stuck.

Note that all the explanations for the deadlocks and stuff I'm trying
to hunt here are in the other patches. The driver patches are more
informational, so I left them rather bare-bones: just enough to shut
up lockdep so I can get through the entire driver and all major areas
(scheduler, reset, modeset code).

Now you can do something like GFP_NOFS, but the only reason that
works is that the direct reclaim annotations
(fs_reclaim_acquire/release) only validate against __GFP_FS, and not
against any of the other flags. We should probably add some lockdep
annotations so that __GFP_RECLAIM is annotated against the
__mmu_notifier_invalidate_range_start_map lockdep map I've recently
added for mmu notifiers. End result (assuming I'm not mixing anything
up here, this is all rather tricky stuff): GFP_ATOMIC is the only kind
of memory allocation you can do.
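
A rough userspace sketch of the flag relationships described above. The
SK_ names and bit values are made up for illustration and are not the
kernel's; the composition of each mask follows the kernel's gfp.h
definitions in simplified form (GFP_ATOMIC carries more bits than
shown). It models why GFP_NOFS keeps lockdep quiet even though it still
permits direct reclaim, while GFP_ATOMIC does not permit it at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the kernel's real values differ. */
enum {
	SK_GFP_IO             = 1u << 0,
	SK_GFP_FS             = 1u << 1,
	SK_GFP_DIRECT_RECLAIM = 1u << 2,
	SK_GFP_KSWAPD_RECLAIM = 1u << 3,
	SK_GFP_HIGH           = 1u << 4,
};

#define SK_GFP_RECLAIM (SK_GFP_DIRECT_RECLAIM | SK_GFP_KSWAPD_RECLAIM)
#define SK_GFP_KERNEL  (SK_GFP_RECLAIM | SK_GFP_IO | SK_GFP_FS)
#define SK_GFP_NOFS    (SK_GFP_RECLAIM | SK_GFP_IO)
#define SK_GFP_NOIO    (SK_GFP_RECLAIM)
/* Simplified: only kswapd may be woken, the caller never reclaims. */
#define SK_GFP_ATOMIC  (SK_GFP_HIGH | SK_GFP_KSWAPD_RECLAIM)

/* Roughly what the fs_reclaim annotation looks at: it only fires when
 * the caller may do direct reclaim AND __GFP_FS is set. */
static bool sk_fs_reclaim_annotated(unsigned int gfp)
{
	return (gfp & SK_GFP_DIRECT_RECLAIM) && (gfp & SK_GFP_FS);
}

/* What matters for mmu notifiers: can the caller enter direct reclaim? */
static bool sk_may_direct_reclaim(unsigned int gfp)
{
	return (gfp & SK_GFP_DIRECT_RECLAIM) != 0;
}

/* Self-check of the model against the claims in the text. */
static void sk_check_model(void)
{
	assert(sk_fs_reclaim_annotated(SK_GFP_KERNEL));
	assert(!sk_fs_reclaim_annotated(SK_GFP_NOFS)); /* lockdep is quiet... */
	assert(sk_may_direct_reclaim(SK_GFP_NOFS));    /* ...but reclaim can run */
	assert(sk_may_direct_reclaim(SK_GFP_NOIO));
	assert(!sk_may_direct_reclaim(SK_GFP_ATOMIC));
}
```

In this model the gap is the set of masks where sk_may_direct_reclaim()
is true but sk_fs_reclaim_annotated() is false, which is exactly
GFP_NOFS and GFP_NOIO.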

> Why the heck should this be an atomic context? If that's correct
> allocating memory is the least of the problems we have.

It's not about atomic, it's !__GFP_RECLAIM. Which more or less is
GFP_ATOMIC. Correct fix is probably GFP_ATOMIC + a mempool for the
scheduler fences so that if you can't allocate them for some reason,
you at least know that your scheduler should eventually retire
some of them, which you can then pick up from the mempool to guarantee
forward progress.
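
To make the mempool idea concrete, here is a minimal userspace model of
the pattern, not the kernel's mempool API and not amdgpu code. All names
(sk_pool, SK_POOL_MIN, the fail_alloc knob) are hypothetical. The
allocation path never waits: it tries the normal allocator first, then
falls back to a preallocated reserve, and retired objects refill that
reserve before anything goes back to the allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SK_POOL_MIN 4	/* hypothetical reserve size */

struct sk_pool {
	void  *reserve[SK_POOL_MIN];
	size_t nr_free;		/* elements currently held in reserve */
	size_t elem_size;
	int    fail_alloc;	/* knob to simulate allocator failure */
};

/* Fill the reserve up front, where blocking allocation is still fine. */
static int sk_pool_init(struct sk_pool *p, size_t elem_size)
{
	p->nr_free = 0;
	p->elem_size = elem_size;
	p->fail_alloc = 0;
	while (p->nr_free < SK_POOL_MIN) {
		void *e = malloc(elem_size);
		if (!e) {
			while (p->nr_free)
				free(p->reserve[--p->nr_free]);
			return -1;
		}
		p->reserve[p->nr_free++] = e;
	}
	return 0;
}

/* Never waits: allocator first, then the reserve, NULL if both fail. */
static void *sk_pool_alloc(struct sk_pool *p)
{
	void *e = p->fail_alloc ? NULL : malloc(p->elem_size);

	if (e)
		return e;
	if (p->nr_free)
		return p->reserve[--p->nr_free];
	return NULL;
}

/* A retired element refills the reserve before going back to malloc. */
static void sk_pool_free(struct sk_pool *p, void *e)
{
	assert(e != NULL);
	if (p->nr_free < SK_POOL_MIN)
		p->reserve[p->nr_free++] = e;
	else
		free(e);
}

static void sk_pool_destroy(struct sk_pool *p)
{
	while (p->nr_free)
		free(p->reserve[--p->nr_free]);
}
```

When sk_pool_alloc() returns NULL, the caller knows every reserved
element is in flight, so waiting for one of them to retire is enough to
make progress. The kernel's own mempool (mempool_create(),
mempool_alloc(), mempool_free()) is built around the same idea.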

But I really didn't dig into details of the code, this was just a quick hack.

So sleeping and taking all kinds of locks (but not all, e.g.
dma_resv_lock and drm_modeset_lock are no-go) is still totally ok.
Just think

#define GFP_NO_DIRECT_RECLAIM GFP_ATOMIC

Cheers, Daniel

>
> Regards,
> Christian.
>
> Am 12.05.20 um 10:59 schrieb Daniel Vetter:
> > My dma-fence lockdep annotations caught an inversion because we
> > allocate memory where we really shouldn't:
> >
> >       kmem_cache_alloc+0x2b/0x6d0
> >       amdgpu_fence_emit+0x30/0x330 [amdgpu]
> >       amdgpu_ib_schedule+0x306/0x550 [amdgpu]
> >       amdgpu_job_run+0x10f/0x260 [amdgpu]
> >       drm_sched_main+0x1b9/0x490 [gpu_sched]
> >       kthread+0x12e/0x150
> >
> > Trouble right now is that lockdep only validates against GFP_FS, which
> > would be good enough for shrinkers. But for mmu_notifiers we actually
> > need !GFP_ATOMIC, since they can be called from any page laundering,
> > even if GFP_NOFS or GFP_NOIO are set.
> >
> > I guess we should improve the lockdep annotations for
> > fs_reclaim_acquire/release.
> >
> > Ofc real fix is to properly preallocate this fence and stuff it into
> > the amdgpu job structure. But GFP_ATOMIC gets the lockdep splat out of
> > the way.
> >
> > v2: Two more allocations in scheduler paths.
> >
> > First one:
> >
> >       __kmalloc+0x58/0x720
> >       amdgpu_vmid_grab+0x100/0xca0 [amdgpu]
> >       amdgpu_job_dependency+0xf9/0x120 [amdgpu]
> >       drm_sched_entity_pop_job+0x3f/0x440 [gpu_sched]
> >       drm_sched_main+0xf9/0x490 [gpu_sched]
> >
> > Second one:
> >
> >       kmem_cache_alloc+0x2b/0x6d0
> >       amdgpu_sync_fence+0x7e/0x110 [amdgpu]
> >       amdgpu_vmid_grab+0x86b/0xca0 [amdgpu]
> >       amdgpu_job_dependency+0xf9/0x120 [amdgpu]
> >       drm_sched_entity_pop_job+0x3f/0x440 [gpu_sched]
> >       drm_sched_main+0xf9/0x490 [gpu_sched]
> >
> > Cc: linux-media@vger.kernel.org
> > Cc: linaro-mm-sig@lists.linaro.org
> > Cc: linux-rdma@vger.kernel.org
> > Cc: amd-gfx@lists.freedesktop.org
> > Cc: intel-gfx@lists.freedesktop.org
> > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > Cc: Christian König <christian.koenig@amd.com>
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c   | 2 +-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c  | 2 +-
> >   3 files changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > index d878fe7fee51..055b47241bb1 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > @@ -143,7 +143,7 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f,
> >       uint32_t seq;
> >       int r;
> >
> > -     fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_KERNEL);
> > +     fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_ATOMIC);
> >       if (fence == NULL)
> >               return -ENOMEM;
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> > index fe92dcd94d4a..fdcd6659f5ad 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> > @@ -208,7 +208,7 @@ static int amdgpu_vmid_grab_idle(struct amdgpu_vm *vm,
> >       if (ring->vmid_wait && !dma_fence_is_signaled(ring->vmid_wait))
> >               return amdgpu_sync_fence(sync, ring->vmid_wait, false);
> >
> > -     fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_KERNEL);
> > +     fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_ATOMIC);
> >       if (!fences)
> >               return -ENOMEM;
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > index b87ca171986a..330476cc0c86 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > @@ -168,7 +168,7 @@ int amdgpu_sync_fence(struct amdgpu_sync *sync, struct dma_fence *f,
> >       if (amdgpu_sync_add_later(sync, f, explicit))
> >               return 0;
> >
> > -     e = kmem_cache_alloc(amdgpu_sync_slab, GFP_KERNEL);
> > +     e = kmem_cache_alloc(amdgpu_sync_slab, GFP_ATOMIC);
> >       if (!e)
> >               return -ENOMEM;
> >
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch