dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
From: Daniel Vetter <daniel.vetter@ffwll.ch>
To: "Christian König" <christian.koenig@amd.com>
Cc: linux-rdma <linux-rdma@vger.kernel.org>,
	intel-gfx <intel-gfx@lists.freedesktop.org>,
	LKML <linux-kernel@vger.kernel.org>,
	amd-gfx list <amd-gfx@lists.freedesktop.org>,
	Chris Wilson <chris@chris-wilson.co.uk>,
	"moderated list:DMA BUFFER SHARING FRAMEWORK"
	<linaro-mm-sig@lists.linaro.org>,
	DRI Development <dri-devel@lists.freedesktop.org>,
	Daniel Vetter <daniel.vetter@intel.com>,
	"open list:DMA BUFFER SHARING FRAMEWORK"
	<linux-media@vger.kernel.org>
Subject: Re: [RFC 10/17] drm/amdgpu: s/GFP_KERNEL/GFP_ATOMIC in scheduler code
Date: Tue, 12 May 2020 18:27:03 +0200	[thread overview]
Message-ID: <CAKMK7uGUBqcwo56p3f+8B=ntvuYZ8WtKaFxAPJ_D=H7qdDsGqQ@mail.gmail.com> (raw)
In-Reply-To: <CAKMK7uF1c3R7DTsvRaBfzRVAx03Z+AiUnqdAzP=mt4d=KsoEgg@mail.gmail.com>

On Tue, May 12, 2020 at 6:20 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> On Tue, May 12, 2020 at 5:56 PM Christian König
> <christian.koenig@amd.com> wrote:
> >
> > Hui what? Of hand that doesn't looks correct to me.
>
> It's not GFP_ATOMIC, it's just that GFP_ATOMIC is the only shotgun we
> have to avoid direct reclaim. And direct reclaim might need to call
> into your mmu notifier, which might need to wait on a fence, which is
> never going to happen because your scheduler is stuck.
>
> Note that all the explanations for the deadlocks and stuff I'm trying
> to hunt here are in the other patches, the driver ones are more
> informational, so I left these here rather bare-bones to shut up
> lockdep so I can get through the entire driver and all major areas
> (scheduler, reset, modeset code).
>
> Now you can do something like GFP_NOFS, but the only reasons that
> works is because the direct reclaim annotations
> (fs_reclaim_acquire/release) only validates against __GFP_FS, and not
> against any of the other flags. We should probably add some lockdep
> annotations so that __GFP_RECLAIM is annotated against the
> __mmu_notifier_invalidate_range_start_map lockdep map I've recently
> added for mmu notifiers. End result (assuming I'm not mixing anything
> up here, this is all rather tricky stuff): GFP_ATOMIC is the only kind
> of memory allocation you can do.
>
> > Why the heck should this be an atomic context? If that's correct
> > allocating memory is the least of the problems we have.
>
> It's not about atomic, it's !__GFP_RECLAIM. Which more or less is
> GFP_ATOMIC. Correct fix is probably GFP_ATOMIC + a mempool for the
> scheduler fixes so that if you can't allocate them for some reason,
> you at least know that your scheduler should eventually retire retire
> some of them, which you can then pick up from the mempool to guarantee
> forward progress.
>
> But I really didn't dig into details of the code, this was just a quick hack.
>
> So sleeping and taking all kinds of locks (but not all, e.g.
> dma_resv_lock and drm_modeset_lock are no-go) is still totally ok.
> Just think
>
> #define GFP_NO_DIRECT_RECLAIM GFP_ATOMIC

Maybe slightly different take that's easier to understand: You've
already made the observation that anything holding adev->notifier_lock
isn't allowed to allocate memory (well GFP_ATOMIC is ok, like here).

Only thing I'm adding is that the situation is a lot worse. Plus the
lockdep annotations to help us catch these issues.
-Daniel

> Cheers, Daniel
>
> >
> > Regards,
> > Christian.
> >
> > Am 12.05.20 um 10:59 schrieb Daniel Vetter:
> > > My dma-fence lockdep annotations caught an inversion because we
> > > allocate memory where we really shouldn't:
> > >
> > >       kmem_cache_alloc+0x2b/0x6d0
> > >       amdgpu_fence_emit+0x30/0x330 [amdgpu]
> > >       amdgpu_ib_schedule+0x306/0x550 [amdgpu]
> > >       amdgpu_job_run+0x10f/0x260 [amdgpu]
> > >       drm_sched_main+0x1b9/0x490 [gpu_sched]
> > >       kthread+0x12e/0x150
> > >
> > > Trouble right now is that lockdep only validates against GFP_FS, which
> > > would be good enough for shrinkers. But for mmu_notifiers we actually
> > > need !GFP_ATOMIC, since they can be called from any page laundering,
> > > even if GFP_NOFS or GFP_NOIO are set.
> > >
> > > I guess we should improve the lockdep annotations for
> > > fs_reclaim_acquire/release.
> > >
> > > Ofc real fix is to properly preallocate this fence and stuff it into
> > > the amdgpu job structure. But GFP_ATOMIC gets the lockdep splat out of
> > > the way.
> > >
> > > v2: Two more allocations in scheduler paths.
> > >
> > > Frist one:
> > >
> > >       __kmalloc+0x58/0x720
> > >       amdgpu_vmid_grab+0x100/0xca0 [amdgpu]
> > >       amdgpu_job_dependency+0xf9/0x120 [amdgpu]
> > >       drm_sched_entity_pop_job+0x3f/0x440 [gpu_sched]
> > >       drm_sched_main+0xf9/0x490 [gpu_sched]
> > >
> > > Second one:
> > >
> > >       kmem_cache_alloc+0x2b/0x6d0
> > >       amdgpu_sync_fence+0x7e/0x110 [amdgpu]
> > >       amdgpu_vmid_grab+0x86b/0xca0 [amdgpu]
> > >       amdgpu_job_dependency+0xf9/0x120 [amdgpu]
> > >       drm_sched_entity_pop_job+0x3f/0x440 [gpu_sched]
> > >       drm_sched_main+0xf9/0x490 [gpu_sched]
> > >
> > > Cc: linux-media@vger.kernel.org
> > > Cc: linaro-mm-sig@lists.linaro.org
> > > Cc: linux-rdma@vger.kernel.org
> > > Cc: amd-gfx@lists.freedesktop.org
> > > Cc: intel-gfx@lists.freedesktop.org
> > > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > > Cc: Christian König <christian.koenig@amd.com>
> > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > ---
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 2 +-
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c   | 2 +-
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c  | 2 +-
> > >   3 files changed, 3 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > > index d878fe7fee51..055b47241bb1 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > > @@ -143,7 +143,7 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f,
> > >       uint32_t seq;
> > >       int r;
> > >
> > > -     fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_KERNEL);
> > > +     fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_ATOMIC);
> > >       if (fence == NULL)
> > >               return -ENOMEM;
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> > > index fe92dcd94d4a..fdcd6659f5ad 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> > > @@ -208,7 +208,7 @@ static int amdgpu_vmid_grab_idle(struct amdgpu_vm *vm,
> > >       if (ring->vmid_wait && !dma_fence_is_signaled(ring->vmid_wait))
> > >               return amdgpu_sync_fence(sync, ring->vmid_wait, false);
> > >
> > > -     fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_KERNEL);
> > > +     fences = kmalloc_array(sizeof(void *), id_mgr->num_ids, GFP_ATOMIC);
> > >       if (!fences)
> > >               return -ENOMEM;
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > > index b87ca171986a..330476cc0c86 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> > > @@ -168,7 +168,7 @@ int amdgpu_sync_fence(struct amdgpu_sync *sync, struct dma_fence *f,
> > >       if (amdgpu_sync_add_later(sync, f, explicit))
> > >               return 0;
> > >
> > > -     e = kmem_cache_alloc(amdgpu_sync_slab, GFP_KERNEL);
> > > +     e = kmem_cache_alloc(amdgpu_sync_slab, GFP_ATOMIC);
> > >       if (!e)
> > >               return -ENOMEM;
> > >
> >
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

  reply	other threads:[~2020-05-12 16:27 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-12  8:59 [RFC 00/17] dma-fence lockdep annotations Daniel Vetter
2020-05-12  8:59 ` [RFC 01/17] dma-fence: add might_sleep annotation to _wait() Daniel Vetter
2020-05-12  9:03   ` Chris Wilson
2020-05-12  9:08   ` Christian König
2020-06-02  9:45     ` Maarten Lankhorst
2020-05-12  8:59 ` [RFC 02/17] dma-fence: basic lockdep annotations Daniel Vetter
2020-05-12  9:04   ` Chris Wilson
2020-05-12  9:08     ` Daniel Vetter
2020-05-12  9:19       ` Chris Wilson
2020-05-13  8:30         ` Daniel Vetter
2020-05-25 15:41     ` Daniel Vetter
2020-05-12 12:09   ` Jason Gunthorpe
2020-05-12 12:57     ` Daniel Vetter
2020-05-26 10:00   ` Maarten Lankhorst
2020-05-28 13:36   ` Thomas Hellström (Intel)
2020-05-28 14:22     ` Daniel Vetter
2020-05-28 21:54   ` Luben Tuikov
2020-05-29  5:49     ` Daniel Vetter
2020-05-12  8:59 ` [RFC 03/17] dma-fence: prime " Daniel Vetter
2020-05-12  8:59 ` [RFC 04/17] drm/vkms: Annotate vblank timer Daniel Vetter
2020-05-12  8:59 ` [RFC 05/17] drm/vblank: Annotate with dma-fence signalling section Daniel Vetter
2020-05-12  8:59 ` [RFC 06/17] drm/atomic-helper: Add dma-fence annotations Daniel Vetter
2020-05-12  8:59 ` [RFC 07/17] drm/amdgpu: add dma-fence annotations to atomic commit path Daniel Vetter
2020-05-12  8:59 ` [RFC 08/17] drm/scheduler: use dma-fence annotations in main thread Daniel Vetter
2020-05-25 15:30   ` Daniel Vetter
2020-05-12  8:59 ` [RFC 09/17] drm/amdgpu: use dma-fence annotations in cs_submit() Daniel Vetter
2020-05-13  7:02   ` Christian König
2020-05-13  7:07     ` Daniel Vetter
2020-05-12  8:59 ` [RFC 10/17] drm/amdgpu: s/GFP_KERNEL/GFP_ATOMIC in scheduler code Daniel Vetter
2020-05-12 15:56   ` Christian König
2020-05-12 16:20     ` Daniel Vetter
2020-05-12 16:27       ` Daniel Vetter [this message]
2020-05-12 17:31         ` Christian König
2020-05-12 18:34           ` Daniel Vetter
2020-05-12  8:59 ` [RFC 11/17] drm/amdgpu: DC also loves to allocate stuff where it shouldn't Daniel Vetter
2020-05-12  8:59 ` [RFC 12/17] drm/amdgpu/dc: Stop dma_resv_lock inversion in commit_tail Daniel Vetter
2020-05-12  8:59 ` [RFC 13/17] drm/scheduler: use dma-fence annotations in tdr work Daniel Vetter
2020-05-12  8:59 ` [RFC 14/17] drm/amdgpu: use dma-fence annotations for gpu reset code Daniel Vetter
2020-05-12  8:59 ` [RFC 15/17] Revert "drm/amdgpu: add fbdev suspend/resume on gpu reset" Daniel Vetter
2020-05-12  8:59 ` [RFC 16/17] drm/amdgpu: gpu recovery does full modesets Daniel Vetter
2020-05-12 12:54   ` Alex Deucher
2020-05-12 12:58     ` Daniel Vetter
2020-05-12 13:12       ` Alex Deucher
2020-05-12 13:17         ` Daniel Vetter
2020-05-12 13:29           ` Alex Deucher
2020-05-12 13:45             ` Daniel Vetter
2020-05-12 14:24               ` Alex Deucher
2020-05-12 16:12                 ` Daniel Vetter
2020-05-12 20:10                   ` Kazlauskas, Nicholas
2020-05-13  6:02                     ` Daniel Vetter
2020-05-12  8:59 ` [RFC 17/17] drm/i915: Annotate dma_fence_work Daniel Vetter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKMK7uGUBqcwo56p3f+8B=ntvuYZ8WtKaFxAPJ_D=H7qdDsGqQ@mail.gmail.com' \
    --to=daniel.vetter@ffwll.ch \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=chris@chris-wilson.co.uk \
    --cc=christian.koenig@amd.com \
    --cc=daniel.vetter@intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=linaro-mm-sig@lists.linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).