linux-mm.kvack.org archive mirror
* Re: [PATCH] drm/ttm: stop warning on TT shrinker failure
       [not found]           ` <e6e9df3e-cd2b-d80f-205d-6ca1865819b2@gmail.com>
@ 2021-03-22 13:49             ` Daniel Vetter
  2021-03-22 14:05               ` Matthew Wilcox
  0 siblings, 1 reply; 29+ messages in thread
From: Daniel Vetter @ 2021-03-22 13:49 UTC (permalink / raw)
  To: Christian König, Matthew Wilcox, Dave Chinner
  Cc: Daniel Vetter, Leo Liu, amd-gfx list, dri-devel, Linux MM

On Sun, Mar 21, 2021 at 03:18:28PM +0100, Christian König wrote:
> Am 20.03.21 um 14:17 schrieb Daniel Vetter:
> > On Sat, Mar 20, 2021 at 10:04 AM Christian König
> > <ckoenig.leichtzumerken@gmail.com> wrote:
> > > Am 19.03.21 um 20:06 schrieb Daniel Vetter:
> > > > On Fri, Mar 19, 2021 at 07:53:48PM +0100, Christian König wrote:
> > > > > Am 19.03.21 um 18:52 schrieb Daniel Vetter:
> > > > > > On Fri, Mar 19, 2021 at 03:08:57PM +0100, Christian König wrote:
> > > > > > > Don't print a warning when we fail to allocate a page for swapping things out.
> > > > > > > 
> > > > > > > Also rely on memalloc_nofs_save/memalloc_nofs_restore instead of GFP_NOFS.
> > > > > > Uh this part doesn't make sense. Especially since you only do it for the
> > > > > > debugfs file, not in general. Which means you've just completely broken
> > > > > > the shrinker.
> > > > > Are you sure? My impression is that GFP_NOFS should now work much more out
> > > > > of the box with the memalloc_nofs_save()/memalloc_nofs_restore().
> > > > Yeah, if you'd put it in the right place :-)
> > > > 
> > > > But also -mm folks are very clear that memalloc_no*() family is for dire
> > > > situations where there's really no other way out. For anything where you
> > > > know what you're doing, you really should use explicit gfp flags.
> > > My impression is just the other way around. You should try to avoid the
> > > NOFS/NOIO flags and use the memalloc_no* approach instead.
> > Where did you get that idea?
> 
> Well from the kernel comment on GFP_NOFS:
> 
>  * %GFP_NOFS will use direct reclaim but will not use any filesystem
> interfaces.
>  * Please try to avoid using this flag directly and instead use
>  * memalloc_nofs_{save,restore} to mark the whole scope which
> cannot/shouldn't
>  * recurse into the FS layer with a short explanation why. All allocation
>  * requests will inherit GFP_NOFS implicitly.

Huh that's interesting, since iirc Willy or Dave told me the opposite, and
the memalloc_no* stuff is for e.g. nfs calling into network layer (needs
GFP_NOFS) or swap on top of a filesystem (even needs GFP_NOIO I think).

Adding them, maybe I got confused.

> > The kernel is full of explicit gfp_t flag
> > passing to make this as explicit as possible. The memalloc_no* stuff
> > is just for when you go through entire subsystems and really can't
> > wire it through. I can't find the discussion anymore, but that was the
> > advice I got from mm/fs people.
> > 
> > One reason is that generally a small GFP_KERNEL allocation never
> > fails. But it absolutely can fail if it's in a memalloc_no* section,
> > and these kind of non-obvious non-local effects are a real pain in
> > testing and review. Hence explicit gfp_flag passing as much as
> > possible.
> > 
> > > > > > If this is just to paper over the seq_printf doing the wrong allocations,
> > > > > > then just move that out from under the fs_reclaim_acquire/release part.
> > > > > No, that wasn't the problem.
> > > > > 
> > > > > We have just seen too many failures to allocate pages for swapout and I think
> > > > > that would improve this because in a lot of cases we can then immediately
> > > > > swap things out instead of having to rely on upper layers.
> > > > Yeah, you broke it. Now the real shrinker is running with GFP_KERNEL,
> > > > because your memalloc_no is only around the debugfs function. And ofc it's
> > > > much easier to allocate with GFP_KERNEL, right until you deadlock :-)
> > > The problem here is that for example kswapd calls the shrinker without
> > > holding a FS lock as far as I can see.
> > > 
> > > And it is rather sad that we can't optimize this case directly.
> > I'm still not clear what you want to optimize? You can check for "is
> > this kswapd" in pf flags, but that sounds very hairy and fragile.
> 
> Well we only need the NOFS flag when the shrinker callback really comes from
> a memory shortage in the FS subsystem, and that is rather unlikely.
> 
> If we could allow all other cases to directly write the freed up pages out
> to swap, it would certainly help.

tbh I'm not sure. i915-gem code has played tricks with special casing the
kswapd path, and they do kinda scare me at least. I'm not sure whether
there's not some hidden dependencies there that would make this a bad
idea. Like afaik direct reclaim can sometimes stall for kswapd to catch up
a bit, or at least did in the past (I think, really not much clue about
this)

The other thing is that the fs_reclaim_acquire/release annotation really
only works well if you use it outside of the direct reclaim path too.
Otherwise it's not much better than just lots of testing. That pretty much
means you have to annotate the kswapd path.
-Daniel



> 
> Christian.
> 
> > -Daniel
> > 
> > > Anyway you are right if some caller doesn't use the memalloc_no*()
> > > approach we are busted.
> > > 
> > > Going to change the patch to only not warn for the moment.
> > > 
> > > Regards,
> > > Christian.
> > > 
> > > > Shrinking is hard, there's no easy way out here.
> > > > 
> > > > Cheers, Daniel
> > > > 
> > > > > Regards,
> > > > > Christian.
> > > > > 
> > > > > 
> > > > > > __GFP_NOWARN should be there indeed I think.
> > > > > > -Daniel
> > > > > > 
> > > > > > > Signed-off-by: Christian König <christian.koenig@amd.com>
> > > > > > > ---
> > > > > > >     drivers/gpu/drm/ttm/ttm_tt.c | 5 ++++-
> > > > > > >     1 file changed, 4 insertions(+), 1 deletion(-)
> > > > > > > 
> > > > > > > diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
> > > > > > > index 2f0833c98d2c..86fa3e82dacc 100644
> > > > > > > --- a/drivers/gpu/drm/ttm/ttm_tt.c
> > > > > > > +++ b/drivers/gpu/drm/ttm/ttm_tt.c
> > > > > > > @@ -369,7 +369,7 @@ static unsigned long ttm_tt_shrinker_scan(struct shrinker *shrink,
> > > > > > >             };
> > > > > > >             int ret;
> > > > > > > -  ret = ttm_bo_swapout(&ctx, GFP_NOFS);
> > > > > > > +  ret = ttm_bo_swapout(&ctx, GFP_KERNEL | __GFP_NOWARN);
> > > > > > >             return ret < 0 ? SHRINK_EMPTY : ret;
> > > > > > >     }
> > > > > > > @@ -389,10 +389,13 @@ static unsigned long ttm_tt_shrinker_count(struct shrinker *shrink,
> > > > > > >     static int ttm_tt_debugfs_shrink_show(struct seq_file *m, void *data)
> > > > > > >     {
> > > > > > >             struct shrink_control sc = { .gfp_mask = GFP_KERNEL };
> > > > > > > +  unsigned int flags;
> > > > > > >             fs_reclaim_acquire(GFP_KERNEL);
> > > > > > > +  flags = memalloc_nofs_save();
> > > > > > >             seq_printf(m, "%lu/%lu\n", ttm_tt_shrinker_count(&mm_shrinker, &sc),
> > > > > > >                        ttm_tt_shrinker_scan(&mm_shrinker, &sc));
> > > > > > > +  memalloc_nofs_restore(flags);
> > > > > > >             fs_reclaim_release(GFP_KERNEL);
> > > > > > >             return 0;
> > > > > > > --
> > > > > > > 2.25.1
> > > > > > > 
> > > > > > > _______________________________________________
> > > > > > > dri-devel mailing list
> > > > > > > dri-devel@lists.freedesktop.org
> > > > > > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> > 
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] drm/ttm: stop warning on TT shrinker failure
  2021-03-22 13:49             ` [PATCH] drm/ttm: stop warning on TT shrinker failure Daniel Vetter
@ 2021-03-22 14:05               ` Matthew Wilcox
  2021-03-22 14:22                 ` Daniel Vetter
  2021-03-22 15:57                 ` Michal Hocko
  0 siblings, 2 replies; 29+ messages in thread
From: Matthew Wilcox @ 2021-03-22 14:05 UTC (permalink / raw)
  To: Christian König, Dave Chinner, Leo Liu, amd-gfx list,
	dri-devel, Linux MM
  Cc: mhocko

On Mon, Mar 22, 2021 at 02:49:27PM +0100, Daniel Vetter wrote:
> On Sun, Mar 21, 2021 at 03:18:28PM +0100, Christian König wrote:
> > Am 20.03.21 um 14:17 schrieb Daniel Vetter:
> > > On Sat, Mar 20, 2021 at 10:04 AM Christian König
> > > <ckoenig.leichtzumerken@gmail.com> wrote:
> > > > Am 19.03.21 um 20:06 schrieb Daniel Vetter:
> > > > > On Fri, Mar 19, 2021 at 07:53:48PM +0100, Christian König wrote:
> > > > > > Am 19.03.21 um 18:52 schrieb Daniel Vetter:
> > > > > > > On Fri, Mar 19, 2021 at 03:08:57PM +0100, Christian König wrote:
> > > > > > > > Don't print a warning when we fail to allocate a page for swapping things out.
> > > > > > > > 
> > > > > > > > Also rely on memalloc_nofs_save/memalloc_nofs_restore instead of GFP_NOFS.
> > > > > > > Uh this part doesn't make sense. Especially since you only do it for the
> > > > > > > debugfs file, not in general. Which means you've just completely broken
> > > > > > > the shrinker.
> > > > > > Are you sure? My impression is that GFP_NOFS should now work much more out
> > > > > > of the box with the memalloc_nofs_save()/memalloc_nofs_restore().
> > > > > Yeah, if you'd put it in the right place :-)
> > > > > 
> > > > > But also -mm folks are very clear that memalloc_no*() family is for dire
> > > > > situations where there's really no other way out. For anything where you
> > > > > know what you're doing, you really should use explicit gfp flags.
> > > > My impression is just the other way around. You should try to avoid the
> > > > NOFS/NOIO flags and use the memalloc_no* approach instead.
> > > Where did you get that idea?
> > 
> > Well from the kernel comment on GFP_NOFS:
> > 
> >  * %GFP_NOFS will use direct reclaim but will not use any filesystem
> > interfaces.
> >  * Please try to avoid using this flag directly and instead use
> >  * memalloc_nofs_{save,restore} to mark the whole scope which
> > cannot/shouldn't
> >  * recurse into the FS layer with a short explanation why. All allocation
> >  * requests will inherit GFP_NOFS implicitly.
> 
> Huh that's interesting, since iirc Willy or Dave told me the opposite, and
> the memalloc_no* stuff is for e.g. nfs calling into network layer (needs
> GFP_NOFS) or swap on top of a filesystem (even needs GFP_NOIO I think).
> 
> Adding them, maybe I got confused.

My impression is that the scoped API is preferred these days.

https://www.kernel.org/doc/html/latest/core-api/gfp_mask-from-fs-io.html

I'd probably need to spend a few months learning the DRM subsystem to
have a more detailed opinion on whether passing GFP flags around explicitly
or using the scope API is the better approach for your situation.

I usually defer to Michal on these kinds of questions.

> > > The kernel is full of explicit gfp_t flag
> > > passing to make this as explicit as possible. The memalloc_no* stuff
> > > is just for when you go through entire subsystems and really can't
> > > wire it through. I can't find the discussion anymore, but that was the
> > > advice I got from mm/fs people.
> > > 
> > > One reason is that generally a small GFP_KERNEL allocation never
> > > fails. But it absolutely can fail if it's in a memalloc_no* section,
> > > and these kind of non-obvious non-local effects are a real pain in
> > > testing and review. Hence explicit gfp_flag passing as much as
> > > possible.

I agree with this; it's definitely a problem with the scope API.  I wanted
to extend it to include GFP_NOWAIT, but if you do that, your chances of
memory allocation failure go way up, so you really want to set __GFP_NOWARN
too, but now you need to audit all the places that you're calling to be
sure they really handle errors correctly.

So I think I'm giving up on that patch set.

> > > > > > > If this is just to paper over the seq_printf doing the wrong allocations,
> > > > > > > then just move that out from under the fs_reclaim_acquire/release part.
> > > > > > No, that wasn't the problem.
> > > > > > 
> > > > > > We have just seen too many failures to allocate pages for swapout and I think
> > > > > > that would improve this because in a lot of cases we can then immediately
> > > > > > swap things out instead of having to rely on upper layers.
> > > > > Yeah, you broke it. Now the real shrinker is running with GFP_KERNEL,
> > > > > because your memalloc_no is only around the debugfs function. And ofc it's
> > > > > much easier to allocate with GFP_KERNEL, right until you deadlock :-)
> > > > The problem here is that for example kswapd calls the shrinker without
> > > > holding a FS lock as far as I can see.
> > > > 
> > > > And it is rather sad that we can't optimize this case directly.
> > > I'm still not clear what you want to optimize? You can check for "is
> > > this kswapd" in pf flags, but that sounds very hairy and fragile.
> > 
> > Well we only need the NOFS flag when the shrinker callback really comes from
> > a memory shortage in the FS subsystem, and that is rather unlikely.
> > 
> > If we could allow all other cases to directly write the freed up pages out
> > to swap, it would certainly help.
> 
> tbh I'm not sure. i915-gem code has played tricks with special casing the
> kswapd path, and they do kinda scare me at least. I'm not sure whether
> there's not some hidden dependencies there that would make this a bad
> idea. Like afaik direct reclaim can sometimes stall for kswapd to catch up
> a bit, or at least did in the past (I think, really not much clue about
> this)
> 
> The other thing is that the fs_reclaim_acquire/release annotation really
> only works well if you use it outside of the direct reclaim path too.
> Otherwise it's not much better than just lots of testing. That pretty much
> means you have to annotate the kswapd path.
> -Daniel
> 
> 
> 
> > 
> > Christian.
> > 
> > > -Daniel
> > > 
> > > > Anyway you are right if some caller doesn't use the memalloc_no*()
> > > > approach we are busted.
> > > > 
> > > > Going to change the patch to only not warn for the moment.
> > > > 
> > > > Regards,
> > > > Christian.
> > > > 
> > > > > Shrinking is hard, there's no easy way out here.
> > > > > 
> > > > > Cheers, Daniel
> > > > > 
> > > > > > Regards,
> > > > > > Christian.
> > > > > > 
> > > > > > 
> > > > > > > __GFP_NOWARN should be there indeed I think.
> > > > > > > -Daniel
> > > > > > > 
> > > > > > > > Signed-off-by: Christian König <christian.koenig@amd.com>
> > > > > > > > ---
> > > > > > > >     drivers/gpu/drm/ttm/ttm_tt.c | 5 ++++-
> > > > > > > >     1 file changed, 4 insertions(+), 1 deletion(-)
> > > > > > > > 
> > > > > > > > diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
> > > > > > > > index 2f0833c98d2c..86fa3e82dacc 100644
> > > > > > > > --- a/drivers/gpu/drm/ttm/ttm_tt.c
> > > > > > > > +++ b/drivers/gpu/drm/ttm/ttm_tt.c
> > > > > > > > @@ -369,7 +369,7 @@ static unsigned long ttm_tt_shrinker_scan(struct shrinker *shrink,
> > > > > > > >             };
> > > > > > > >             int ret;
> > > > > > > > -  ret = ttm_bo_swapout(&ctx, GFP_NOFS);
> > > > > > > > +  ret = ttm_bo_swapout(&ctx, GFP_KERNEL | __GFP_NOWARN);
> > > > > > > >             return ret < 0 ? SHRINK_EMPTY : ret;
> > > > > > > >     }
> > > > > > > > @@ -389,10 +389,13 @@ static unsigned long ttm_tt_shrinker_count(struct shrinker *shrink,
> > > > > > > >     static int ttm_tt_debugfs_shrink_show(struct seq_file *m, void *data)
> > > > > > > >     {
> > > > > > > >             struct shrink_control sc = { .gfp_mask = GFP_KERNEL };
> > > > > > > > +  unsigned int flags;
> > > > > > > >             fs_reclaim_acquire(GFP_KERNEL);
> > > > > > > > +  flags = memalloc_nofs_save();
> > > > > > > >             seq_printf(m, "%lu/%lu\n", ttm_tt_shrinker_count(&mm_shrinker, &sc),
> > > > > > > >                        ttm_tt_shrinker_scan(&mm_shrinker, &sc));
> > > > > > > > +  memalloc_nofs_restore(flags);
> > > > > > > >             fs_reclaim_release(GFP_KERNEL);
> > > > > > > >             return 0;
> > > > > > > > --
> > > > > > > > 2.25.1
> > > > > > > > 
> > > > > > > > _______________________________________________
> > > > > > > > dri-devel mailing list
> > > > > > > > dri-devel@lists.freedesktop.org
> > > > > > > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> > > 
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch



* Re: [PATCH] drm/ttm: stop warning on TT shrinker failure
  2021-03-22 14:05               ` Matthew Wilcox
@ 2021-03-22 14:22                 ` Daniel Vetter
  2021-03-22 15:57                 ` Michal Hocko
  1 sibling, 0 replies; 29+ messages in thread
From: Daniel Vetter @ 2021-03-22 14:22 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christian König, Dave Chinner, Leo Liu, amd-gfx list,
	dri-devel, Linux MM, mhocko

On Mon, Mar 22, 2021 at 02:05:48PM +0000, Matthew Wilcox wrote:
> On Mon, Mar 22, 2021 at 02:49:27PM +0100, Daniel Vetter wrote:
> > On Sun, Mar 21, 2021 at 03:18:28PM +0100, Christian König wrote:
> > > Am 20.03.21 um 14:17 schrieb Daniel Vetter:
> > > > On Sat, Mar 20, 2021 at 10:04 AM Christian König
> > > > <ckoenig.leichtzumerken@gmail.com> wrote:
> > > > > Am 19.03.21 um 20:06 schrieb Daniel Vetter:
> > > > > > On Fri, Mar 19, 2021 at 07:53:48PM +0100, Christian König wrote:
> > > > > > > Am 19.03.21 um 18:52 schrieb Daniel Vetter:
> > > > > > > > On Fri, Mar 19, 2021 at 03:08:57PM +0100, Christian König wrote:
> > > > > > > > > Don't print a warning when we fail to allocate a page for swapping things out.
> > > > > > > > > 
> > > > > > > > > Also rely on memalloc_nofs_save/memalloc_nofs_restore instead of GFP_NOFS.
> > > > > > > > Uh this part doesn't make sense. Especially since you only do it for the
> > > > > > > > debugfs file, not in general. Which means you've just completely broken
> > > > > > > > the shrinker.
> > > > > > > Are you sure? My impression is that GFP_NOFS should now work much more out
> > > > > > > of the box with the memalloc_nofs_save()/memalloc_nofs_restore().
> > > > > > Yeah, if you'd put it in the right place :-)
> > > > > > 
> > > > > > But also -mm folks are very clear that memalloc_no*() family is for dire
> > > > > > situations where there's really no other way out. For anything where you
> > > > > > know what you're doing, you really should use explicit gfp flags.
> > > > > My impression is just the other way around. You should try to avoid the
> > > > > NOFS/NOIO flags and use the memalloc_no* approach instead.
> > > > Where did you get that idea?
> > > 
> > > Well from the kernel comment on GFP_NOFS:
> > > 
> > >  * %GFP_NOFS will use direct reclaim but will not use any filesystem
> > > interfaces.
> > >  * Please try to avoid using this flag directly and instead use
> > >  * memalloc_nofs_{save,restore} to mark the whole scope which
> > > cannot/shouldn't
> > >  * recurse into the FS layer with a short explanation why. All allocation
> > >  * requests will inherit GFP_NOFS implicitly.
> > 
> > Huh that's interesting, since iirc Willy or Dave told me the opposite, and
> > the memalloc_no* stuff is for e.g. nfs calling into network layer (needs
> > GFP_NOFS) or swap on top of a filesystem (even needs GFP_NOIO I think).
> > 
> > Adding them, maybe I got confused.
> 
> My impression is that the scoped API is preferred these days.
> 
> https://www.kernel.org/doc/html/latest/core-api/gfp_mask-from-fs-io.html
> 
> I'd probably need to spend a few months learning the DRM subsystem to
> have a more detailed opinion on whether passing GFP flags around explicitly
> or using the scope API is the better approach for your situation.

Atm what we're talking about here is a single allocation in the ttm
shrinker that's already explicitly using GFP_NOFS.

The scoped api might make sense for the gpu scheduler, where we really
operate under GFP_NOWAIT for somewhat awkward reasons. But also I thought
that at least for GFP_NOIO you generally need a mempool and have to think
about how you guarantee forward progress anyway. Is that also a bit
outdated thinking, and nowadays we could operate under the assumption that
this Just Works? Given that GFP_NOFS seems to fall over already for us I'm
not super sure about that ...

> I usually defer to Michal on these kinds of questions.
> 
> > > > The kernel is full of explicit gfp_t flag
> > > > passing to make this as explicit as possible. The memalloc_no* stuff
> > > > is just for when you go through entire subsystems and really can't
> > > > wire it through. I can't find the discussion anymore, but that was the
> > > > advice I got from mm/fs people.
> > > > 
> > > > One reason is that generally a small GFP_KERNEL allocation never
> > > > fails. But it absolutely can fail if it's in a memalloc_no* section,
> > > > and these kind of non-obvious non-local effects are a real pain in
> > > > testing and review. Hence explicit gfp_flag passing as much as
> > > > possible.
> 
> I agree with this; it's definitely a problem with the scope API.  I wanted
> to extend it to include GFP_NOWAIT, but if you do that, your chances of
> memory allocation failure go way up, so you really want to set __GFP_NOWARN
> too, but now you need to audit all the places that you're calling to be
> sure they really handle errors correctly.
> 
> So I think I'm giving up on that patch set.

Yeah the auditing is what scares me, and why at least personally I prefer
explicit gfp flags. It's much easier to debug a lockdep splat involving
fs_reclaim than memory allocation failures leading to very strange bugs
because we're not handling the allocation failure properly (or maybe not
even at all).
-Daniel

> 
> > > > > > > > If this is just to paper over the seq_printf doing the wrong allocations,
> > > > > > > > then just move that out from under the fs_reclaim_acquire/release part.
> > > > > > > No, that wasn't the problem.
> > > > > > > 
> > > > > > > We have just seen too many failures to allocate pages for swapout and I think
> > > > > > > that would improve this because in a lot of cases we can then immediately
> > > > > > > swap things out instead of having to rely on upper layers.
> > > > > > Yeah, you broke it. Now the real shrinker is running with GFP_KERNEL,
> > > > > > because your memalloc_no is only around the debugfs function. And ofc it's
> > > > > > much easier to allocate with GFP_KERNEL, right until you deadlock :-)
> > > > > The problem here is that for example kswapd calls the shrinker without
> > > > > holding a FS lock as far as I can see.
> > > > > 
> > > > > And it is rather sad that we can't optimize this case directly.
> > > > I'm still not clear what you want to optimize? You can check for "is
> > > > this kswapd" in pf flags, but that sounds very hairy and fragile.
> > > 
> > > Well we only need the NOFS flag when the shrinker callback really comes from
> > > a memory shortage in the FS subsystem, and that is rather unlikely.
> > > 
> > > If we could allow all other cases to directly write the freed up pages out
> > > to swap, it would certainly help.
> > 
> > tbh I'm not sure. i915-gem code has played tricks with special casing the
> > kswapd path, and they do kinda scare me at least. I'm not sure whether
> > there's not some hidden dependencies there that would make this a bad
> > idea. Like afaik direct reclaim can sometimes stall for kswapd to catch up
> > a bit, or at least did in the past (I think, really not much clue about
> > this)
> > 
> > The other thing is that the fs_reclaim_acquire/release annotation really
> > only works well if you use it outside of the direct reclaim path too.
> > Otherwise it's not much better than just lots of testing. That pretty much
> > means you have to annotate the kswapd path.
> > -Daniel
> > 
> > 
> > 
> > > 
> > > Christian.
> > > 
> > > > -Daniel
> > > > 
> > > > > Anyway you are right if some caller doesn't use the memalloc_no*()
> > > > > approach we are busted.
> > > > > 
> > > > > Going to change the patch to only not warn for the moment.
> > > > > 
> > > > > Regards,
> > > > > Christian.
> > > > > 
> > > > > > Shrinking is hard, there's no easy way out here.
> > > > > > 
> > > > > > Cheers, Daniel
> > > > > > 
> > > > > > > Regards,
> > > > > > > Christian.
> > > > > > > 
> > > > > > > 
> > > > > > > > __GFP_NOWARN should be there indeed I think.
> > > > > > > > -Daniel
> > > > > > > > 
> > > > > > > > > Signed-off-by: Christian König <christian.koenig@amd.com>
> > > > > > > > > ---
> > > > > > > > >     drivers/gpu/drm/ttm/ttm_tt.c | 5 ++++-
> > > > > > > > >     1 file changed, 4 insertions(+), 1 deletion(-)
> > > > > > > > > 
> > > > > > > > > diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
> > > > > > > > > index 2f0833c98d2c..86fa3e82dacc 100644
> > > > > > > > > --- a/drivers/gpu/drm/ttm/ttm_tt.c
> > > > > > > > > +++ b/drivers/gpu/drm/ttm/ttm_tt.c
> > > > > > > > > @@ -369,7 +369,7 @@ static unsigned long ttm_tt_shrinker_scan(struct shrinker *shrink,
> > > > > > > > >             };
> > > > > > > > >             int ret;
> > > > > > > > > -  ret = ttm_bo_swapout(&ctx, GFP_NOFS);
> > > > > > > > > +  ret = ttm_bo_swapout(&ctx, GFP_KERNEL | __GFP_NOWARN);
> > > > > > > > >             return ret < 0 ? SHRINK_EMPTY : ret;
> > > > > > > > >     }
> > > > > > > > > @@ -389,10 +389,13 @@ static unsigned long ttm_tt_shrinker_count(struct shrinker *shrink,
> > > > > > > > >     static int ttm_tt_debugfs_shrink_show(struct seq_file *m, void *data)
> > > > > > > > >     {
> > > > > > > > >             struct shrink_control sc = { .gfp_mask = GFP_KERNEL };
> > > > > > > > > +  unsigned int flags;
> > > > > > > > >             fs_reclaim_acquire(GFP_KERNEL);
> > > > > > > > > +  flags = memalloc_nofs_save();
> > > > > > > > >             seq_printf(m, "%lu/%lu\n", ttm_tt_shrinker_count(&mm_shrinker, &sc),
> > > > > > > > >                        ttm_tt_shrinker_scan(&mm_shrinker, &sc));
> > > > > > > > > +  memalloc_nofs_restore(flags);
> > > > > > > > >             fs_reclaim_release(GFP_KERNEL);
> > > > > > > > >             return 0;
> > > > > > > > > --
> > > > > > > > > 2.25.1
> > > > > > > > > 
> > > > > > > > > _______________________________________________
> > > > > > > > > dri-devel mailing list
> > > > > > > > > dri-devel@lists.freedesktop.org
> > > > > > > > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> > > > 
> > > 
> > 
> > -- 
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch



* Re: [PATCH] drm/ttm: stop warning on TT shrinker failure
  2021-03-22 14:05               ` Matthew Wilcox
  2021-03-22 14:22                 ` Daniel Vetter
@ 2021-03-22 15:57                 ` Michal Hocko
  2021-03-22 17:02                   ` Daniel Vetter
  1 sibling, 1 reply; 29+ messages in thread
From: Michal Hocko @ 2021-03-22 15:57 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christian König, Dave Chinner, Leo Liu, amd-gfx list,
	dri-devel, Linux MM

On Mon 22-03-21 14:05:48, Matthew Wilcox wrote:
> On Mon, Mar 22, 2021 at 02:49:27PM +0100, Daniel Vetter wrote:
> > On Sun, Mar 21, 2021 at 03:18:28PM +0100, Christian König wrote:
> > > Am 20.03.21 um 14:17 schrieb Daniel Vetter:
> > > > On Sat, Mar 20, 2021 at 10:04 AM Christian König
> > > > <ckoenig.leichtzumerken@gmail.com> wrote:
> > > > > Am 19.03.21 um 20:06 schrieb Daniel Vetter:
> > > > > > On Fri, Mar 19, 2021 at 07:53:48PM +0100, Christian König wrote:
> > > > > > > Am 19.03.21 um 18:52 schrieb Daniel Vetter:
> > > > > > > > On Fri, Mar 19, 2021 at 03:08:57PM +0100, Christian König wrote:
> > > > > > > > > Don't print a warning when we fail to allocate a page for swapping things out.
> > > > > > > > > 
> > > > > > > > > Also rely on memalloc_nofs_save/memalloc_nofs_restore instead of GFP_NOFS.
> > > > > > > > Uh this part doesn't make sense. Especially since you only do it for the
> > > > > > > > debugfs file, not in general. Which means you've just completely broken
> > > > > > > > the shrinker.
> > > > > > > Are you sure? My impression is that GFP_NOFS should now work much more out
> > > > > > > of the box with the memalloc_nofs_save()/memalloc_nofs_restore().
> > > > > > Yeah, if you'd put it in the right place :-)
> > > > > > 
> > > > > > But also -mm folks are very clear that memalloc_no*() family is for dire
> > > > > > situations where there's really no other way out. For anything where you
> > > > > > know what you're doing, you really should use explicit gfp flags.
> > > > > My impression is just the other way around. You should try to avoid the
> > > > > NOFS/NOIO flags and use the memalloc_no* approach instead.
> > > > Where did you get that idea?
> > > 
> > > Well from the kernel comment on GFP_NOFS:
> > > 
> > >  * %GFP_NOFS will use direct reclaim but will not use any filesystem
> > > interfaces.
> > >  * Please try to avoid using this flag directly and instead use
> > >  * memalloc_nofs_{save,restore} to mark the whole scope which
> > > cannot/shouldn't
> > >  * recurse into the FS layer with a short explanation why. All allocation
> > >  * requests will inherit GFP_NOFS implicitly.
> > 
> > Huh that's interesting, since iirc Willy or Dave told me the opposite, and
> > the memalloc_no* stuff is for e.g. nfs calling into network layer (needs
> > GFP_NOFS) or swap on top of a filesystem (even needs GFP_NOIO I think).
> > 
> > Adding them, maybe I got confused.
> 
> My impression is that the scoped API is preferred these days.
> 
> https://www.kernel.org/doc/html/latest/core-api/gfp_mask-from-fs-io.html
> 
> I'd probably need to spend a few months learning the DRM subsystem to
> have a more detailed opinion on whether passing GFP flags around explicitly
> or using the scope API is the better approach for your situation.

Yes, in an ideal world we would have a clearly defined scope of the
reclaim recursion wrt FS/IO associated with it. I've gone back to
https://lore.kernel.org/amd-gfx/20210319140857.2262-1-christian.koenig@amd.com/
and there are two things standing out. Why does ttm_tt_debugfs_shrink_show
really require NOFS semantics? And why does it play with
fs_reclaim_acquire?

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH] drm/ttm: stop warning on TT shrinker failure
  2021-03-22 15:57                 ` Michal Hocko
@ 2021-03-22 17:02                   ` Daniel Vetter
  2021-03-22 19:34                     ` Christian König
  0 siblings, 1 reply; 29+ messages in thread
From: Daniel Vetter @ 2021-03-22 17:02 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Matthew Wilcox, Christian König, dri-devel, Linux MM,
	amd-gfx list, Dave Chinner, Leo Liu

On Mon, Mar 22, 2021 at 5:06 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 22-03-21 14:05:48, Matthew Wilcox wrote:
> > On Mon, Mar 22, 2021 at 02:49:27PM +0100, Daniel Vetter wrote:
> > > On Sun, Mar 21, 2021 at 03:18:28PM +0100, Christian König wrote:
> > > > Am 20.03.21 um 14:17 schrieb Daniel Vetter:
> > > > > On Sat, Mar 20, 2021 at 10:04 AM Christian König
> > > > > <ckoenig.leichtzumerken@gmail.com> wrote:
> > > > > > Am 19.03.21 um 20:06 schrieb Daniel Vetter:
> > > > > > > On Fri, Mar 19, 2021 at 07:53:48PM +0100, Christian König wrote:
> > > > > > > > Am 19.03.21 um 18:52 schrieb Daniel Vetter:
> > > > > > > > > On Fri, Mar 19, 2021 at 03:08:57PM +0100, Christian König wrote:
> > > > > > > > > > Don't print a warning when we fail to allocate a page for swapping things out.
> > > > > > > > > >
> > > > > > > > > > Also rely on memalloc_nofs_save/memalloc_nofs_restore instead of GFP_NOFS.
> > > > > > > > > Uh this part doesn't make sense. Especially since you only do it for the
> > > > > > > > > debugfs file, not in general. Which means you've just completely broken
> > > > > > > > > the shrinker.
> > > > > > > > Are you sure? My impression is that GFP_NOFS should now work much more out
> > > > > > > > of the box with the memalloc_nofs_save()/memalloc_nofs_restore().
> > > > > > > Yeah, if you'd put it in the right place :-)
> > > > > > >
> > > > > > > But also -mm folks are very clear that memalloc_no*() family is for dire
> > > > > > > situation where there's really no other way out. For anything where you
> > > > > > > know what you're doing, you really should use explicit gfp flags.
> > > > > > My impression is just the other way around. You should try to avoid the
> > > > > > NOFS/NOIO flags and use the memalloc_no* approach instead.
> > > > > Where did you get that idea?
> > > >
> > > > Well from the kernel comment on GFP_NOFS:
> > > >
> > > >  * %GFP_NOFS will use direct reclaim but will not use any filesystem
> > > > interfaces.
> > > >  * Please try to avoid using this flag directly and instead use
> > > >  * memalloc_nofs_{save,restore} to mark the whole scope which
> > > > cannot/shouldn't
> > > >  * recurse into the FS layer with a short explanation why. All allocation
> > > >  * requests will inherit GFP_NOFS implicitly.
> > >
> > > Huh that's interesting, since iirc Willy or Dave told me the opposite, and
> > > the memalloc_no* stuff is for e.g. nfs calling into network layer (needs
> > > GFP_NOFS) or swap on top of a filesystems (even needs GFP_NOIO I think).
> > >
> > > Adding them, maybe I got confused.
> >
> > My impression is that the scoped API is preferred these days.
> >
> > https://www.kernel.org/doc/html/latest/core-api/gfp_mask-from-fs-io.html
> >
> > I'd probably need to spend a few months learning the DRM subsystem to
> > have a more detailed opinion on whether passing GFP flags around explicitly
> > or using the scope API is the better approach for your situation.
>
> yes, in an ideal world we would have a clearly defined scope of the
> reclaim recursion wrt FS/IO associated with it. I've got back to
> https://lore.kernel.org/amd-gfx/20210319140857.2262-1-christian.koenig@amd.com/
> and there are two things standing out. Why does ttm_tt_debugfs_shrink_show
> really require NOFS semantic? And why does it play with
> fs_reclaim_acquire?

It's our shrinker. shrink_show simply triggers that specific shrinker,
asking it to shrink everything it can, which helps a lot with testing
without having to drive the entire system against the OOM wall.
fs_reclaim_acquire is there to make sure lockdep understands that this
is a shrinker and that it checks all the dependencies for us as if
we were in real reclaim. There are some drop-caches interfaces in proc
iirc, but those drop everything, and they don't have the fs_reclaim
annotations to teach lockdep about what we're doing.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch



* Re: [PATCH] drm/ttm: stop warning on TT shrinker failure
  2021-03-22 17:02                   ` Daniel Vetter
@ 2021-03-22 19:34                     ` Christian König
  2021-03-23  7:38                       ` Michal Hocko
  0 siblings, 1 reply; 29+ messages in thread
From: Christian König @ 2021-03-22 19:34 UTC (permalink / raw)
  To: Daniel Vetter, Michal Hocko
  Cc: Matthew Wilcox, dri-devel, Linux MM, amd-gfx list, Dave Chinner, Leo Liu

Am 22.03.21 um 18:02 schrieb Daniel Vetter:
> On Mon, Mar 22, 2021 at 5:06 PM Michal Hocko <mhocko@suse.com> wrote:
>> On Mon 22-03-21 14:05:48, Matthew Wilcox wrote:
>>> On Mon, Mar 22, 2021 at 02:49:27PM +0100, Daniel Vetter wrote:
>>>> On Sun, Mar 21, 2021 at 03:18:28PM +0100, Christian König wrote:
>>>>> Am 20.03.21 um 14:17 schrieb Daniel Vetter:
>>>>>> On Sat, Mar 20, 2021 at 10:04 AM Christian König
>>>>>> <ckoenig.leichtzumerken@gmail.com> wrote:
>>>>>>> Am 19.03.21 um 20:06 schrieb Daniel Vetter:
>>>>>>>> On Fri, Mar 19, 2021 at 07:53:48PM +0100, Christian König wrote:
>>>>>>>>> Am 19.03.21 um 18:52 schrieb Daniel Vetter:
>>>>>>>>>> On Fri, Mar 19, 2021 at 03:08:57PM +0100, Christian König wrote:
>>>>>>>>>>> Don't print a warning when we fail to allocate a page for swapping things out.
>>>>>>>>>>>
>>>>>>>>>>> Also rely on memalloc_nofs_save/memalloc_nofs_restore instead of GFP_NOFS.
>>>>>>>>>> Uh this part doesn't make sense. Especially since you only do it for the
>>>>>>>>>> debugfs file, not in general. Which means you've just completely broken
>>>>>>>>>> the shrinker.
>>>>>>>>> Are you sure? My impression is that GFP_NOFS should now work much more out
>>>>>>>>> of the box with the memalloc_nofs_save()/memalloc_nofs_restore().
>>>>>>>> Yeah, if you'd put it in the right place :-)
>>>>>>>>
>>>>>>>> But also -mm folks are very clear that memalloc_no*() family is for dire
>>>>>>>> situation where there's really no other way out. For anything where you
>>>>>>>> know what you're doing, you really should use explicit gfp flags.
>>>>>>> My impression is just the other way around. You should try to avoid the
>>>>>>> NOFS/NOIO flags and use the memalloc_no* approach instead.
>>>>>> Where did you get that idea?
>>>>> Well from the kernel comment on GFP_NOFS:
>>>>>
>>>>>   * %GFP_NOFS will use direct reclaim but will not use any filesystem
>>>>> interfaces.
>>>>>   * Please try to avoid using this flag directly and instead use
>>>>>   * memalloc_nofs_{save,restore} to mark the whole scope which
>>>>> cannot/shouldn't
>>>>>   * recurse into the FS layer with a short explanation why. All allocation
>>>>>   * requests will inherit GFP_NOFS implicitly.
>>>> Huh that's interesting, since iirc Willy or Dave told me the opposite, and
>>>> the memalloc_no* stuff is for e.g. nfs calling into network layer (needs
>>>> GFP_NOFS) or swap on top of a filesystems (even needs GFP_NOIO I think).
>>>>
>>>> Adding them, maybe I got confused.
>>> My impression is that the scoped API is preferred these days.
>>>
>>> https://www.kernel.org/doc/html/latest/core-api/gfp_mask-from-fs-io.html
>>>
>>> I'd probably need to spend a few months learning the DRM subsystem to
>>> have a more detailed opinion on whether passing GFP flags around explicitly
>>> or using the scope API is the better approach for your situation.
>> yes, in an ideal world we would have a clearly defined scope of the
>> reclaim recursion wrt FS/IO associated with it. I've got back to
>> https://lore.kernel.org/amd-gfx/20210319140857.2262-1-christian.koenig@amd.com/
>> and there are two things standing out. Why does ttm_tt_debugfs_shrink_show
>> really require NOFS semantic? And why does it play with
>> fs_reclaim_acquire?
> It's our shrinker. shrink_show simply triggers that specific shrinker
> asking it to shrink everything it can, which helps a lot with testing
> without having to drive the entire system against the OOM wall.
> fs_reclaim_acquire is there to make sure lockdep understands that this
> is a shrinker and that it checks all the dependencies for us like if
> we'd be in real reclaim. There is some drop caches interfaces in proc
> iirc, but those drop everything, and they don't have the fs_reclaim
> annotations to teach lockdep about what we're doing.

To summarize, the debugfs code is basically there to test whether that
stuff really works with GFP_NOFS.

My only concern is that if I could rely on memalloc_no* being used we 
could optimize this quite a bit further.

Regards,
Christian.

> -Daniel




* Re: [PATCH] drm/ttm: stop warning on TT shrinker failure
  2021-03-22 19:34                     ` Christian König
@ 2021-03-23  7:38                       ` Michal Hocko
  2021-03-23 11:28                         ` Daniel Vetter
  0 siblings, 1 reply; 29+ messages in thread
From: Michal Hocko @ 2021-03-23  7:38 UTC (permalink / raw)
  To: Christian König
  Cc: Daniel Vetter, Matthew Wilcox, dri-devel, Linux MM, amd-gfx list,
	Dave Chinner, Leo Liu

On Mon 22-03-21 20:34:25, Christian König wrote:
> Am 22.03.21 um 18:02 schrieb Daniel Vetter:
> > On Mon, Mar 22, 2021 at 5:06 PM Michal Hocko <mhocko@suse.com> wrote:
> > > On Mon 22-03-21 14:05:48, Matthew Wilcox wrote:
> > > > On Mon, Mar 22, 2021 at 02:49:27PM +0100, Daniel Vetter wrote:
> > > > > On Sun, Mar 21, 2021 at 03:18:28PM +0100, Christian König wrote:
> > > > > > Am 20.03.21 um 14:17 schrieb Daniel Vetter:
> > > > > > > On Sat, Mar 20, 2021 at 10:04 AM Christian König
> > > > > > > <ckoenig.leichtzumerken@gmail.com> wrote:
> > > > > > > > Am 19.03.21 um 20:06 schrieb Daniel Vetter:
> > > > > > > > > On Fri, Mar 19, 2021 at 07:53:48PM +0100, Christian König wrote:
> > > > > > > > > > Am 19.03.21 um 18:52 schrieb Daniel Vetter:
> > > > > > > > > > > On Fri, Mar 19, 2021 at 03:08:57PM +0100, Christian König wrote:
> > > > > > > > > > > > Don't print a warning when we fail to allocate a page for swapping things out.
> > > > > > > > > > > > 
> > > > > > > > > > > > Also rely on memalloc_nofs_save/memalloc_nofs_restore instead of GFP_NOFS.
> > > > > > > > > > > Uh this part doesn't make sense. Especially since you only do it for the
> > > > > > > > > > > debugfs file, not in general. Which means you've just completely broken
> > > > > > > > > > > the shrinker.
> > > > > > > > > > Are you sure? My impression is that GFP_NOFS should now work much more out
> > > > > > > > > > of the box with the memalloc_nofs_save()/memalloc_nofs_restore().
> > > > > > > > > Yeah, if you'd put it in the right place :-)
> > > > > > > > > 
> > > > > > > > > But also -mm folks are very clear that memalloc_no*() family is for dire
> > > > > > > > > situation where there's really no other way out. For anything where you
> > > > > > > > > know what you're doing, you really should use explicit gfp flags.
> > > > > > > > My impression is just the other way around. You should try to avoid the
> > > > > > > > NOFS/NOIO flags and use the memalloc_no* approach instead.
> > > > > > > Where did you get that idea?
> > > > > > Well from the kernel comment on GFP_NOFS:
> > > > > > 
> > > > > >   * %GFP_NOFS will use direct reclaim but will not use any filesystem
> > > > > > interfaces.
> > > > > >   * Please try to avoid using this flag directly and instead use
> > > > > >   * memalloc_nofs_{save,restore} to mark the whole scope which
> > > > > > cannot/shouldn't
> > > > > >   * recurse into the FS layer with a short explanation why. All allocation
> > > > > >   * requests will inherit GFP_NOFS implicitly.
> > > > > Huh that's interesting, since iirc Willy or Dave told me the opposite, and
> > > > > the memalloc_no* stuff is for e.g. nfs calling into network layer (needs
> > > > > GFP_NOFS) or swap on top of a filesystems (even needs GFP_NOIO I think).
> > > > > 
> > > > > Adding them, maybe I got confused.
> > > > My impression is that the scoped API is preferred these days.
> > > > 
> > > > https://www.kernel.org/doc/html/latest/core-api/gfp_mask-from-fs-io.html
> > > > 
> > > > I'd probably need to spend a few months learning the DRM subsystem to
> > > > have a more detailed opinion on whether passing GFP flags around explicitly
> > > > or using the scope API is the better approach for your situation.
> > > yes, in an ideal world we would have a clearly defined scope of the
> > > reclaim recursion wrt FS/IO associated with it. I've got back to
> > > https://lore.kernel.org/amd-gfx/20210319140857.2262-1-christian.koenig@amd.com/
> > > and there are two things standing out. Why does ttm_tt_debugfs_shrink_show
> > > really require NOFS semantic? And why does it play with
> > > fs_reclaim_acquire?
> > It's our shrinker. shrink_show simply triggers that specific shrinker
> > asking it to shrink everything it can, which helps a lot with testing
> > without having to drive the entire system against the OOM wall.

Yes I figured that much. But...

> > fs_reclaim_acquire is there to make sure lockdep understands that this
> > is a shrinker and that it checks all the dependencies for us like if
> > we'd be in real reclaim. There is some drop caches interfaces in proc
> > iirc, but those drop everything, and they don't have the fs_reclaim
> > annotations to teach lockdep about what we're doing.

... I really do not follow this. You shouldn't really care whether this
is a reclaim interface or not. Or maybe I just do not understand this...
 
> To summarize the debugfs code is basically to test if that stuff really
> works with GFP_NOFS.

What do you mean by testing GFP_NOFS? Do you mean to test that GFP_NOFS
context is sufficiently powerful to reclaim enough objects despite some
internal constraints?

> My only concern is that if I could rely on memalloc_no* being used we could
> optimize this quite a bit further.

Yes you can use the scope API and you will be guaranteed that _any_
allocation from the enclosed context will inherit GFP_NO* semantic.
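Note that this guarantee is one-directional, which also bears on the
question raised later in the thread: entering a memalloc_nofs scope
restricts every allocation made inside it, but passing GFP_NOFS to a
single allocation does not set the task's scope flags. A small userspace
model of the mask restriction the allocator applies (flag values here
are illustrative, not the real kernel ones):

```c
#include <assert.h>

/* Illustrative flag values; the real ones live in include/linux/gfp.h
 * and include/linux/sched.h. */
#define M_GFP_IO       0x1u
#define M_GFP_FS       0x2u
#define M_GFP_RECLAIM  0x4u
#define M_GFP_KERNEL   (M_GFP_RECLAIM | M_GFP_IO | M_GFP_FS)

#define M_PF_NOFS      0x1u  /* set on the task by memalloc_nofs_save() */
#define M_PF_NOIO      0x2u  /* set on the task by memalloc_noio_save() */

/* Model of the kernel's current_gfp_context(): drop __GFP_FS (and
 * __GFP_IO for NOIO scopes) from the caller's mask when the task is
 * inside a memalloc_no* scope.  Nothing flows the other way: a
 * GFP_NOFS allocation leaves the task flags untouched. */
unsigned int model_gfp_context(unsigned int flags, unsigned int pflags)
{
	if (pflags & M_PF_NOIO)
		flags &= ~(M_GFP_IO | M_GFP_FS);
	else if (pflags & M_PF_NOFS)
		flags &= ~M_GFP_FS;
	return flags;
}
```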

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH] drm/ttm: stop warning on TT shrinker failure
  2021-03-23  7:38                       ` Michal Hocko
@ 2021-03-23 11:28                         ` Daniel Vetter
  2021-03-23 11:46                           ` Michal Hocko
  2021-03-23 11:48                           ` Christian König
  0 siblings, 2 replies; 29+ messages in thread
From: Daniel Vetter @ 2021-03-23 11:28 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Christian König, Daniel Vetter, Matthew Wilcox, dri-devel,
	Linux MM, amd-gfx list, Dave Chinner, Leo Liu

On Tue, Mar 23, 2021 at 08:38:33AM +0100, Michal Hocko wrote:
> On Mon 22-03-21 20:34:25, Christian König wrote:
> > Am 22.03.21 um 18:02 schrieb Daniel Vetter:
> > > On Mon, Mar 22, 2021 at 5:06 PM Michal Hocko <mhocko@suse.com> wrote:
> > > > On Mon 22-03-21 14:05:48, Matthew Wilcox wrote:
> > > > > On Mon, Mar 22, 2021 at 02:49:27PM +0100, Daniel Vetter wrote:
> > > > > > On Sun, Mar 21, 2021 at 03:18:28PM +0100, Christian König wrote:
> > > > > > > Am 20.03.21 um 14:17 schrieb Daniel Vetter:
> > > > > > > > On Sat, Mar 20, 2021 at 10:04 AM Christian König
> > > > > > > > <ckoenig.leichtzumerken@gmail.com> wrote:
> > > > > > > > > Am 19.03.21 um 20:06 schrieb Daniel Vetter:
> > > > > > > > > > On Fri, Mar 19, 2021 at 07:53:48PM +0100, Christian König wrote:
> > > > > > > > > > > Am 19.03.21 um 18:52 schrieb Daniel Vetter:
> > > > > > > > > > > > On Fri, Mar 19, 2021 at 03:08:57PM +0100, Christian König wrote:
> > > > > > > > > > > > > Don't print a warning when we fail to allocate a page for swapping things out.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Also rely on memalloc_nofs_save/memalloc_nofs_restore instead of GFP_NOFS.
> > > > > > > > > > > > Uh this part doesn't make sense. Especially since you only do it for the
> > > > > > > > > > > > debugfs file, not in general. Which means you've just completely broken
> > > > > > > > > > > > the shrinker.
> > > > > > > > > > > Are you sure? My impression is that GFP_NOFS should now work much more out
> > > > > > > > > > > of the box with the memalloc_nofs_save()/memalloc_nofs_restore().
> > > > > > > > > > Yeah, if you'd put it in the right place :-)
> > > > > > > > > > 
> > > > > > > > > > But also -mm folks are very clear that memalloc_no*() family is for dire
> > > > > > > > > > situation where there's really no other way out. For anything where you
> > > > > > > > > > know what you're doing, you really should use explicit gfp flags.
> > > > > > > > > My impression is just the other way around. You should try to avoid the
> > > > > > > > > NOFS/NOIO flags and use the memalloc_no* approach instead.
> > > > > > > > Where did you get that idea?
> > > > > > > Well from the kernel comment on GFP_NOFS:
> > > > > > > 
> > > > > > >   * %GFP_NOFS will use direct reclaim but will not use any filesystem
> > > > > > > interfaces.
> > > > > > >   * Please try to avoid using this flag directly and instead use
> > > > > > >   * memalloc_nofs_{save,restore} to mark the whole scope which
> > > > > > > cannot/shouldn't
> > > > > > >   * recurse into the FS layer with a short explanation why. All allocation
> > > > > > >   * requests will inherit GFP_NOFS implicitly.
> > > > > > Huh that's interesting, since iirc Willy or Dave told me the opposite, and
> > > > > > the memalloc_no* stuff is for e.g. nfs calling into network layer (needs
> > > > > > GFP_NOFS) or swap on top of a filesystems (even needs GFP_NOIO I think).
> > > > > > 
> > > > > > Adding them, maybe I got confused.
> > > > > My impression is that the scoped API is preferred these days.
> > > > > 
> > > > > https://www.kernel.org/doc/html/latest/core-api/gfp_mask-from-fs-io.html
> > > > > 
> > > > > I'd probably need to spend a few months learning the DRM subsystem to
> > > > > have a more detailed opinion on whether passing GFP flags around explicitly
> > > > > or using the scope API is the better approach for your situation.
> > > > yes, in an ideal world we would have a clearly defined scope of the
> > > > reclaim recursion wrt FS/IO associated with it. I've got back to
> > > > https://lore.kernel.org/amd-gfx/20210319140857.2262-1-christian.koenig@amd.com/
> > > > and there are two things standing out. Why does ttm_tt_debugfs_shrink_show
> > > > really require NOFS semantic? And why does it play with
> > > > fs_reclaim_acquire?
> > > It's our shrinker. shrink_show simply triggers that specific shrinker
> > > asking it to shrink everything it can, which helps a lot with testing
> > > without having to drive the entire system against the OOM wall.
> 
> Yes I figured that much. But...
> 
> > > fs_reclaim_acquire is there to make sure lockdep understands that this
> > > is a shrinker and that it checks all the dependencies for us like if
> > > we'd be in real reclaim. There is some drop caches interfaces in proc
> > > iirc, but those drop everything, and they don't have the fs_reclaim
> > > annotations to teach lockdep about what we're doing.
> 
> ... I really do not follow this. You shouldn't really care whether this
> is a reclaim interface or not. Or maybe I just do not understand this...

We're heavily relying on lockdep and fs_reclaim to make sure we get it all
right. So any drop caches interface that isn't wrapped in fs_reclaim
context is kinda useless for testing. Plus ideally we want to only hit our
own paths, and not trash every other cache in the system. Speed matters in
CI.

> > To summarize the debugfs code is basically to test if that stuff really
> > works with GFP_NOFS.
> 
> What do you mean by testing GFP_NOFS. Do you mean to test that GFP_NOFS
> context is sufficiently powerful to reclaim enough objects due to some
> internal constrains?
> 
> > My only concern is that if I could rely on memalloc_no* being used we could
> > optimize this quite a bit further.
> 
> Yes you can use the scope API and you will be guaranteed that _any_
> allocation from the enclosed context will inherit GFP_NO* semantic.

I think this is where I don't yet get what Christian is trying to do: we
really shouldn't do different tricks and calling contexts between direct
reclaim and kswapd reclaim. Otherwise very-hard-to-track-down bugs are
pretty much guaranteed. So whether we use explicit gfp flags or the
context APIs, the result is exactly the same.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch



* Re: [PATCH] drm/ttm: stop warning on TT shrinker failure
  2021-03-23 11:28                         ` Daniel Vetter
@ 2021-03-23 11:46                           ` Michal Hocko
  2021-03-23 11:51                             ` Christian König
  2021-03-23 11:48                           ` Christian König
  1 sibling, 1 reply; 29+ messages in thread
From: Michal Hocko @ 2021-03-23 11:46 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Christian König, Matthew Wilcox, dri-devel, Linux MM,
	amd-gfx list, Dave Chinner, Leo Liu

On Tue 23-03-21 12:28:20, Daniel Vetter wrote:
> On Tue, Mar 23, 2021 at 08:38:33AM +0100, Michal Hocko wrote:
[...]
> > > > fs_reclaim_acquire is there to make sure lockdep understands that this
> > > > is a shrinker and that it checks all the dependencies for us like if
> > > > we'd be in real reclaim. There is some drop caches interfaces in proc
> > > > iirc, but those drop everything, and they don't have the fs_reclaim
> > > > annotations to teach lockdep about what we're doing.
> > 
> > ... I really do not follow this. You shouldn't really care whether this
> > is a reclaim interface or not. Or maybe I just do not understand this...
> 
> We're heavily relying on lockdep and fs_reclaim to make sure we get it all
> right. So any drop caches interface that isn't wrapped in fs_reclaim
> context is kinda useless for testing. Plus ideally we want to only hit our
> own paths, and not trash every other cache in the system. Speed matters in
> CI.

But what is special about this path to hack around and make it pretend
it is part of the fs reclaim path?
-- 
Michal Hocko
SUSE Labs



* Re: [PATCH] drm/ttm: stop warning on TT shrinker failure
  2021-03-23 11:28                         ` Daniel Vetter
  2021-03-23 11:46                           ` Michal Hocko
@ 2021-03-23 11:48                           ` Christian König
  2021-03-23 12:04                             ` Michal Hocko
  1 sibling, 1 reply; 29+ messages in thread
From: Christian König @ 2021-03-23 11:48 UTC (permalink / raw)
  To: Michal Hocko, Matthew Wilcox, dri-devel, Linux MM, amd-gfx list,
	Dave Chinner, Leo Liu

Am 23.03.21 um 12:28 schrieb Daniel Vetter:
> On Tue, Mar 23, 2021 at 08:38:33AM +0100, Michal Hocko wrote:
>> On Mon 22-03-21 20:34:25, Christian König wrote:
>>> Am 22.03.21 um 18:02 schrieb Daniel Vetter:
>>>> On Mon, Mar 22, 2021 at 5:06 PM Michal Hocko <mhocko@suse.com> wrote:
>>>>> On Mon 22-03-21 14:05:48, Matthew Wilcox wrote:
>>>>>> On Mon, Mar 22, 2021 at 02:49:27PM +0100, Daniel Vetter wrote:
>>>>>>> On Sun, Mar 21, 2021 at 03:18:28PM +0100, Christian König wrote:
>>>>>>>> Am 20.03.21 um 14:17 schrieb Daniel Vetter:
>>>>>>>>> On Sat, Mar 20, 2021 at 10:04 AM Christian König
>>>>>>>>> <ckoenig.leichtzumerken@gmail.com> wrote:
>>>>>>>>>> Am 19.03.21 um 20:06 schrieb Daniel Vetter:
>>>>>>>>>>> On Fri, Mar 19, 2021 at 07:53:48PM +0100, Christian König wrote:
>>>>>>>>>>>> Am 19.03.21 um 18:52 schrieb Daniel Vetter:
>>>>>>>>>>>>> On Fri, Mar 19, 2021 at 03:08:57PM +0100, Christian König wrote:
>>>>>>>>>>>>>> Don't print a warning when we fail to allocate a page for swapping things out.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also rely on memalloc_nofs_save/memalloc_nofs_restore instead of GFP_NOFS.
>>>>>>>>>>>>> Uh this part doesn't make sense. Especially since you only do it for the
>>>>>>>>>>>>> debugfs file, not in general. Which means you've just completely broken
>>>>>>>>>>>>> the shrinker.
>>>>>>>>>>>> Are you sure? My impression is that GFP_NOFS should now work much more out
>>>>>>>>>>>> of the box with the memalloc_nofs_save()/memalloc_nofs_restore().
>>>>>>>>>>> Yeah, if you'd put it in the right place :-)
>>>>>>>>>>>
>>>>>>>>>>> But also -mm folks are very clear that memalloc_no*() family is for dire
>>>>>>>>>>> situation where there's really no other way out. For anything where you
>>>>>>>>>>> know what you're doing, you really should use explicit gfp flags.
>>>>>>>>>> My impression is just the other way around. You should try to avoid the
>>>>>>>>>> NOFS/NOIO flags and use the memalloc_no* approach instead.
>>>>>>>>> Where did you get that idea?
>>>>>>>> Well from the kernel comment on GFP_NOFS:
>>>>>>>>
>>>>>>>>    * %GFP_NOFS will use direct reclaim but will not use any filesystem
>>>>>>>> interfaces.
>>>>>>>>    * Please try to avoid using this flag directly and instead use
>>>>>>>>    * memalloc_nofs_{save,restore} to mark the whole scope which
>>>>>>>> cannot/shouldn't
>>>>>>>>    * recurse into the FS layer with a short explanation why. All allocation
>>>>>>>>    * requests will inherit GFP_NOFS implicitly.
>>>>>>> Huh that's interesting, since iirc Willy or Dave told me the opposite, and
>>>>>>> the memalloc_no* stuff is for e.g. nfs calling into network layer (needs
>>>>>>> GFP_NOFS) or swap on top of a filesystems (even needs GFP_NOIO I think).
>>>>>>>
>>>>>>> Adding them, maybe I got confused.
>>>>>> My impression is that the scoped API is preferred these days.
>>>>>>
>>>>>> https://www.kernel.org/doc/html/latest/core-api/gfp_mask-from-fs-io.html
>>>>>>
>>>>>> I'd probably need to spend a few months learning the DRM subsystem to
>>>>>> have a more detailed opinion on whether passing GFP flags around explicitly
>>>>>> or using the scope API is the better approach for your situation.
>>>>> yes, in an ideal world we would have a clearly defined scope of the
>>>>> reclaim recursion wrt FS/IO associated with it. I've got back to
>>>>> https://lore.kernel.org/amd-gfx/20210319140857.2262-1-christian.koenig@amd.com/
>>>>> and there are two things standing out. Why does ttm_tt_debugfs_shrink_show
>>>>> really require NOFS semantic? And why does it play with
>>>>> fs_reclaim_acquire?
>>>> It's our shrinker. shrink_show simply triggers that specific shrinker
>>>> asking it to shrink everything it can, which helps a lot with testing
>>>> without having to drive the entire system against the OOM wall.
>> Yes I figured that much. But...
>>
>>>> fs_reclaim_acquire is there to make sure lockdep understands that this
>>>> is a shrinker and that it checks all the dependencies for us like if
>>>> we'd be in real reclaim. There is some drop caches interfaces in proc
>>>> iirc, but those drop everything, and they don't have the fs_reclaim
>>>> annotations to teach lockdep about what we're doing.
>> ... I really do not follow this. You shouldn't really care whether this
>> is a reclaim interface or not. Or maybe I just do not understand this...
> We're heavily relying on lockdep and fs_reclaim to make sure we get it all
> right. So any drop caches interface that isn't wrapped in fs_reclaim
> context is kinda useless for testing. Plus ideally we want to only hit our
> own paths, and not trash every other cache in the system. Speed matters in
> CI.
>
>>> To summarize the debugfs code is basically to test if that stuff really
>>> works with GFP_NOFS.
>> What do you mean by testing GFP_NOFS. Do you mean to test that GFP_NOFS
>> context is sufficiently powerful to reclaim enough objects due to some
>> internal constrains?
>>
>>> My only concern is that if I could rely on memalloc_no* being used we could
>>> optimize this quite a bit further.
>> Yes you can use the scope API and you will be guaranteed that _any_
>> allocation from the enclosed context will inherit GFP_NO* semantic.

The question is whether this is also guaranteed the other way around.

In other words, if somebody calls get_free_page(GFP_NOFS), are the context 
flags set as well?

>> I think this is where I don't get yet what Christian tries to do: We
>> really shouldn't do different tricks and calling contexts between direct
>> reclaim and kswapd reclaim. Otherwise very hard to track down bugs are
>> pretty much guaranteed. So whether we use explicit gfp flags or the
>> context apis, result is exactly the same.

Ok, let us recap what TTM's TT shrinker does here:

1. We have memory which is not swappable because it might be accessed by 
the GPU at any time.
2. Make sure the memory is not accessed by the GPU; drivers need to 
grab a lock before they can make it accessible again.
3. Allocate a shmem file and copy over the non-swappable pages.
4. Free the non-swappable/non-reclaimable pages.

The pages we got from the shmem file are easily swappable to disk after 
the copy is completed, but only if IO is not already blocked because the 
shrinker was called from an allocation restricted by GFP_NOFS or GFP_NOIO.
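A hedged sketch of those four steps (all ttm_tt_* helper names here are
invented for illustration and do not match the actual TTM code;
shmem_file_setup() is the real shmem API):

```c
static int ttm_tt_swapout_sketch(struct ttm_tt *tt)
{
	struct file *swap_storage;
	int ret;

	/* Steps 1+2: stop GPU access; drivers must take their lock before
	 * making the memory GPU-accessible again. */
	ret = ttm_tt_stop_gpu_access(tt);		/* invented helper */
	if (ret)
		return ret;

	/* Step 3: back the buffer with shmem and copy the pages over. */
	swap_storage = shmem_file_setup("ttm swap",
					(loff_t)tt->num_pages << PAGE_SHIFT, 0);
	if (IS_ERR(swap_storage))
		return PTR_ERR(swap_storage);
	ret = ttm_tt_copy_to_shmem(tt, swap_storage);	/* invented helper */
	if (ret)
		return ret;

	/* Step 4: free the unswappable pages.  The shmem copies can now be
	 * written out by regular reclaim, unless the shrinker was entered
	 * from a GFP_NOFS/GFP_NOIO-restricted allocation. */
	ttm_tt_free_unswappable_pages(tt);		/* invented helper */
	return 0;
}
```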

Regards,
Christian.

> -Daniel




* Re: [PATCH] drm/ttm: stop warning on TT shrinker failure
  2021-03-23 11:46                           ` Michal Hocko
@ 2021-03-23 11:51                             ` Christian König
  2021-03-23 12:00                               ` Daniel Vetter
  2021-03-23 12:05                               ` Michal Hocko
  0 siblings, 2 replies; 29+ messages in thread
From: Christian König @ 2021-03-23 11:51 UTC (permalink / raw)
  To: Michal Hocko, Daniel Vetter
  Cc: Matthew Wilcox, dri-devel, Linux MM, amd-gfx list, Dave Chinner, Leo Liu



Am 23.03.21 um 12:46 schrieb Michal Hocko:
> On Tue 23-03-21 12:28:20, Daniel Vetter wrote:
>> On Tue, Mar 23, 2021 at 08:38:33AM +0100, Michal Hocko wrote:
> [...]
>>>>> fs_reclaim_acquire is there to make sure lockdep understands that this
>>>>> is a shrinker and that it checks all the dependencies for us like if
>>>>> we'd be in real reclaim. There is some drop caches interfaces in proc
>>>>> iirc, but those drop everything, and they don't have the fs_reclaim
>>>>> annotations to teach lockdep about what we're doing.
>>> ... I really do not follow this. You shouldn't really care whether this
>>> is a reclaim interface or not. Or maybe I just do not understand this...
>> We're heavily relying on lockdep and fs_reclaim to make sure we get it all
>> right. So any drop caches interface that isn't wrapped in fs_reclaim
>> context is kinda useless for testing. Plus ideally we want to only hit our
>> own paths, and not trash every other cache in the system. Speed matters in
>> CI.
> But what is special about this path to hack around and make it pretend
> it is part of the fs reclaim path?

That's just to teach lockdep that there is a dependency.

In other words we pretend in the debugfs file that it is part of the fs 
reclaim path to check for the case when it really becomes part of the fs 
reclaim path.

Regards,
Christian.



* Re: [PATCH] drm/ttm: stop warning on TT shrinker failure
  2021-03-23 11:51                             ` Christian König
@ 2021-03-23 12:00                               ` Daniel Vetter
  2021-03-23 12:05                               ` Michal Hocko
  1 sibling, 0 replies; 29+ messages in thread
From: Daniel Vetter @ 2021-03-23 12:00 UTC (permalink / raw)
  To: Christian König
  Cc: Michal Hocko, Daniel Vetter, Matthew Wilcox, dri-devel, Linux MM,
	amd-gfx list, Dave Chinner, Leo Liu

On Tue, Mar 23, 2021 at 12:51:13PM +0100, Christian König wrote:
> 
> 
> Am 23.03.21 um 12:46 schrieb Michal Hocko:
> > On Tue 23-03-21 12:28:20, Daniel Vetter wrote:
> > > On Tue, Mar 23, 2021 at 08:38:33AM +0100, Michal Hocko wrote:
> > [...]
> > > > > > fs_reclaim_acquire is there to make sure lockdep understands that this
> > > > > > is a shrinker and that it checks all the dependencies for us like if
> > > > > > we'd be in real reclaim. There is some drop caches interfaces in proc
> > > > > > iirc, but those drop everything, and they don't have the fs_reclaim
> > > > > > annotations to teach lockdep about what we're doing.
> > > > ... I really do not follow this. You shouldn't really care whether this
> > > > is a reclaim interface or not. Or maybe I just do not understand this...
> > > We're heavily relying on lockdep and fs_reclaim to make sure we get it all
> > > right. So any drop caches interface that isn't wrapped in fs_reclaim
> > > context is kinda useless for testing. Plus ideally we want to only hit our
> > > own paths, and not trash every other cache in the system. Speed matters in
> > > CI.
> > But what is special about this path to hack around and make it pretend
> > it is part of the fs reclaim path?
> 
> That's just to teach lockdep that there is a dependency.
> 
> In other words we pretend in the debugfs file that it is part of the fs
> reclaim path to check for the case when it really becomes part of the fs
> reclaim path.

Yeah, this is only for testing. There are two ways to test your shrinker:

- drive the system against the OOM wall and deal with lots of unrelated
  hangs and issues. Besides, this takes positively forever, which is not
  good if you want CI turn-around time measured in "coffee breaks".

- have a debugfs file which reconstructs the calling context of direct
  reclaim sufficiently for lockdep to do its thing, and then test just
  your shrinker in isolation, without crashing your CI machines or even
  hurting them much.

Only one of these options is actually practical.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch



* Re: [PATCH] drm/ttm: stop warning on TT shrinker failure
  2021-03-23 11:48                           ` Christian König
@ 2021-03-23 12:04                             ` Michal Hocko
  2021-03-23 12:21                               ` Christian König
  2021-03-23 13:15                               ` Daniel Vetter
  0 siblings, 2 replies; 29+ messages in thread
From: Michal Hocko @ 2021-03-23 12:04 UTC (permalink / raw)
  To: Christian König
  Cc: Matthew Wilcox, dri-devel, Linux MM, amd-gfx list, Dave Chinner, Leo Liu

On Tue 23-03-21 12:48:58, Christian König wrote:
> Am 23.03.21 um 12:28 schrieb Daniel Vetter:
> > On Tue, Mar 23, 2021 at 08:38:33AM +0100, Michal Hocko wrote:
> > > On Mon 22-03-21 20:34:25, Christian König wrote:
[...]
> > > > My only concern is that if I could rely on memalloc_no* being used we could
> > > > optimize this quite a bit further.
> > > Yes you can use the scope API and you will be guaranteed that _any_
> > > allocation from the enclosed context will inherit GFP_NO* semantic.
> 
> The question is if this is also guaranteed the other way around?
> 
> In other words if somebody calls get_free_page(GFP_NOFS) are the context
> flags set as well?

The gfp mask is always restricted in the page allocator. So say you have
a noio scope context and call get_free_page/kmalloc(GFP_NOFS); the scope
will then restrict the allocation flags to GFP_NOIO (i.e. drop
__GFP_IO). For further details, have a look at current_gfp_context
and its callers.

Does this answer your question?
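[Editorial illustration] The masking described above can be sketched in plain C. The flag values below are illustrative stand-ins, not the kernel's actual bit layout; in the kernel the real logic lives in current_gfp_context() in include/linux/sched/mm.h:

```c
#include <assert.h>

/* Illustrative stand-ins for the kernel's gfp bits (not the real values). */
#define __GFP_IO      0x1u
#define __GFP_FS      0x2u
#define __GFP_RECLAIM 0x4u

#define GFP_KERNEL (__GFP_RECLAIM | __GFP_IO | __GFP_FS)
#define GFP_NOFS   (__GFP_RECLAIM | __GFP_IO)
#define GFP_NOIO   (__GFP_RECLAIM)

/* Illustrative stand-ins for the task flags set by memalloc_no*_save(). */
#define PF_MEMALLOC_NOIO 0x1u
#define PF_MEMALLOC_NOFS 0x2u

/*
 * Sketch of current_gfp_context(): the scope flags can only *drop*
 * capabilities from the requested mask, never add them back.
 */
static unsigned int current_gfp_context(unsigned int task_flags,
					unsigned int requested)
{
	if (task_flags & PF_MEMALLOC_NOIO)
		requested &= ~(__GFP_IO | __GFP_FS);
	else if (task_flags & PF_MEMALLOC_NOFS)
		requested &= ~__GFP_FS;
	return requested;
}
```

So a GFP_NOFS request made inside a noio scope is effectively GFP_NOIO, while the same request outside any scope stays a plain NOFS request.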

> > > I think this is where I don't get yet what Christian tries to do: We
> > > really shouldn't do different tricks and calling contexts between direct
> > > reclaim and kswapd reclaim. Otherwise very hard to track down bugs are
> > > pretty much guaranteed. So whether we use explicit gfp flags or the
> > > context apis, result is exactly the same.
> 
> Ok let us recap what TTMs TT shrinker does here:
> 
> 1. We got memory which is not swapable because it might be accessed by the
> GPU at any time.
> 2. Make sure the memory is not accessed by the GPU and driver need to grab a
> lock before they can make it accessible again.
> 3. Allocate a shmem file and copy over the not swapable pages.

This is quite tricky because the shrinker operates in the PF_MEMALLOC
context so such an allocation would be allowed to completely deplete
memory unless you explicitly mark that context as __GFP_NOMEMALLOC. Also
note that if the allocation cannot succeed it will not trigger reclaim
again because you are already called from the reclaim context.
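[Editorial illustration] The PF_MEMALLOC interaction can be sketched as a small predicate. This is a simplified model of the kernel's reserve-access decision, with illustrative flag values, not the real __alloc_pages implementation:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins, not the kernel's real flag values. */
#define PF_MEMALLOC       0x1u	/* set while running reclaim/shrinkers */
#define __GFP_NOMEMALLOC  0x1u
#define __GFP_MEMALLOC    0x2u

/*
 * Simplified model: may this allocation dip into the emergency
 * reserves?  PF_MEMALLOC grants access unless the caller opts out
 * with __GFP_NOMEMALLOC.
 */
static bool may_use_reserves(unsigned int task_flags, unsigned int gfp)
{
	if (gfp & __GFP_NOMEMALLOC)
		return false;
	if (gfp & __GFP_MEMALLOC)
		return true;
	return task_flags & PF_MEMALLOC;
}
```

This is why an unannotated allocation from a shrinker can deplete the reserves: the task flag alone is enough to grant access.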

> 4. Free the not swapable/reclaimable pages.
> 
> The pages we got from the shmem file are easily swapable to disk after the
> copy is completed. But only if IO is not already blocked because the
> shrinker was called from an allocation restricted by GFP_NOFS or GFP_NOIO.

Sorry for being dense here, but I still do not follow the actual problem
(well, except for the above-mentioned one). Is the sole point of this to
emulate a GFP_NO* allocation context and see how the shrinker behaves?
-- 
Michal Hocko
SUSE Labs



* Re: [PATCH] drm/ttm: stop warning on TT shrinker failure
  2021-03-23 11:51                             ` Christian König
  2021-03-23 12:00                               ` Daniel Vetter
@ 2021-03-23 12:05                               ` Michal Hocko
  1 sibling, 0 replies; 29+ messages in thread
From: Michal Hocko @ 2021-03-23 12:05 UTC (permalink / raw)
  To: Christian König
  Cc: Daniel Vetter, Matthew Wilcox, dri-devel, Linux MM, amd-gfx list,
	Dave Chinner, Leo Liu

On Tue 23-03-21 12:51:13, Christian König wrote:
> 
> 
> Am 23.03.21 um 12:46 schrieb Michal Hocko:
> > On Tue 23-03-21 12:28:20, Daniel Vetter wrote:
> > > On Tue, Mar 23, 2021 at 08:38:33AM +0100, Michal Hocko wrote:
> > [...]
> > > > > > fs_reclaim_acquire is there to make sure lockdep understands that this
> > > > > > is a shrinker and that it checks all the dependencies for us like if
> > > > > > we'd be in real reclaim. There is some drop caches interfaces in proc
> > > > > > iirc, but those drop everything, and they don't have the fs_reclaim
> > > > > > annotations to teach lockdep about what we're doing.
> > > > ... I really do not follow this. You shouldn't really care whether this
> > > > is a reclaim interface or not. Or maybe I just do not understand this...
> > > We're heavily relying on lockdep and fs_reclaim to make sure we get it all
> > > right. So any drop caches interface that isn't wrapped in fs_reclaim
> > > context is kinda useless for testing. Plus ideally we want to only hit our
> > > own paths, and not trash every other cache in the system. Speed matters in
> > > CI.
> > But what is special about this path to hack around and make it pretend
> > it is part of the fs reclaim path?
> 
> That's just to teach lockdep that there is a dependency.
> 
> In other words we pretend in the debugfs file that it is part of the fs
> reclaim path to check for the case when it really becomes part of the fs
> reclaim path.

OK, our emails crossed and I only saw your response after replying to
your other email. This makes more sense now. But as pointed out in the
other email, this will likely not do what you think. Let's continue
discussing in the other subthread to reduce further confusion.
-- 
Michal Hocko
SUSE Labs



* Re: [PATCH] drm/ttm: stop warning on TT shrinker failure
  2021-03-23 12:04                             ` Michal Hocko
@ 2021-03-23 12:21                               ` Christian König
  2021-03-23 12:37                                 ` Michal Hocko
  2021-03-23 13:15                               ` Daniel Vetter
  1 sibling, 1 reply; 29+ messages in thread
From: Christian König @ 2021-03-23 12:21 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Matthew Wilcox, dri-devel, Linux MM, amd-gfx list, Dave Chinner, Leo Liu

Am 23.03.21 um 13:04 schrieb Michal Hocko:
> On Tue 23-03-21 12:48:58, Christian König wrote:
>> Am 23.03.21 um 12:28 schrieb Daniel Vetter:
>>> On Tue, Mar 23, 2021 at 08:38:33AM +0100, Michal Hocko wrote:
>>>> On Mon 22-03-21 20:34:25, Christian König wrote:
> [...]
>>>>> My only concern is that if I could rely on memalloc_no* being used we could
>>>>> optimize this quite a bit further.
>>>> Yes you can use the scope API and you will be guaranteed that _any_
>>>> allocation from the enclosed context will inherit GFP_NO* semantic.
>> The question is if this is also guaranteed the other way around?
>>
>> In other words if somebody calls get_free_page(GFP_NOFS) are the context
>> flags set as well?
> gfp mask is always restricted in the page allocator. So say you have
> noio scope context and call get_free_page/kmalloc(GFP_NOFS) then the
> scope would restrict the allocation flags to GFP_NOIO (aka drop
> __GFP_IO). For further details, have a look at current_gfp_context
> and its callers.
>
> Does this answer your question?

But what happens if you don't have a noio scope and somebody calls
get_free_page(GFP_NOFS)?

Is the noio scope then added automatically? And is it possible that the
shrinker gets called without the noio scope even though we would need it?

>>>> I think this is where I don't get yet what Christian tries to do: We
>>>> really shouldn't do different tricks and calling contexts between direct
>>>> reclaim and kswapd reclaim. Otherwise very hard to track down bugs are
>>>> pretty much guaranteed. So whether we use explicit gfp flags or the
>>>> context apis, result is exactly the same.
>> Ok let us recap what TTMs TT shrinker does here:
>>
>> 1. We got memory which is not swapable because it might be accessed by the
>> GPU at any time.
>> 2. Make sure the memory is not accessed by the GPU and driver need to grab a
>> lock before they can make it accessible again.
>> 3. Allocate a shmem file and copy over the not swapable pages.
> This is quite tricky because the shrinker operates in the PF_MEMALLOC
> context so such an allocation would be allowed to completely deplete
> memory unless you explicitly mark that context as __GFP_NOMEMALLOC.

Thanks, exactly that was one thing I was absolutely not sure about. And 
yes I agree that this is really tricky.

Ideally I would like to be able to trigger swapping out the shmem page I 
allocated immediately after doing the copy.

This way I would only need a single page for the whole shrink operation 
at any given time.
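[Editorial illustration] The "single temporary page" idea can be sketched in userspace terms. This is a hypothetical helper, not TTM code; in the real shrinker the destination would be a shmem page that is handed to swap right after each copy, which is the step with no ready-made interface:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096

/*
 * Sketch: migrate a buffer object page by page through a single
 * temporary page, so the peak extra memory use of the whole shrink
 * operation stays bounded at one page.
 */
static void shrink_object(const unsigned char *src, unsigned char *dst,
			  size_t size)
{
	unsigned char tmp[PAGE_SIZE];	/* the single temporary page */
	size_t off;

	for (off = 0; off < size; off += PAGE_SIZE) {
		size_t chunk = size - off < PAGE_SIZE ? size - off : PAGE_SIZE;

		memcpy(tmp, src + off, chunk);	/* copy out of the BO      */
		memcpy(dst + off, tmp, chunk);	/* stand-in for the swapout */
	}
}
```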

> Also note that if the allocation cannot succeed it will not trigger reclaim
> again because you are already called from the reclaim context.
>
>> 4. Free the not swapable/reclaimable pages.
>>
>> The pages we got from the shmem file are easily swapable to disk after the
>> copy is completed. But only if IO is not already blocked because the
>> shrinker was called from an allocation restricted by GFP_NOFS or GFP_NOIO.
> Sorry for being dense here but I still do not follow the actual problem
> (well, except for the above mentioned one). Is the sole point of this to
> emulate a GFP_NO* allocation context and see how shrinker behaves?

Please be as dense as you need to be :)

I think Daniel and I only have a very rough understanding of the memory 
management details here, but we need exactly that knowledge to get the 
GPU memory management into the shape we want it to be in.

Thanks,
Christian.



* Re: [PATCH] drm/ttm: stop warning on TT shrinker failure
  2021-03-23 12:21                               ` Christian König
@ 2021-03-23 12:37                                 ` Michal Hocko
  2021-03-23 13:06                                   ` Christian König
  0 siblings, 1 reply; 29+ messages in thread
From: Michal Hocko @ 2021-03-23 12:37 UTC (permalink / raw)
  To: Christian König
  Cc: Matthew Wilcox, dri-devel, Linux MM, amd-gfx list, Dave Chinner, Leo Liu

On Tue 23-03-21 13:21:32, Christian König wrote:
> Am 23.03.21 um 13:04 schrieb Michal Hocko:
> > On Tue 23-03-21 12:48:58, Christian König wrote:
> > > Am 23.03.21 um 12:28 schrieb Daniel Vetter:
> > > > On Tue, Mar 23, 2021 at 08:38:33AM +0100, Michal Hocko wrote:
> > > > > On Mon 22-03-21 20:34:25, Christian König wrote:
> > [...]
> > > > > > My only concern is that if I could rely on memalloc_no* being used we could
> > > > > > optimize this quite a bit further.
> > > > > Yes you can use the scope API and you will be guaranteed that _any_
> > > > > allocation from the enclosed context will inherit GFP_NO* semantic.
> > > The question is if this is also guaranteed the other way around?
> > > 
> > > In other words if somebody calls get_free_page(GFP_NOFS) are the context
> > > flags set as well?
> > gfp mask is always restricted in the page allocator. So say you have
> > noio scope context and call get_free_page/kmalloc(GFP_NOFS) then the
> > scope would restrict the allocation flags to GFP_NOIO (aka drop
> > __GFP_IO). For further details, have a look at current_gfp_context
> > and its callers.
> > 
> > Does this answer your question?
> 
> But what happens if you don't have noio scope and somebody calls
> get_free_page(GFP_NOFS)?

Then this will be a regular NOFS request. Let me repeat: the scope API
will only further restrict any requested allocation mode.

> Is then the noio scope added automatically? And is it possible that the
> shrinker gets called without noio scope even we would need it?

Here you have lost me again.

> > > > > I think this is where I don't get yet what Christian tries to do: We
> > > > > really shouldn't do different tricks and calling contexts between direct
> > > > > reclaim and kswapd reclaim. Otherwise very hard to track down bugs are
> > > > > pretty much guaranteed. So whether we use explicit gfp flags or the
> > > > > context apis, result is exactly the same.
> > > Ok let us recap what TTMs TT shrinker does here:
> > > 
> > > 1. We got memory which is not swapable because it might be accessed by the
> > > GPU at any time.
> > > 2. Make sure the memory is not accessed by the GPU and driver need to grab a
> > > lock before they can make it accessible again.
> > > 3. Allocate a shmem file and copy over the not swapable pages.
> > This is quite tricky because the shrinker operates in the PF_MEMALLOC
> > context so such an allocation would be allowed to completely deplete
> > memory unless you explicitly mark that context as __GFP_NOMEMALLOC.
> 
> Thanks, exactly that was one thing I was absolutely not sure about. And yes
> I agree that this is really tricky.
> 
> Ideally I would like to be able to trigger swapping out the shmem page I
> allocated immediately after doing the copy.

So let me try to rephrase to make sure I understand. You would like to
swap out the existing content from the shrinker and you use shmem as a
way to achieve that. The swapout should happen at the time of copying
(shrinker context) or shortly afterwards?

So effectively to call pageout() on the shmem page after the copy?
 
> This way I would only need a single page for the whole shrink operation at
> any given time.

What do you mean by that? You want to share the same shmem page for
other copy+swapout operations?
-- 
Michal Hocko
SUSE Labs



* Re: [PATCH] drm/ttm: stop warning on TT shrinker failure
  2021-03-23 12:37                                 ` Michal Hocko
@ 2021-03-23 13:06                                   ` Christian König
  2021-03-23 13:41                                     ` Michal Hocko
  0 siblings, 1 reply; 29+ messages in thread
From: Christian König @ 2021-03-23 13:06 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Matthew Wilcox, dri-devel, Linux MM, amd-gfx list, Dave Chinner, Leo Liu

Am 23.03.21 um 13:37 schrieb Michal Hocko:
> On Tue 23-03-21 13:21:32, Christian König wrote:
>> Am 23.03.21 um 13:04 schrieb Michal Hocko:
>>> On Tue 23-03-21 12:48:58, Christian König wrote:
>>>> Am 23.03.21 um 12:28 schrieb Daniel Vetter:
>>>>> On Tue, Mar 23, 2021 at 08:38:33AM +0100, Michal Hocko wrote:
>>>>>> On Mon 22-03-21 20:34:25, Christian König wrote:
>>> [...]
>>>>>>> My only concern is that if I could rely on memalloc_no* being used we could
>>>>>>> optimize this quite a bit further.
>>>>>> Yes you can use the scope API and you will be guaranteed that _any_
>>>>>> allocation from the enclosed context will inherit GFP_NO* semantic.
>>>> The question is if this is also guaranteed the other way around?
>>>>
>>>> In other words if somebody calls get_free_page(GFP_NOFS) are the context
>>>> flags set as well?
>>> gfp mask is always restricted in the page allocator. So say you have
>>> noio scope context and call get_free_page/kmalloc(GFP_NOFS) then the
>>> scope would restrict the allocation flags to GFP_NOIO (aka drop
>>> __GFP_IO). For further details, have a look at current_gfp_context
>>> and its callers.
>>>
>>> Does this answer your question?
>> But what happens if you don't have noio scope and somebody calls
>> get_free_page(GFP_NOFS)?
> Then this will be a regular NOFS request. Let me repeat scope API will
> further restrict any requested allocation mode.

Ok, got it.

>
>> Is then the noio scope added automatically? And is it possible that the
>> shrinker gets called without noio scope even we would need it?
> Here you have lost me again.
>
>>>>>> I think this is where I don't get yet what Christian tries to do: We
>>>>>> really shouldn't do different tricks and calling contexts between direct
>>>>>> reclaim and kswapd reclaim. Otherwise very hard to track down bugs are
>>>>>> pretty much guaranteed. So whether we use explicit gfp flags or the
>>>>>> context apis, result is exactly the same.
>>>> Ok let us recap what TTMs TT shrinker does here:
>>>>
>>>> 1. We got memory which is not swapable because it might be accessed by the
>>>> GPU at any time.
>>>> 2. Make sure the memory is not accessed by the GPU and driver need to grab a
>>>> lock before they can make it accessible again.
>>>> 3. Allocate a shmem file and copy over the not swapable pages.
>>> This is quite tricky because the shrinker operates in the PF_MEMALLOC
>>> context so such an allocation would be allowed to completely deplete
>>> memory unless you explicitly mark that context as __GFP_NOMEMALLOC.
>> Thanks, exactly that was one thing I was absolutely not sure about. And yes
>> I agree that this is really tricky.
>>
>> Ideally I would like to be able to trigger swapping out the shmem page I
>> allocated immediately after doing the copy.
> So let me try to rephrase to make sure I understand. You would like to
> swap out the existing content from the shrinker and you use shmem as a
> way to achieve that. The swapout should happen at the time of copying
> (shrinker context) or shortly afterwards?
>
> So effectively to call pageout() on the shmem page after the copy?

Yes, exactly that.

>> This way I would only need a single page for the whole shrink operation at
>> any given time.
> What do you mean by that? You want the share the same shmem page for
> other copy+swapout?

Correct, yes.

The idea is that we can swap out the content of a full GPU buffer object 
this way to give the backing store of the object back to the core memory 
management.

Regards,
Christian.



* Re: [PATCH] drm/ttm: stop warning on TT shrinker failure
  2021-03-23 12:04                             ` Michal Hocko
  2021-03-23 12:21                               ` Christian König
@ 2021-03-23 13:15                               ` Daniel Vetter
  2021-03-23 13:48                                 ` Michal Hocko
  1 sibling, 1 reply; 29+ messages in thread
From: Daniel Vetter @ 2021-03-23 13:15 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Christian König, Matthew Wilcox, amd-gfx list, Linux MM,
	dri-devel, Dave Chinner, Leo Liu

On Tue, Mar 23, 2021 at 01:04:03PM +0100, Michal Hocko wrote:
> On Tue 23-03-21 12:48:58, Christian König wrote:
> > Am 23.03.21 um 12:28 schrieb Daniel Vetter:
> > > On Tue, Mar 23, 2021 at 08:38:33AM +0100, Michal Hocko wrote:
> > > > I think this is where I don't get yet what Christian tries to do: We
> > > > really shouldn't do different tricks and calling contexts between direct
> > > > reclaim and kswapd reclaim. Otherwise very hard to track down bugs are
> > > > pretty much guaranteed. So whether we use explicit gfp flags or the
> > > > context apis, result is exactly the same.
> > 
> > Ok let us recap what TTMs TT shrinker does here:
> > 
> > 1. We got memory which is not swapable because it might be accessed by the
> > GPU at any time.
> > 2. Make sure the memory is not accessed by the GPU and driver need to grab a
> > lock before they can make it accessible again.
> > 3. Allocate a shmem file and copy over the not swapable pages.
> 
> This is quite tricky because the shrinker operates in the PF_MEMALLOC
> context so such an allocation would be allowed to completely deplete
> memory unless you explicitly mark that context as __GFP_NOMEMALLOC. Also
> note that if the allocation cannot succeed it will not trigger reclaim
> again because you are already called from the reclaim context.

[Limiting to that discussion]

Yes, it's not emulating real (direct) reclaim correctly, but in my
experience the biggest issue with direct reclaim is when you do mutex_lock
instead of mutex_trylock, or in general block on stuff that you can't.
And lockdep + fs_reclaim annotations get us that, so it's pretty good for
making sure our shrinker is correct.

Actual tuning of it and making sure it's not doing silly things is of
course a different thing, and for that we can't test it in isolation. But
it's good to know that before you tune it, you have rather high confidence
it's at least correct. And for that, not running with PF_MEMALLOC is
actually good, since it means more allocation failures, so more testing of
those error/backoff paths in the code.
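[Editorial illustration] The trylock rule referred to above can be shown with a userspace pthread analogue; the kernel shrinker would use mutex_trylock() and back off rather than sleep:

```c
#include <pthread.h>
#include <stdbool.h>

/*
 * Sketch of the reclaim-safe locking rule: a shrinker must not block
 * on a lock that an allocating path may already hold (deadlock), so
 * it trylocks and backs off instead of sleeping.
 */
static pthread_mutex_t bo_lock = PTHREAD_MUTEX_INITIALIZER;

static bool try_shrink_one(void)
{
	if (pthread_mutex_trylock(&bo_lock) != 0)
		return false;	/* contended: back off, let reclaim move on */

	/* ... evict/copy the object here ... */

	pthread_mutex_unlock(&bo_lock);
	return true;
}
```

fs_reclaim annotations let lockdep flag a blocking mutex_lock in this path without ever having to hit the deadlock at runtime.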
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch



* Re: [PATCH] drm/ttm: stop warning on TT shrinker failure
  2021-03-23 13:06                                   ` Christian König
@ 2021-03-23 13:41                                     ` Michal Hocko
  2021-03-23 13:56                                       ` Christian König
  0 siblings, 1 reply; 29+ messages in thread
From: Michal Hocko @ 2021-03-23 13:41 UTC (permalink / raw)
  To: Christian König
  Cc: Matthew Wilcox, dri-devel, Linux MM, amd-gfx list, Dave Chinner, Leo Liu

On Tue 23-03-21 14:06:25, Christian König wrote:
> Am 23.03.21 um 13:37 schrieb Michal Hocko:
> > On Tue 23-03-21 13:21:32, Christian König wrote:
[...]
> > > Ideally I would like to be able to trigger swapping out the shmem page I
> > > allocated immediately after doing the copy.
> > So let me try to rephrase to make sure I understand. You would like to
> > swap out the existing content from the shrinker and you use shmem as a
> > way to achieve that. The swapout should happen at the time of copying
> > (shrinker context) or shortly afterwards?
> > 
> > So effectively to call pageout() on the shmem page after the copy?
> 
> Yes, exactly that.

OK, good. I see what you are trying to achieve now. I do not think we
would want to allow pageout from the shrinker's context, but what you can
do is instantiate the shmem page at the tail of the inactive list so the
next reclaim attempt will swap it out (assuming swap is available, of
course).

This is not really something that our existing infrastructure gives you,
though, I am afraid. There is no way to tell that a newly allocated shmem
page should in fact be cold and the first one to swap out. But there are
people more familiar with shmem and its peculiarities, so I might be wrong
here.

Anyway, I am wondering whether the overall approach is sound. Why don't
you simply use shmem as your backing storage from the beginning and pin
those pages if they are used by the device?

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH] drm/ttm: stop warning on TT shrinker failure
  2021-03-23 13:15                               ` Daniel Vetter
@ 2021-03-23 13:48                                 ` Michal Hocko
  0 siblings, 0 replies; 29+ messages in thread
From: Michal Hocko @ 2021-03-23 13:48 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Christian König, Matthew Wilcox, amd-gfx list, Linux MM,
	dri-devel, Dave Chinner, Leo Liu

On Tue 23-03-21 14:15:05, Daniel Vetter wrote:
> On Tue, Mar 23, 2021 at 01:04:03PM +0100, Michal Hocko wrote:
> > On Tue 23-03-21 12:48:58, Christian König wrote:
> > > Am 23.03.21 um 12:28 schrieb Daniel Vetter:
> > > > On Tue, Mar 23, 2021 at 08:38:33AM +0100, Michal Hocko wrote:
> > > > > I think this is where I don't get yet what Christian tries to do: We
> > > > > really shouldn't do different tricks and calling contexts between direct
> > > > > reclaim and kswapd reclaim. Otherwise very hard to track down bugs are
> > > > > pretty much guaranteed. So whether we use explicit gfp flags or the
> > > > > context apis, result is exactly the same.
> > > 
> > > Ok let us recap what TTMs TT shrinker does here:
> > > 
> > > 1. We got memory which is not swapable because it might be accessed by the
> > > GPU at any time.
> > > 2. Make sure the memory is not accessed by the GPU and driver need to grab a
> > > lock before they can make it accessible again.
> > > 3. Allocate a shmem file and copy over the not swapable pages.
> > 
> > This is quite tricky because the shrinker operates in the PF_MEMALLOC
> > context so such an allocation would be allowed to completely deplete
> > memory unless you explicitly mark that context as __GFP_NOMEMALLOC. Also
> > note that if the allocation cannot succeed it will not trigger reclaim
> > again because you are already called from the reclaim context.
> 
> [Limiting to that discussion]
> 
> Yes it's not emulating real (direct) reclaim correctly, but ime the
> biggest issue with direct reclaim is when you do mutex_lock instead of
> mutex_trylock or in general block on stuff that you cant. And lockdep +
> fs_reclaim annotations gets us that, so pretty good to make sure our
> shrinker is correct.

I have to confess that I have managed to (happily) forget all the nasty
details about fs_reclaim lockdep internals, so I am not sure the use in
the proposed patch is actually reasonable. Talk to the lockdep guys about
that and make sure to put a big fat comment explaining what is going on.

In general, allocating from the reclaim context is a bad idea and you
should avoid it. As already said, a simple allocation request from the
reclaim context is not constrained and it will not recurse back into
reclaim. Calling into shmem from the shrinker context might be really
tricky as well. I am not even sure this is possible for anything other
than a full (GFP_KERNEL) reclaim context.
-- 
Michal Hocko
SUSE Labs



* Re: [PATCH] drm/ttm: stop warning on TT shrinker failure
  2021-03-23 13:41                                     ` Michal Hocko
@ 2021-03-23 13:56                                       ` Christian König
  2021-03-23 15:13                                         ` Michal Hocko
  0 siblings, 1 reply; 29+ messages in thread
From: Christian König @ 2021-03-23 13:56 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Matthew Wilcox, dri-devel, Linux MM, amd-gfx list, Dave Chinner, Leo Liu

Am 23.03.21 um 14:41 schrieb Michal Hocko:
> On Tue 23-03-21 14:06:25, Christian König wrote:
>> Am 23.03.21 um 13:37 schrieb Michal Hocko:
>>> On Tue 23-03-21 13:21:32, Christian König wrote:
> [...]
>>>> Ideally I would like to be able to trigger swapping out the shmem page I
>>>> allocated immediately after doing the copy.
>>> So let me try to rephrase to make sure I understand. You would like to
>>> swap out the existing content from the shrinker and you use shmem as a
>>> way to achieve that. The swapout should happen at the time of copying
>>> (shrinker context) or shortly afterwards?
>>>
>>> So effectively to call pageout() on the shmem page after the copy?
>> Yes, exactly that.
> OK, good. I see what you are trying to achieve now. I do not think we
> would want to allow pageout from the shrinker's context but what you can
> do is to instantiate the shmem page into the tail of the inactive list
> so the next reclaim attempt will swap it out (assuming swap is available
> of course).

Yes, that's at least my understanding of how we currently do it.

The problem with that approach is that I first copy over the whole object
into shmem and then free it.

So instead of temporarily using a single page, I need temporary storage
for the shmem object of whatever size the buffer object is, and that can
be a couple of hundred MiB.

> This is not really something that our existing infrastructure gives you
> though, I am afraid. There is no way to tell a newly allocated shmem
> page should be in fact cold and the first one to swap out. But there are
> people more familiar with shmem and its pecularities so I might be wrong
> here.
>
> Anyway, I am wondering whether the overall approach is sound. Why don't
> you simply use shmem as your backing storage from the beginning and pin
> those pages if they are used by the device?

Yeah, that is exactly what the Intel guys are doing for their integrated 
GPUs :)

The problem is that for TTM I need to be able to handle dGPUs, and those
have all kinds of funny allocation restrictions. In other words, I need to
guarantee that the allocated memory is coherently accessible to the GPU
without going through SWIOTLB.

The simple case is that the device can only do DMA32, but there are also
devices which can only do 40 bits or 48 bits.

On top of that you also have AGP, CMA and things like CPU cache attribute
changes (write-back vs. write-through vs. uncached).
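[Editorial illustration] Those addressing limits boil down to a simple mask check. This is a userspace sketch; the kernel expresses the same thing via dma_set_mask() and the DMA_BIT_MASK() macro, which the define below mirrors:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirror of the kernel's DMA_BIT_MASK() macro. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Can a device limited to addr_bits of addressing reach this page? */
static bool device_can_reach(uint64_t phys_addr, unsigned int addr_bits)
{
	return phys_addr <= DMA_BIT_MASK(addr_bits);
}
```

A page above 4 GiB is fine for a 40-bit device but would need bounce buffering (SWIOTLB) for a DMA32-only one, which is exactly what TTM's allocator tries to avoid.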

Regards,
Christian.



* Re: [PATCH] drm/ttm: stop warning on TT shrinker failure
  2021-03-23 13:56                                       ` Christian König
@ 2021-03-23 15:13                                         ` Michal Hocko
  2021-03-23 15:45                                           ` Christian König
  0 siblings, 1 reply; 29+ messages in thread
From: Michal Hocko @ 2021-03-23 15:13 UTC (permalink / raw)
  To: Christian König
  Cc: Matthew Wilcox, dri-devel, Linux MM, amd-gfx list, Dave Chinner, Leo Liu

On Tue 23-03-21 14:56:54, Christian König wrote:
> Am 23.03.21 um 14:41 schrieb Michal Hocko:
[...]
> > Anyway, I am wondering whether the overall approach is sound. Why don't
> > you simply use shmem as your backing storage from the beginning and pin
> > those pages if they are used by the device?
> 
> Yeah, that is exactly what the Intel guys are doing for their integrated
> GPUs :)
> 
> Problem is for TTM I need to be able to handle dGPUs and those have all
> kinds of funny allocation restrictions. In other words I need to guarantee
> that the allocated memory is coherent accessible to the GPU without using
> SWIOTLB.
> 
> The simple case is that the device can only do DMA32, but you also got
> device which can only do 40bits or 48bits.
> 
> On top of that you also got AGP, CMA and stuff like CPU cache behavior
> changes (write back vs. write through, vs. uncached).

OK, so the underlying problem seems to be that the gfp mask (and thus
mapping_gfp_mask) cannot really reflect your requirements, right? Would
it help if shmem allowed providing an allocation callback to override
alloc_page_vma, which is used currently? I am pretty sure there will be
more to handle, but going through shmem for the whole lifetime is just so
much easier to reason about than tricks that abuse shmem just for the
swapout path.
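[Editorial illustration] The callback idea can be sketched as follows. This is an entirely hypothetical API; shmem has no such hook today, the code only illustrates the shape of the proposal:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical hook: how a driver could override shmem's page allocation. */
struct shmem_alloc_ops {
	void *(*alloc_page)(void *ctx);	/* e.g. enforce a DMA mask */
	void  (*free_page)(void *ctx, void *page);
	void  *ctx;
};

/* Fallback used when no driver hook is installed. */
static void *default_alloc(void *ctx)
{
	(void)ctx;
	return malloc(4096);
}

static void default_free(void *ctx, void *page)
{
	(void)ctx;
	free(page);
}

static const struct shmem_alloc_ops default_ops = {
	.alloc_page = default_alloc,
	.free_page  = default_free,
};

/* Stand-in for shmem's internal allocation path honouring the hook. */
static void *shmem_get_page(const struct shmem_alloc_ops *ops)
{
	if (!ops)
		ops = &default_ops;
	return ops->alloc_page(ops->ctx);
}
```

A TTM-style driver would install ops whose alloc_page respects the device's DMA mask and cache-attribute requirements, while everything else in shmem's lifetime handling stays unchanged.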
-- 
Michal Hocko
SUSE Labs



* Re: [PATCH] drm/ttm: stop warning on TT shrinker failure
  2021-03-23 15:13                                         ` Michal Hocko
@ 2021-03-23 15:45                                           ` Christian König
  2021-03-24 10:19                                             ` Thomas Hellström (Intel)
  0 siblings, 1 reply; 29+ messages in thread
From: Christian König @ 2021-03-23 15:45 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Matthew Wilcox, dri-devel, Linux MM, amd-gfx list, Dave Chinner, Leo Liu

Am 23.03.21 um 16:13 schrieb Michal Hocko:
> On Tue 23-03-21 14:56:54, Christian König wrote:
>> Am 23.03.21 um 14:41 schrieb Michal Hocko:
> [...]
>>> Anyway, I am wondering whether the overall approach is sound. Why don't
>>> you simply use shmem as your backing storage from the beginning and pin
>>> those pages if they are used by the device?
>> Yeah, that is exactly what the Intel guys are doing for their integrated
>> GPUs :)
>>
>> Problem is for TTM I need to be able to handle dGPUs and those have all
>> kinds of funny allocation restrictions. In other words I need to guarantee
>> that the allocated memory is coherent accessible to the GPU without using
>> SWIOTLB.
>>
>> The simple case is that the device can only do DMA32, but you also got
>> device which can only do 40bits or 48bits.
>>
>> On top of that you also got AGP, CMA and stuff like CPU cache behavior
>> changes (write back vs. write through, vs. uncached).
> OK, so the underlying problem seems to be that gfp mask (thus
> mapping_gfp_mask) cannot really reflect your requirements, right?  Would
> it help if shmem would allow to provide an allocation callback to
> override alloc_page_vma which is used currently? I am pretty sure there
> will be more to handle but going through shmem for the whole life time
> is just so much easier to reason about than some tricks to abuse shmem
> just for the swapout path.

Well, it's a start, but the pages can have special CPU cache settings, 
so direct I/O from/to them usually doesn't work as expected.

In addition to that, for AGP and CMA I need to make sure that I give 
those pages back to the relevant subsystems instead of just dropping 
the page reference.

So I would need to block until the swap I/O has completed.

Anyway, I probably need to revert those patches for now since this 
isn't working as we hoped it would.

Thanks for the explanation of how stuff works here.

Christian.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] drm/ttm: stop warning on TT shrinker failure
  2021-03-23 15:45                                           ` Christian König
@ 2021-03-24 10:19                                             ` Thomas Hellström (Intel)
  2021-03-24 11:55                                               ` Daniel Vetter
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Hellström (Intel) @ 2021-03-24 10:19 UTC (permalink / raw)
  To: Christian König, Michal Hocko
  Cc: Matthew Wilcox, amd-gfx list, Linux MM, dri-devel, Dave Chinner, Leo Liu


On 3/23/21 4:45 PM, Christian König wrote:
> Am 23.03.21 um 16:13 schrieb Michal Hocko:
>> On Tue 23-03-21 14:56:54, Christian König wrote:
>>> Am 23.03.21 um 14:41 schrieb Michal Hocko:
>> [...]
>>>> Anyway, I am wondering whether the overall approach is sound. Why 
>>>> don't
>>>> you simply use shmem as your backing storage from the beginning and 
>>>> pin
>>>> those pages if they are used by the device?
>>> Yeah, that is exactly what the Intel guys are doing for their 
>>> integrated
>>> GPUs :)
>>>
>>> Problem is for TTM I need to be able to handle dGPUs and those have all
>>> kinds of funny allocation restrictions. In other words I need to 
>>> guarantee
>>> that the allocated memory is coherent accessible to the GPU without 
>>> using
>>> SWIOTLB.
>>>
>>> The simple case is that the device can only do DMA32, but you also got
>>> device which can only do 40bits or 48bits.
>>>
>>> On top of that you also got AGP, CMA and stuff like CPU cache behavior
>>> changes (write back vs. write through, vs. uncached).
>> OK, so the underlying problem seems to be that gfp mask (thus
>> mapping_gfp_mask) cannot really reflect your requirements, right?  Would
>> it help if shmem would allow to provide an allocation callback to
>> override alloc_page_vma which is used currently? I am pretty sure there
>> will be more to handle but going through shmem for the whole life time
>> is just so much easier to reason about than some tricks to abuse shmem
>> just for the swapout path.
>
> Well it's a start, but the pages can have special CPU cache settings. 
> So direct IO from/to them usually doesn't work as expected.
>
> Additional to that for AGP and CMA I need to make sure that I give 
> those pages back to the relevant subsystems instead of just dropping 
> the page reference.
>
> So I would need to block for the swapio to be completed.
>
> Anyway I probably need to revert those patches for now since this 
> isn't working as we hoped it would.
>
> Thanks for the explanation how stuff works here.

Another alternative here, which I've tried before without success, 
would perhaps be to drop shmem completely and, if it's a normal page 
(no DMA or funny caching attributes), just use add_to_swap_cache()? If 
it's something else, try to allocate a page with the relevant gfp 
attributes, copy, and add_to_swap_cache()? Or perhaps that doesn't work 
well from a shrinker either?

/Thomas



>
> Christian.
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] drm/ttm: stop warning on TT shrinker failure
  2021-03-24 10:19                                             ` Thomas Hellström (Intel)
@ 2021-03-24 11:55                                               ` Daniel Vetter
  2021-03-24 12:00                                                 ` Christian König
  0 siblings, 1 reply; 29+ messages in thread
From: Daniel Vetter @ 2021-03-24 11:55 UTC (permalink / raw)
  To: Thomas Hellström (Intel)
  Cc: Christian König, Michal Hocko, Matthew Wilcox, dri-devel,
	Linux MM, amd-gfx list, Dave Chinner, Leo Liu

On Wed, Mar 24, 2021 at 11:19:13AM +0100, Thomas Hellström (Intel) wrote:
> 
> On 3/23/21 4:45 PM, Christian König wrote:
> > Am 23.03.21 um 16:13 schrieb Michal Hocko:
> > > On Tue 23-03-21 14:56:54, Christian König wrote:
> > > > Am 23.03.21 um 14:41 schrieb Michal Hocko:
> > > [...]
> > > > > Anyway, I am wondering whether the overall approach is
> > > > > sound. Why don't
> > > > > you simply use shmem as your backing storage from the
> > > > > beginning and pin
> > > > > those pages if they are used by the device?
> > > > Yeah, that is exactly what the Intel guys are doing for their
> > > > integrated
> > > > GPUs :)
> > > > 
> > > > Problem is for TTM I need to be able to handle dGPUs and those have all
> > > > kinds of funny allocation restrictions. In other words I need to
> > > > guarantee
> > > > that the allocated memory is coherent accessible to the GPU
> > > > without using
> > > > SWIOTLB.
> > > > 
> > > > The simple case is that the device can only do DMA32, but you also got
> > > > device which can only do 40bits or 48bits.
> > > > 
> > > > On top of that you also got AGP, CMA and stuff like CPU cache behavior
> > > > changes (write back vs. write through, vs. uncached).
> > > OK, so the underlying problem seems to be that gfp mask (thus
> > > mapping_gfp_mask) cannot really reflect your requirements, right?  Would
> > > it help if shmem would allow to provide an allocation callback to
> > > override alloc_page_vma which is used currently? I am pretty sure there
> > > will be more to handle but going through shmem for the whole life time
> > > is just so much easier to reason about than some tricks to abuse shmem
> > > just for the swapout path.
> > 
> > Well it's a start, but the pages can have special CPU cache settings. So
> > direct IO from/to them usually doesn't work as expected.
> > 
> > Additional to that for AGP and CMA I need to make sure that I give those
> > pages back to the relevant subsystems instead of just dropping the page
> > reference.
> > 
> > So I would need to block for the swapio to be completed.
> > 
> > Anyway I probably need to revert those patches for now since this isn't
> > working as we hoped it would.
> > 
> > Thanks for the explanation how stuff works here.
> 
> Another alternative here that I've tried before without being successful
> would perhaps be to drop shmem completely and, if it's a normal page (no dma
> or funny caching attributes) just use add_to_swap_cache()? If it's something
> else, try alloc a page with relevant gfp attributes, copy and
> add_to_swap_cache()? Or perhaps that doesn't work well from a shrinker
> either?

So before we toss everything and go on a great rewrite-the-world tour,
what if we just try to split up big objects? For objects which are
bigger than e.g. 10 MB:

- move them to a special "under eviction" list
- keep a note of how far we've evicted thus far
- interleave allocating shmem pages, copying data and releasing the ttm
  backing store on a chunk basis (maybe 10 MB or whatever, tuning tbh)

If that's not enough, occasionally break out of the shrinker entirely so
other parts of reclaim can reclaim the shmem stuff. But just releasing
our own pages as we go should help a lot, I think.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] drm/ttm: stop warning on TT shrinker failure
  2021-03-24 11:55                                               ` Daniel Vetter
@ 2021-03-24 12:00                                                 ` Christian König
  2021-03-24 12:01                                                   ` Daniel Vetter
  0 siblings, 1 reply; 29+ messages in thread
From: Christian König @ 2021-03-24 12:00 UTC (permalink / raw)
  To: Thomas Hellström (Intel),
	Michal Hocko, Matthew Wilcox, dri-devel, Linux MM, amd-gfx list,
	Dave Chinner, Leo Liu

Am 24.03.21 um 12:55 schrieb Daniel Vetter:
> On Wed, Mar 24, 2021 at 11:19:13AM +0100, Thomas Hellström (Intel) wrote:
>> On 3/23/21 4:45 PM, Christian König wrote:
>>> Am 23.03.21 um 16:13 schrieb Michal Hocko:
>>>> On Tue 23-03-21 14:56:54, Christian König wrote:
>>>>> Am 23.03.21 um 14:41 schrieb Michal Hocko:
>>>> [...]
>>>>>> Anyway, I am wondering whether the overall approach is
>>>>>> sound. Why don't
>>>>>> you simply use shmem as your backing storage from the
>>>>>> beginning and pin
>>>>>> those pages if they are used by the device?
>>>>> Yeah, that is exactly what the Intel guys are doing for their
>>>>> integrated
>>>>> GPUs :)
>>>>>
>>>>> Problem is for TTM I need to be able to handle dGPUs and those have all
>>>>> kinds of funny allocation restrictions. In other words I need to
>>>>> guarantee
>>>>> that the allocated memory is coherent accessible to the GPU
>>>>> without using
>>>>> SWIOTLB.
>>>>>
>>>>> The simple case is that the device can only do DMA32, but you also got
>>>>> device which can only do 40bits or 48bits.
>>>>>
>>>>> On top of that you also got AGP, CMA and stuff like CPU cache behavior
>>>>> changes (write back vs. write through, vs. uncached).
>>>> OK, so the underlying problem seems to be that gfp mask (thus
>>>> mapping_gfp_mask) cannot really reflect your requirements, right?  Would
>>>> it help if shmem would allow to provide an allocation callback to
>>>> override alloc_page_vma which is used currently? I am pretty sure there
>>>> will be more to handle but going through shmem for the whole life time
>>>> is just so much easier to reason about than some tricks to abuse shmem
>>>> just for the swapout path.
>>> Well it's a start, but the pages can have special CPU cache settings. So
>>> direct IO from/to them usually doesn't work as expected.
>>>
>>> Additional to that for AGP and CMA I need to make sure that I give those
>>> pages back to the relevant subsystems instead of just dropping the page
>>> reference.
>>>
>>> So I would need to block for the swapio to be completed.
>>>
>>> Anyway I probably need to revert those patches for now since this isn't
>>> working as we hoped it would.
>>>
>>> Thanks for the explanation how stuff works here.
>> Another alternative here that I've tried before without being successful
>> would perhaps be to drop shmem completely and, if it's a normal page (no dma
>> or funny caching attributes) just use add_to_swap_cache()? If it's something
>> else, try alloc a page with relevant gfp attributes, copy and
>> add_to_swap_cache()? Or perhaps that doesn't work well from a shrinker
>> either?
> So before we toss everything and go an a great rewrite-the-world tour,
> what if we just try to split up big objects. So for objects which are
> bigger than e.g. 10mb
>
> - move them to a special "under eviction" list
> - keep a note how far we evicted thus far
> - interleave allocating shmem pages, copying data and releasing the ttm
>    backing store on a chunk basis (maybe 10mb or whatever, tuning tbh)
>
> If that's not enough, occasionally break out of the shrinker entirely so
> other parts of reclaim can reclaim the shmem stuff. But just releasing our
> own pages as we go should help a lot I think.

Yeah, the latter is exactly what I am currently prototyping.

I just didn't use a limit, but rather a list of only partially evicted 
BOs which is used when we fail to allocate a page.

For the 5.12 cycle I think we should just go back to a hard 50% limit 
for now and then resurrect this when we have solved the issues.

Christian.

> -Daniel



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] drm/ttm: stop warning on TT shrinker failure
  2021-03-24 12:00                                                 ` Christian König
@ 2021-03-24 12:01                                                   ` Daniel Vetter
  2021-03-24 12:07                                                     ` Christian König
  0 siblings, 1 reply; 29+ messages in thread
From: Daniel Vetter @ 2021-03-24 12:01 UTC (permalink / raw)
  To: Christian König
  Cc: Thomas Hellström (Intel),
	Michal Hocko, Matthew Wilcox, dri-devel, Linux MM, amd-gfx list,
	Dave Chinner, Leo Liu

On Wed, Mar 24, 2021 at 01:00:28PM +0100, Christian König wrote:
> Am 24.03.21 um 12:55 schrieb Daniel Vetter:
> > On Wed, Mar 24, 2021 at 11:19:13AM +0100, Thomas Hellström (Intel) wrote:
> > > On 3/23/21 4:45 PM, Christian König wrote:
> > > > Am 23.03.21 um 16:13 schrieb Michal Hocko:
> > > > > On Tue 23-03-21 14:56:54, Christian König wrote:
> > > > > > Am 23.03.21 um 14:41 schrieb Michal Hocko:
> > > > > [...]
> > > > > > > Anyway, I am wondering whether the overall approach is
> > > > > > > sound. Why don't
> > > > > > > you simply use shmem as your backing storage from the
> > > > > > > beginning and pin
> > > > > > > those pages if they are used by the device?
> > > > > > Yeah, that is exactly what the Intel guys are doing for their
> > > > > > integrated
> > > > > > GPUs :)
> > > > > > 
> > > > > > Problem is for TTM I need to be able to handle dGPUs and those have all
> > > > > > kinds of funny allocation restrictions. In other words I need to
> > > > > > guarantee
> > > > > > that the allocated memory is coherent accessible to the GPU
> > > > > > without using
> > > > > > SWIOTLB.
> > > > > > 
> > > > > > The simple case is that the device can only do DMA32, but you also got
> > > > > > device which can only do 40bits or 48bits.
> > > > > > 
> > > > > > On top of that you also got AGP, CMA and stuff like CPU cache behavior
> > > > > > changes (write back vs. write through, vs. uncached).
> > > > > OK, so the underlying problem seems to be that gfp mask (thus
> > > > > mapping_gfp_mask) cannot really reflect your requirements, right?  Would
> > > > > it help if shmem would allow to provide an allocation callback to
> > > > > override alloc_page_vma which is used currently? I am pretty sure there
> > > > > will be more to handle but going through shmem for the whole life time
> > > > > is just so much easier to reason about than some tricks to abuse shmem
> > > > > just for the swapout path.
> > > > Well it's a start, but the pages can have special CPU cache settings. So
> > > > direct IO from/to them usually doesn't work as expected.
> > > > 
> > > > Additional to that for AGP and CMA I need to make sure that I give those
> > > > pages back to the relevant subsystems instead of just dropping the page
> > > > reference.
> > > > 
> > > > So I would need to block for the swapio to be completed.
> > > > 
> > > > Anyway I probably need to revert those patches for now since this isn't
> > > > working as we hoped it would.
> > > > 
> > > > Thanks for the explanation how stuff works here.
> > > Another alternative here that I've tried before without being successful
> > > would perhaps be to drop shmem completely and, if it's a normal page (no dma
> > > or funny caching attributes) just use add_to_swap_cache()? If it's something
> > > else, try alloc a page with relevant gfp attributes, copy and
> > > add_to_swap_cache()? Or perhaps that doesn't work well from a shrinker
> > > either?
> > So before we toss everything and go an a great rewrite-the-world tour,
> > what if we just try to split up big objects. So for objects which are
> > bigger than e.g. 10mb
> > 
> > - move them to a special "under eviction" list
> > - keep a note how far we evicted thus far
> > - interleave allocating shmem pages, copying data and releasing the ttm
> >    backing store on a chunk basis (maybe 10mb or whatever, tuning tbh)
> > 
> > If that's not enough, occasionally break out of the shrinker entirely so
> > other parts of reclaim can reclaim the shmem stuff. But just releasing our
> > own pages as we go should help a lot I think.
> 
> Yeah, the later is exactly what I was currently prototyping.
> 
> I just didn't used a limit but rather a only partially evicted BOs list
> which is used when we fail to allocate a page.
> 
> For the 5.12 cycle I think we should just go back to a hard 50% limit for
> now and then resurrect this when we have solved the issues.

Can we do the 50% limit without tossing out all the code we've done thus
far? Just so this doesn't get too disruptive.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] drm/ttm: stop warning on TT shrinker failure
  2021-03-24 12:01                                                   ` Daniel Vetter
@ 2021-03-24 12:07                                                     ` Christian König
  2021-03-24 19:20                                                       ` Daniel Vetter
  0 siblings, 1 reply; 29+ messages in thread
From: Christian König @ 2021-03-24 12:07 UTC (permalink / raw)
  To: Thomas Hellström (Intel),
	Michal Hocko, Matthew Wilcox, dri-devel, Linux MM, amd-gfx list,
	Dave Chinner, Leo Liu



Am 24.03.21 um 13:01 schrieb Daniel Vetter:
> On Wed, Mar 24, 2021 at 01:00:28PM +0100, Christian König wrote:
>> Am 24.03.21 um 12:55 schrieb Daniel Vetter:
>>> On Wed, Mar 24, 2021 at 11:19:13AM +0100, Thomas Hellström (Intel) wrote:
>>>> On 3/23/21 4:45 PM, Christian König wrote:
>>>>> Am 23.03.21 um 16:13 schrieb Michal Hocko:
>>>>>> On Tue 23-03-21 14:56:54, Christian König wrote:
>>>>>>> Am 23.03.21 um 14:41 schrieb Michal Hocko:
>>>>>> [...]
>>>>>>>> Anyway, I am wondering whether the overall approach is
>>>>>>>> sound. Why don't
>>>>>>>> you simply use shmem as your backing storage from the
>>>>>>>> beginning and pin
>>>>>>>> those pages if they are used by the device?
>>>>>>> Yeah, that is exactly what the Intel guys are doing for their
>>>>>>> integrated
>>>>>>> GPUs :)
>>>>>>>
>>>>>>> Problem is for TTM I need to be able to handle dGPUs and those have all
>>>>>>> kinds of funny allocation restrictions. In other words I need to
>>>>>>> guarantee
>>>>>>> that the allocated memory is coherent accessible to the GPU
>>>>>>> without using
>>>>>>> SWIOTLB.
>>>>>>>
>>>>>>> The simple case is that the device can only do DMA32, but you also got
>>>>>>> device which can only do 40bits or 48bits.
>>>>>>>
>>>>>>> On top of that you also got AGP, CMA and stuff like CPU cache behavior
>>>>>>> changes (write back vs. write through, vs. uncached).
>>>>>> OK, so the underlying problem seems to be that gfp mask (thus
>>>>>> mapping_gfp_mask) cannot really reflect your requirements, right?  Would
>>>>>> it help if shmem would allow to provide an allocation callback to
>>>>>> override alloc_page_vma which is used currently? I am pretty sure there
>>>>>> will be more to handle but going through shmem for the whole life time
>>>>>> is just so much easier to reason about than some tricks to abuse shmem
>>>>>> just for the swapout path.
>>>>> Well it's a start, but the pages can have special CPU cache settings. So
>>>>> direct IO from/to them usually doesn't work as expected.
>>>>>
>>>>> Additional to that for AGP and CMA I need to make sure that I give those
>>>>> pages back to the relevant subsystems instead of just dropping the page
>>>>> reference.
>>>>>
>>>>> So I would need to block for the swapio to be completed.
>>>>>
>>>>> Anyway I probably need to revert those patches for now since this isn't
>>>>> working as we hoped it would.
>>>>>
>>>>> Thanks for the explanation how stuff works here.
>>>> Another alternative here that I've tried before without being successful
>>>> would perhaps be to drop shmem completely and, if it's a normal page (no dma
>>>> or funny caching attributes) just use add_to_swap_cache()? If it's something
>>>> else, try alloc a page with relevant gfp attributes, copy and
>>>> add_to_swap_cache()? Or perhaps that doesn't work well from a shrinker
>>>> either?
>>> So before we toss everything and go an a great rewrite-the-world tour,
>>> what if we just try to split up big objects. So for objects which are
>>> bigger than e.g. 10mb
>>>
>>> - move them to a special "under eviction" list
>>> - keep a note how far we evicted thus far
>>> - interleave allocating shmem pages, copying data and releasing the ttm
>>>     backing store on a chunk basis (maybe 10mb or whatever, tuning tbh)
>>>
>>> If that's not enough, occasionally break out of the shrinker entirely so
>>> other parts of reclaim can reclaim the shmem stuff. But just releasing our
>>> own pages as we go should help a lot I think.
>> Yeah, the later is exactly what I was currently prototyping.
>>
>> I just didn't used a limit but rather a only partially evicted BOs list
>> which is used when we fail to allocate a page.
>>
>> For the 5.12 cycle I think we should just go back to a hard 50% limit for
>> now and then resurrect this when we have solved the issues.
> Can we do the 50% limit without tossing out all the code we've done thus
> far? Just so this doesn't get too disruptive.

Yeah, I just need to get back to v1 of this patch, from before you 
convinced me that the shrinker is the better approach :)

Cheers,
Christian.

> -Daniel



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] drm/ttm: stop warning on TT shrinker failure
  2021-03-24 12:07                                                     ` Christian König
@ 2021-03-24 19:20                                                       ` Daniel Vetter
  0 siblings, 0 replies; 29+ messages in thread
From: Daniel Vetter @ 2021-03-24 19:20 UTC (permalink / raw)
  To: Christian König
  Cc: Thomas Hellström (Intel),
	Michal Hocko, Matthew Wilcox, dri-devel, Linux MM, amd-gfx list,
	Dave Chinner, Leo Liu

On Wed, Mar 24, 2021 at 01:07:44PM +0100, Christian König wrote:
> 
> 
> Am 24.03.21 um 13:01 schrieb Daniel Vetter:
> > On Wed, Mar 24, 2021 at 01:00:28PM +0100, Christian König wrote:
> > > Am 24.03.21 um 12:55 schrieb Daniel Vetter:
> > > > On Wed, Mar 24, 2021 at 11:19:13AM +0100, Thomas Hellström (Intel) wrote:
> > > > > On 3/23/21 4:45 PM, Christian König wrote:
> > > > > > Am 23.03.21 um 16:13 schrieb Michal Hocko:
> > > > > > > On Tue 23-03-21 14:56:54, Christian König wrote:
> > > > > > > > Am 23.03.21 um 14:41 schrieb Michal Hocko:
> > > > > > > [...]
> > > > > > > > > Anyway, I am wondering whether the overall approach is
> > > > > > > > > sound. Why don't
> > > > > > > > > you simply use shmem as your backing storage from the
> > > > > > > > > beginning and pin
> > > > > > > > > those pages if they are used by the device?
> > > > > > > > Yeah, that is exactly what the Intel guys are doing for their
> > > > > > > > integrated
> > > > > > > > GPUs :)
> > > > > > > > 
> > > > > > > > Problem is for TTM I need to be able to handle dGPUs and those have all
> > > > > > > > kinds of funny allocation restrictions. In other words I need to
> > > > > > > > guarantee
> > > > > > > > that the allocated memory is coherent accessible to the GPU
> > > > > > > > without using
> > > > > > > > SWIOTLB.
> > > > > > > > 
> > > > > > > > The simple case is that the device can only do DMA32, but you also got
> > > > > > > > device which can only do 40bits or 48bits.
> > > > > > > > 
> > > > > > > > On top of that you also got AGP, CMA and stuff like CPU cache behavior
> > > > > > > > changes (write back vs. write through, vs. uncached).
> > > > > > > OK, so the underlying problem seems to be that gfp mask (thus
> > > > > > > mapping_gfp_mask) cannot really reflect your requirements, right?  Would
> > > > > > > it help if shmem would allow to provide an allocation callback to
> > > > > > > override alloc_page_vma which is used currently? I am pretty sure there
> > > > > > > will be more to handle but going through shmem for the whole life time
> > > > > > > is just so much easier to reason about than some tricks to abuse shmem
> > > > > > > just for the swapout path.
> > > > > > Well it's a start, but the pages can have special CPU cache settings. So
> > > > > > direct IO from/to them usually doesn't work as expected.
> > > > > > 
> > > > > > Additional to that for AGP and CMA I need to make sure that I give those
> > > > > > pages back to the relevant subsystems instead of just dropping the page
> > > > > > reference.
> > > > > > 
> > > > > > So I would need to block for the swapio to be completed.
> > > > > > 
> > > > > > Anyway I probably need to revert those patches for now since this isn't
> > > > > > working as we hoped it would.
> > > > > > 
> > > > > > Thanks for the explanation how stuff works here.
> > > > > Another alternative here that I've tried before without being successful
> > > > > would perhaps be to drop shmem completely and, if it's a normal page (no dma
> > > > > or funny caching attributes) just use add_to_swap_cache()? If it's something
> > > > > else, try alloc a page with relevant gfp attributes, copy and
> > > > > add_to_swap_cache()? Or perhaps that doesn't work well from a shrinker
> > > > > either?
> > > > So before we toss everything and go an a great rewrite-the-world tour,
> > > > what if we just try to split up big objects. So for objects which are
> > > > bigger than e.g. 10mb
> > > > 
> > > > - move them to a special "under eviction" list
> > > > - keep a note how far we evicted thus far
> > > > - interleave allocating shmem pages, copying data and releasing the ttm
> > > >     backing store on a chunk basis (maybe 10mb or whatever, tuning tbh)
> > > > 
> > > > If that's not enough, occasionally break out of the shrinker entirely so
> > > > other parts of reclaim can reclaim the shmem stuff. But just releasing our
> > > > own pages as we go should help a lot I think.
> > > Yeah, the later is exactly what I was currently prototyping.
> > > 
> > > I just didn't used a limit but rather a only partially evicted BOs list
> > > which is used when we fail to allocate a page.
> > > 
> > > For the 5.12 cycle I think we should just go back to a hard 50% limit for
> > > now and then resurrect this when we have solved the issues.
> > Can we do the 50% limit without tossing out all the code we've done thus
> > far? Just so this doesn't get too disruptive.
> 
> Yeah, I just need to get back to v1 of this patch. Before you convinced me
> that the shrinker is the better approach .)

I don't think there's anything other than the shrinker if you want
dynamically sized memory usage, short of pinning it all. Implementing
our own kswapd and not tying into direct reclaim does not sound like a
good idea to me.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2021-03-24 19:21 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20210319140857.2262-1-christian.koenig@amd.com>
     [not found] ` <YFTk1GSaUDI3wcWt@phenom.ffwll.local>
     [not found]   ` <2831bfcc-140e-dade-1f50-a6431e495e9d@gmail.com>
     [not found]     ` <YFT2LSR97rkkPyEP@phenom.ffwll.local>
     [not found]       ` <1ae415c4-8e49-5183-b44d-bc92088657d5@gmail.com>
     [not found]         ` <CAKMK7uEDhuvSwJj5CX8vHgLb+5zm=rdJPmXwb-VQWdrW6GwQZw@mail.gmail.com>
     [not found]           ` <e6e9df3e-cd2b-d80f-205d-6ca1865819b2@gmail.com>
2021-03-22 13:49             ` [PATCH] drm/ttm: stop warning on TT shrinker failure Daniel Vetter
2021-03-22 14:05               ` Matthew Wilcox
2021-03-22 14:22                 ` Daniel Vetter
2021-03-22 15:57                 ` Michal Hocko
2021-03-22 17:02                   ` Daniel Vetter
2021-03-22 19:34                     ` Christian König
2021-03-23  7:38                       ` Michal Hocko
2021-03-23 11:28                         ` Daniel Vetter
2021-03-23 11:46                           ` Michal Hocko
2021-03-23 11:51                             ` Christian König
2021-03-23 12:00                               ` Daniel Vetter
2021-03-23 12:05                               ` Michal Hocko
2021-03-23 11:48                           ` Christian König
2021-03-23 12:04                             ` Michal Hocko
2021-03-23 12:21                               ` Christian König
2021-03-23 12:37                                 ` Michal Hocko
2021-03-23 13:06                                   ` Christian König
2021-03-23 13:41                                     ` Michal Hocko
2021-03-23 13:56                                       ` Christian König
2021-03-23 15:13                                         ` Michal Hocko
2021-03-23 15:45                                           ` Christian König
2021-03-24 10:19                                             ` Thomas Hellström (Intel)
2021-03-24 11:55                                               ` Daniel Vetter
2021-03-24 12:00                                                 ` Christian König
2021-03-24 12:01                                                   ` Daniel Vetter
2021-03-24 12:07                                                     ` Christian König
2021-03-24 19:20                                                       ` Daniel Vetter
2021-03-23 13:15                               ` Daniel Vetter
2021-03-23 13:48                                 ` Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).