* [RFC] Add swappiness argument to memory.reclaim
@ 2022-05-16 22:29 ` Yosry Ahmed
  0 siblings, 0 replies; 28+ messages in thread
From: Yosry Ahmed @ 2022-05-16 22:29 UTC (permalink / raw)
  To: Johannes Weiner, Michal Hocko, Shakeel Butt, Andrew Morton,
	David Rientjes, Roman Gushchin
  Cc: cgroups, Tejun Heo, Linux-MM, Yu Zhao, Wei Xu, Greg Thelen, Chen Wandun

The discussions on the patch series [1] that adds memory.reclaim have
shown that it is desirable to add an argument to control the type of
memory reclaimed when proactive reclaim is invoked through
memory.reclaim.

I am proposing adding an optional swappiness argument to the interface.
If set, it overrides vm.swappiness and per-memcg swappiness. This
provides a way to enforce user policy on a stateless per-reclaim
basis. We can make policy decisions to perform reclaim differently for
tasks of different app classes based on their individual QoS needs. It
also helps in use cases where page cache usage is particularly high and
we mainly want to target it without swapping out.

The interface would be something like this (utilizing the nested-keyed
interface we documented earlier):

$ echo "200M swappiness=30" > memory.reclaim

Looking forward to hearing thoughts about this before I go ahead and
send a patch.

[1]https://lore.kernel.org/lkml/20220331084151.2600229-1-yosryahmed@google.com/
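
A minimal sketch of how a script could issue the request shown above.
The swappiness= key is only what this RFC proposes, not an existing
interface, and the function names and the cgroup directory argument are
hypothetical helpers introduced for illustration.

```shell
#!/bin/sh
# Build the request string for the proposed interface.
# $1: amount to reclaim (e.g. 200M)
# $2: optional per-request swappiness (proposed key, an assumption)
reclaim_request() {
    if [ -n "${2:-}" ]; then
        printf '%s swappiness=%s\n' "$1" "$2"
    else
        printf '%s\n' "$1"
    fi
}

# Write the request to a cgroup's memory.reclaim file.
# $1: cgroup directory (hypothetical, e.g. /sys/fs/cgroup/myjob)
# remaining arguments are passed on to reclaim_request
proactive_reclaim() {
    cgroup_dir=$1
    shift
    reclaim_request "$@" > "$cgroup_dir/memory.reclaim"
}
```

With these helpers, `proactive_reclaim /sys/fs/cgroup/myjob 200M 30`
would write "200M swappiness=30" to that cgroup's memory.reclaim, and
omitting the last argument falls back to a plain "200M" request.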


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] Add swappiness argument to memory.reclaim
@ 2022-05-17  6:56   ` Michal Hocko
  0 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2022-05-17  6:56 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Johannes Weiner, Shakeel Butt, Andrew Morton, David Rientjes,
	Roman Gushchin, cgroups, Tejun Heo, Linux-MM, Yu Zhao, Wei Xu,
	Greg Thelen, Chen Wandun

On Mon 16-05-22 15:29:42, Yosry Ahmed wrote:
> The discussions on the patch series [1] to add memory.reclaim has
> shown that it is desirable to add an argument to control the type of
> memory being reclaimed by invoked proactive reclaim using
> memory.reclaim.
> 
> I am proposing adding a swappiness optional argument to the interface.
> If set, it overwrites vm.swappiness and per-memcg swappiness. This
> provides a way to enforce user policy on a stateless per-reclaim
> basis. We can make policy decisions to perform reclaim differently for
> tasks of different app classes based on their individual QoS needs. It
> also helps for use cases when particularly page cache is high and we
> want to mainly hit that without swapping out.

Can you be more specific about the usecase please? Also how do you
define the semantic? Behavior like vm_swappiness is rather vague because
the kernel is free to ignore (and it does indeed) this knob in many
situations. What is the expected behavior when user explicitly requests
a certain swappiness?
-- 
Michal Hocko
SUSE Labs


* Re: [RFC] Add swappiness argument to memory.reclaim
@ 2022-05-17 16:05   ` Roman Gushchin
  0 siblings, 0 replies; 28+ messages in thread
From: Roman Gushchin @ 2022-05-17 16:05 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Johannes Weiner, Michal Hocko, Shakeel Butt, Andrew Morton,
	David Rientjes, cgroups, Tejun Heo, Linux-MM, Yu Zhao, Wei Xu,
	Greg Thelen, Chen Wandun

On Mon, May 16, 2022 at 03:29:42PM -0700, Yosry Ahmed wrote:
> The discussions on the patch series [1] to add memory.reclaim has
> shown that it is desirable to add an argument to control the type of
> memory being reclaimed by invoked proactive reclaim using
> memory.reclaim.
> 
> I am proposing adding a swappiness optional argument to the interface.
> If set, it overwrites vm.swappiness and per-memcg swappiness. This
> provides a way to enforce user policy on a stateless per-reclaim
> basis. We can make policy decisions to perform reclaim differently for
> tasks of different app classes based on their individual QoS needs. It
> also helps for use cases when particularly page cache is high and we
> want to mainly hit that without swapping out.
> 
> The interface would be something like this (utilizing the nested-keyed
> interface we documented earlier):
> 
> $ echo "200M swappiness=30" > memory.reclaim

What are the anticipated use cases except swappiness == 0 and
swappiness == system_default?

IMO it's better to allow specifying the type of memory to reclaim,
e.g. type="file"/"anon"/"slab"; it's much clearer what to expect.

E.g. what does
$ echo "200M swappiness=1" > memory.reclaim
mean if there is only 10M of pagecache? How much anon memory will
be reclaimed?

Thanks!
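
To make the question above concrete, here is a deliberately naive
model of how a swappiness value could split a reclaim target. It
assumes an anon:file weighting of swappiness:(200 - swappiness), which
is a simplification chosen for illustration. The kernel's real reclaim
heuristics are more involved and may ignore swappiness entirely. The
function name is hypothetical.

```shell
#!/bin/sh
# Naive split of a reclaim target between anon and file memory.
# ASSUMPTION: anon:file weight is swappiness:(200 - swappiness).
# This is NOT what the kernel does; it only illustrates the ambiguity.
# $1: target in MiB, $2: swappiness, $3: reclaimable page cache in MiB
# Prints: "<anon MiB> <file MiB> <shortfall MiB>"
naive_split() {
    target=$1 swappiness=$2 file_avail=$3
    anon=$(( target * swappiness / 200 ))
    file=$(( target - anon ))
    shortfall=0
    # The file share cannot exceed the page cache that actually exists.
    if [ "$file" -gt "$file_avail" ]; then
        shortfall=$(( file - file_avail ))
        file=$file_avail
    fi
    echo "$anon $file $shortfall"
}

naive_split 200 1 10   # prints "1 10 189"
```

Under this model, 189M of the 200M request has no defined source:
whether it should come from anon memory or be left unreclaimed is
exactly what a swappiness argument does not specify.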


* Re: [RFC] Add swappiness argument to memory.reclaim
@ 2022-05-17 18:06     ` Yosry Ahmed
  0 siblings, 0 replies; 28+ messages in thread
From: Yosry Ahmed @ 2022-05-17 18:06 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Johannes Weiner, Shakeel Butt, Andrew Morton, David Rientjes,
	Roman Gushchin, cgroups, Tejun Heo, Linux-MM, Yu Zhao, Wei Xu,
	Greg Thelen, Chen Wandun

On Mon, May 16, 2022 at 11:56 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 16-05-22 15:29:42, Yosry Ahmed wrote:
> > The discussions on the patch series [1] to add memory.reclaim has
> > shown that it is desirable to add an argument to control the type of
> > memory being reclaimed by invoked proactive reclaim using
> > memory.reclaim.
> >
> > I am proposing adding a swappiness optional argument to the interface.
> > If set, it overwrites vm.swappiness and per-memcg swappiness. This
> > provides a way to enforce user policy on a stateless per-reclaim
> > basis. We can make policy decisions to perform reclaim differently for
> > tasks of different app classes based on their individual QoS needs. It
> > also helps for use cases when particularly page cache is high and we
> > want to mainly hit that without swapping out.
>
> Can you be more specific about the usecase please? Also how do you

For example, for a class of applications it may be known that
reclaiming one type of pages (anon or file) is more profitable or will
incur an overhead, based on userspace knowledge of the nature of the
app. If most of what an app uses is anon/tmpfs, for example, then it
might be better to explicitly ask the kernel to reclaim anon, and to
avoid reclaiming file pages in order not to hurt the file cache
performance.

It could also be a less aggressive alternative to /proc/sys/vm/drop_caches.

> define the semantic? Behavior like vm_swappiness is rather vague because
> the kernel is free to ignore (and it does indeed) this knob in many
> situations. What is the expected behavior when user explicitly requests
> a certain swappiness?

My initial thought was to have the same behavior as vm_swappiness,
but stateless. If a user provides a swappiness value then we use it
instead of vm_swappiness. However, I am aware that the definition is
vague and there are no guarantees here. The only reason I proposed
swappiness rather than explicit type arguments (like the original RFC
and Roman's reply) is flexibility. It looks like explicit type
arguments would be more practical though. I will continue the
discussion in my reply to Roman.

> --
> Michal Hocko
> SUSE Labs


* Re: [RFC] Add swappiness argument to memory.reclaim
  2022-05-17 16:05   ` Roman Gushchin
@ 2022-05-17 18:13     ` Yosry Ahmed
  -1 siblings, 0 replies; 28+ messages in thread
From: Yosry Ahmed @ 2022-05-17 18:13 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Johannes Weiner, Michal Hocko, Shakeel Butt, Andrew Morton,
	David Rientjes, cgroups, Tejun Heo, Linux-MM, Yu Zhao, Wei Xu,
	Greg Thelen, Chen Wandun

On Tue, May 17, 2022 at 9:05 AM Roman Gushchin <roman.gushchin@linux.dev> wrote:
>
> On Mon, May 16, 2022 at 03:29:42PM -0700, Yosry Ahmed wrote:
> > The discussions on the patch series [1] to add memory.reclaim has
> > shown that it is desirable to add an argument to control the type of
> > memory being reclaimed by invoked proactive reclaim using
> > memory.reclaim.
> >
> > I am proposing adding a swappiness optional argument to the interface.
> > If set, it overwrites vm.swappiness and per-memcg swappiness. This
> > provides a way to enforce user policy on a stateless per-reclaim
> > basis. We can make policy decisions to perform reclaim differently for
> > tasks of different app classes based on their individual QoS needs. It
> > also helps for use cases when particularly page cache is high and we
> > want to mainly hit that without swapping out.
> >
> > The interface would be something like this (utilizing the nested-keyed
> > interface we documented earlier):
> >
> > $ echo "200M swappiness=30" > memory.reclaim
>
> What are the anticipated use cases except swappiness == 0 and
> swappiness == system_default?
>
> IMO it's better to allow specifying the type of memory to reclaim,
> e.g. type="file"/"anon"/"slab", it's a way more clear what to expect.

I imagined swappiness would give user space flexibility to reclaim a
ratio of file vs. anon as it sees fit based on app class or userspace
policy, but I agree that the guarantees of swappiness are weak and we
might want an explicit argument that directly controls the return
value of get_scan_count() or whether or not we call shrink_slab(). My
fear is that this interface may be less flexible, for example if we
only want to avoid reclaiming file pages, but we are fine with anon or
slab. If a new type of reclaimable memory is added in the future, does
it get implicitly reclaimed when other types are specified or not?

Maybe we can use one argument per type instead? E.g.
    $ echo "200M file=no anon=yes slab=yes" > memory.reclaim

The default value would be "yes" for all types unless stated
otherwise. This also leaves room for future extensions (maybe
file=clean to reclaim clean file pages only?). Interested to hear your
thoughts on this!
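
A rough sketch of the per-type defaults described above, written as a
userspace parser. The file=, anon= and slab= keys are only the idea
floated in this message, not an existing interface, and the function
name is hypothetical.

```shell
#!/bin/sh
# Parse a hypothetical per-type request such as "200M file=no anon=yes".
# Types that are not mentioned default to "yes".
# Prints: "<amount> file=<v> anon=<v> slab=<v>"
# Returns 1 on an empty request, an unknown key, or a value other than yes/no.
parse_reclaim_types() {
    set -- $1                      # split the request on whitespace
    [ "$#" -ge 1 ] || return 1
    amount=$1
    shift
    file=yes anon=yes slab=yes     # defaults
    for arg in "$@"; do
        key=${arg%%=*}
        val=${arg#*=}
        case $val in
            yes|no) ;;
            *) return 1 ;;
        esac
        case $key in
            file) file=$val ;;
            anon) anon=$val ;;
            slab) slab=$val ;;
            *) return 1 ;;
        esac
    done
    echo "$amount file=$file anon=$anon slab=$slab"
}

parse_reclaim_types "200M file=no"   # prints "200M file=no anon=yes slab=yes"
```

An extension such as file=clean would need the value check relaxed for
that key, which is one place where the per-type form has to grow its
own rules.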

>
> E.g. what
> $ echo "200M swappiness=1" > memory.reclaim
> means if there is only 10M of pagecache? How much of anon memory will
> be reclaimed?

Good point. I agree that the type argument or per-type arguments have
multiple advantages over swappiness.

>
> Thanks!


* Re: [RFC] Add swappiness argument to memory.reclaim
@ 2022-05-17 19:49       ` Roman Gushchin
  0 siblings, 0 replies; 28+ messages in thread
From: Roman Gushchin @ 2022-05-17 19:49 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Johannes Weiner, Michal Hocko, Shakeel Butt, Andrew Morton,
	David Rientjes, cgroups, Tejun Heo, Linux-MM, Yu Zhao, Wei Xu,
	Greg Thelen, Chen Wandun

On Tue, May 17, 2022 at 11:13:10AM -0700, Yosry Ahmed wrote:
> On Tue, May 17, 2022 at 9:05 AM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> >
> > On Mon, May 16, 2022 at 03:29:42PM -0700, Yosry Ahmed wrote:
> > > The discussions on the patch series [1] to add memory.reclaim has
> > > shown that it is desirable to add an argument to control the type of
> > > memory being reclaimed by invoked proactive reclaim using
> > > memory.reclaim.
> > >
> > > I am proposing adding a swappiness optional argument to the interface.
> > > If set, it overwrites vm.swappiness and per-memcg swappiness. This
> > > provides a way to enforce user policy on a stateless per-reclaim
> > > basis. We can make policy decisions to perform reclaim differently for
> > > tasks of different app classes based on their individual QoS needs. It
> > > also helps for use cases when particularly page cache is high and we
> > > want to mainly hit that without swapping out.
> > >
> > > The interface would be something like this (utilizing the nested-keyed
> > > interface we documented earlier):
> > >
> > > $ echo "200M swappiness=30" > memory.reclaim
> >
> > What are the anticipated use cases except swappiness == 0 and
> > swappiness == system_default?
> >
> > IMO it's better to allow specifying the type of memory to reclaim,
> > e.g. type="file"/"anon"/"slab", it's a way more clear what to expect.
> 
> I imagined swappiness would give user space flexibility to reclaim a
> ratio of file vs. anon as it sees fit based on app class or userspace
> policy, but I agree that the guarantees of swappiness are weak and we
> might want an explicit argument that directly controls the return
> value of get_scan_count() or whether or not we call shrink_slab(). My
> fear is that this interface may be less flexible, for example if we
> only want to avoid reclaiming file pages, but we are fine with anon or
> slab.
> Maybe in the future we will have a new type of memory to
> reclaim, does it get implicitly reclaimed when other types are
> specified or not?
> 
> Maybe we can use one argument per type instead? E.g.
>     $ echo "200M file=no anon=yes slab=yes" > memory.reclaim
> 
> The default value would be "yes" for all types unless stated
> otherwise. This is also leaves room for future extensions (maybe
> file=clean to reclaim clean file pages only?). Interested to hear your
> thoughts on this!

The question to answer is: do you want the code that determines
the balance of scanning to be part of the interface?

If not, I'd stick with explicitly specifying a type of memory to scan
(and the "I don't care" mode, where you simply ask to reclaim X bytes).

Otherwise you need to describe how the artificial memory pressure will
be distributed over different memory types. And with time it might
start to differ significantly from what the generic reclaim code does,
because the reclaim path is free to do what's better; there are no
user-visible guarantees.

> 
> >
> > E.g. what
> > $ echo "200M swappiness=1" > memory.reclaim
> > means if there is only 10M of pagecache? How much of anon memory will
> > be reclaimed?
> 
> Good point. I agree that the type argument or per-type arguments have
> multiple advantages over swappiness.

If a user wants to select multiple types of memory, can they just run several
requests in parallel? Or one by one?

Thanks!


* Re: [RFC] Add swappiness argument to memory.reclaim
@ 2022-05-17 20:06       ` Johannes Weiner
  0 siblings, 0 replies; 28+ messages in thread
From: Johannes Weiner @ 2022-05-17 20:06 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Michal Hocko, Shakeel Butt, Andrew Morton, David Rientjes,
	Roman Gushchin, cgroups, Tejun Heo, Linux-MM, Yu Zhao, Wei Xu,
	Greg Thelen, Chen Wandun

Hi Yosry,

On Tue, May 17, 2022 at 11:06:36AM -0700, Yosry Ahmed wrote:
> On Mon, May 16, 2022 at 11:56 PM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Mon 16-05-22 15:29:42, Yosry Ahmed wrote:
> > > The discussions on the patch series [1] to add memory.reclaim has
> > > shown that it is desirable to add an argument to control the type of
> > > memory being reclaimed by invoked proactive reclaim using
> > > memory.reclaim.
> > >
> > > I am proposing adding a swappiness optional argument to the interface.
> > > If set, it overwrites vm.swappiness and per-memcg swappiness. This
> > > provides a way to enforce user policy on a stateless per-reclaim
> > > basis. We can make policy decisions to perform reclaim differently for
> > > tasks of different app classes based on their individual QoS needs. It
> > > also helps for use cases when particularly page cache is high and we
> > > want to mainly hit that without swapping out.
> >
> > Can you be more specific about the usecase please? Also how do you
> 
> For example for a class of applications it may be known that
> reclaiming one type of pages anon/file is more profitable or will
> incur an overhead, based on userspace knowledge of the nature of the
> app.

I want to make sure I understand what you're trying to correct for
with this bias. Could you expand some on what you mean by profitable?

The way the kernel thinks today is that importance of any given page
is its access frequency times the cost of paging it. swappiness exists
to recognize differences in the second part: the cost involved in
swapping a page vs the cost of a file cache miss.

For example, page A is accessed 10 times more frequently than B, but B
is 10 times more expensive to refault/swapin. Combining that, they
should be roughly equal reclaim candidates.
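
The arithmetic behind this example can be written out as a small
sketch. The function name and the unit-free numbers are illustrative
assumptions. The kernel does not compute a per-page score this way; it
only approximates the idea through its scan balancing.

```shell
#!/bin/sh
# Relative importance of a page as described above:
# access frequency multiplied by the cost of bringing the page back.
# Units are arbitrary; only the comparison between pages matters.
# $1: relative access frequency, $2: relative refault/swapin cost
page_importance() {
    echo $(( $1 * $2 ))
}

a=$(page_importance 10 1)   # page A: 10x the accesses, baseline cost
b=$(page_importance 1 10)   # page B: baseline accesses, 10x the cost
echo "A=$a B=$b"            # prints "A=10 B=10"
```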

This is the same with the seek parameter of slab shrinkers: some
objects are more expensive to recreate than others. Once corrected for
that, presence of reference bits can be interpreted on an even level.

While access frequency is clearly a workload property, the cost of
refaulting is conventionally not - let alone a per-reclaim property!

If I understand you correctly, you're saying that the backing type of
a piece of memory can say something about the importance of the data
within. Something that goes beyond the work of recreating it.

Is that true or am I misreading this?

If that's your claim, isn't that, if it happens, mostly incidental?

For example, in our fleet we used to copy executable text into
anonymous memory to get THP backing. With file THP support in the
kernel, the text is back in cache. The importance of the memory
*contents* stayed the same. The backing storage changed, but beyond
that the anon/file distinction doesn't mean anything.

Another example. Probably one of the most common workload structures
is text, heap, logging/startup/error handling: hot file, warm anon,
cold file. How does prioritizing either file or anon apply to this?

Maybe I'm misunderstanding and this IS about per-workload backing
types? Maybe the per-cgroup swapfiles that you guys are using?

> If most of what an app use for example is anon/tmpfs then it might
> be better to explicitly ask the kernel to reclaim anon, and to avoid
> reclaiming file pages in order not to hurt the file cache
> performance.

Hm.

Reclaim ages those pools based on their size, so a dominant anon set
should receive more pressure than a small file set. I can see two
possible reasons why this doesn't produce the desired results:

1) Reclaim is broken and doesn't allocate scan rates right, or

2) Access frequency x refault cost alone is not a satisfactory
   predictor for the value of any given page.

Can you see another?
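
To illustrate "ages those pools based on their size", here is a
deliberately simplified model that splits a scan budget in proportion
to pool size. It is only a sketch under that single assumption: it
ignores swappiness, recent refault cost and reclaim priority, and
scan_targets() is a hypothetical name, not the kernel's
get_scan_count().

```python
def scan_targets(pool_sizes: dict[str, int], total_scan: int) -> dict[str, int]:
    """Split a scan budget across pools in proportion to their size."""
    total = sum(pool_sizes.values())
    if total == 0:
        return {name: 0 for name in pool_sizes}
    return {name: total_scan * size // total for name, size in pool_sizes.items()}


# Assumed cgroup: 900 MiB of anon, 100 MiB of file, 200 MiB scan budget.
targets = scan_targets({"anon": 900, "file": 100}, total_scan=200)
print(targets)
```

With these numbers the dominant anon set receives nine times the
pressure of the small file set without any explicit bias.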

I can sort of see the argument for 2), because it can be workload
dependent: a 50ms refault in a single-threaded part of the program is
likely more disruptive than the same refault in an asynchronous worker
thread. This is a factor we're not really taking into account today.

But I don't think an anon/file bias will capture this coefficient?


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] Add swappiness argument to memory.reclaim
  2022-05-17 19:49       ` Roman Gushchin
@ 2022-05-17 20:11         ` Yosry Ahmed
  -1 siblings, 0 replies; 28+ messages in thread
From: Yosry Ahmed @ 2022-05-17 20:11 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Johannes Weiner, Michal Hocko, Shakeel Butt, Andrew Morton,
	David Rientjes, cgroups, Tejun Heo, Linux-MM, Yu Zhao, Wei Xu,
	Greg Thelen, Chen Wandun

On Tue, May 17, 2022 at 12:49 PM Roman Gushchin
<roman.gushchin@linux.dev> wrote:
>
> On Tue, May 17, 2022 at 11:13:10AM -0700, Yosry Ahmed wrote:
> > On Tue, May 17, 2022 at 9:05 AM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> > >
> > > On Mon, May 16, 2022 at 03:29:42PM -0700, Yosry Ahmed wrote:
> > > > The discussions on the patch series [1] to add memory.reclaim has
> > > > shown that it is desirable to add an argument to control the type of
> > > > memory being reclaimed by invoked proactive reclaim using
> > > > memory.reclaim.
> > > >
> > > > I am proposing adding a swappiness optional argument to the interface.
> > > > If set, it overwrites vm.swappiness and per-memcg swappiness. This
> > > > provides a way to enforce user policy on a stateless per-reclaim
> > > > basis. We can make policy decisions to perform reclaim differently for
> > > > tasks of different app classes based on their individual QoS needs. It
> > > > also helps for use cases when particularly page cache is high and we
> > > > want to mainly hit that without swapping out.
> > > >
> > > > The interface would be something like this (utilizing the nested-keyed
> > > > interface we documented earlier):
> > > >
> > > > $ echo "200M swappiness=30" > memory.reclaim
> > >
> > > What are the anticipated use cases except swappiness == 0 and
> > > swappiness == system_default?
> > >
> > > IMO it's better to allow specifying the type of memory to reclaim,
> > > e.g. type="file"/"anon"/"slab", it's a way more clear what to expect.
> >
> > I imagined swappiness would give user space flexibility to reclaim a
> > ratio of file vs. anon as it sees fit based on app class or userspace
> > policy, but I agree that the guarantees of swappiness are weak and we
> > might want an explicit argument that directly controls the return
> > value of get_scan_count() or whether or not we call shrink_slab(). My
> > fear is that this interface may be less flexible, for example if we
> > only want to avoid reclaiming file pages, but we are fine with anon or
> > slab.
> > Maybe in the future we will have a new type of memory to
> > reclaim, does it get implicitly reclaimed when other types are
> > specified or not?
> >
> > Maybe we can use one argument per type instead? E.g.
> >     $ echo "200M file=no anon=yes slab=yes" > memory.reclaim
> >
> > The default value would be "yes" for all types unless stated
> > otherwise. This is also leaves room for future extensions (maybe
> > file=clean to reclaim clean file pages only?). Interested to hear your
> > thoughts on this!
>
> The question to answer is do you want the code which is determining
> the balance of scanning be a part of the interface?
>
> If not, I'd stick with explicitly specifying a type of memory to scan
> (and the "I don't care" mode, where you simply ask to reclaim X bytes).
>
> Otherwise you need to describe how the artificial memory pressure will
> be distributed over different memory types. And with time it might
> start being significantly different to what the generic reclaim code does,
> because the reclaim path is free to do what's better, there are no
> user-visible guarantees.

My understanding is that your question is about the swappiness
argument, and I agree it can get complicated. I am on board with
explicitly specifying the type(s) to reclaim. I think an interface
with one argument per type (whitelist/blacklist approach) could be
more flexible in specifying multiple types per invocation (smaller
race window between reading usages and writing to memory.reclaim), and
has room for future extensions (e.g. file=clean). However, if you
still think a type=file/anon/slab parameter is better we can also go
with this.

I imagine this will be an enum/flags that will be passed to
try_to_free_mem_cgroup_pages() instead of may_swap, and then we can map
it to one-bit flags in struct scan_control. The anon/file flags will be
used to control list type in shrink_lruvec() (get_scan_count()) and
mem_cgroup_soft_limit_reclaim(), and the slab flag will be used to
control calls to shrink_slab().
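
A rough user-space sketch of the flag idea, assuming one bit per memory
type. ReclaimType and reclaim_plan() are hypothetical names invented
for this illustration; the real change would be kernel C on struct
scan_control, which is not shown here.

```python
from enum import Flag, auto


class ReclaimType(Flag):
    """Hypothetical one-bit-per-type selector for a reclaim request."""
    ANON = auto()
    FILE = auto()
    SLAB = auto()


def reclaim_plan(types: ReclaimType) -> dict[str, bool]:
    """Map the requested types to the decisions the reclaim path would make."""
    return {
        "scan_anon_lru": bool(types & ReclaimType.ANON),
        "scan_file_lru": bool(types & ReclaimType.FILE),
        "call_shrink_slab": bool(types & ReclaimType.SLAB),
    }


# Equivalent of "file=no anon=yes slab=yes" from earlier in the thread.
plan = reclaim_plan(ReclaimType.ANON | ReclaimType.SLAB)
print(plan)
```

A bitmask keeps the call signature stable if another memory type is
added later, at the cost of having to define what an unset new bit
means for existing callers.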

This is orthogonal, but while we are at it we can also add a
"controlled_reclaim" flag that we use to control whether we call
vmpressure or not. I assume we don't want to count vmpressure for
controlled reclaim, similar to PSI. We can then also revert
e22c6ed90aa9 ("mm: memcontrol: don't count limit-setting reclaim as
memory pressure") and use the same flag to control calls to psi.

>
> >
> > >
> > > E.g. what
> > > $ echo "200M swappiness=1" > memory.reclaim
> > > means if there is only 10M of pagecache? How much of anon memory will
> > > be reclaimed?
> >
> > Good point. I agree that the type argument or per-type arguments have
> > multiple advantages over swappiness.
>
> If a user wants to select multiple types of memory, can they just run several
> requests in parallel? Or one by one?
>
> Thanks!


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] Add swappiness argument to memory.reclaim
@ 2022-05-17 20:45           ` Roman Gushchin
  0 siblings, 0 replies; 28+ messages in thread
From: Roman Gushchin @ 2022-05-17 20:45 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Johannes Weiner, Michal Hocko, Shakeel Butt, Andrew Morton,
	David Rientjes, cgroups, Tejun Heo, Linux-MM, Yu Zhao, Wei Xu,
	Greg Thelen, Chen Wandun

On Tue, May 17, 2022 at 01:11:13PM -0700, Yosry Ahmed wrote:
> On Tue, May 17, 2022 at 12:49 PM Roman Gushchin
> <roman.gushchin@linux.dev> wrote:
> >
> > On Tue, May 17, 2022 at 11:13:10AM -0700, Yosry Ahmed wrote:
> > > On Tue, May 17, 2022 at 9:05 AM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> > > >
> > > > On Mon, May 16, 2022 at 03:29:42PM -0700, Yosry Ahmed wrote:
> > > > > The discussions on the patch series [1] to add memory.reclaim has
> > > > > shown that it is desirable to add an argument to control the type of
> > > > > memory being reclaimed by invoked proactive reclaim using
> > > > > memory.reclaim.
> > > > >
> > > > > I am proposing adding a swappiness optional argument to the interface.
> > > > > If set, it overwrites vm.swappiness and per-memcg swappiness. This
> > > > > provides a way to enforce user policy on a stateless per-reclaim
> > > > > basis. We can make policy decisions to perform reclaim differently for
> > > > > tasks of different app classes based on their individual QoS needs. It
> > > > > also helps for use cases when particularly page cache is high and we
> > > > > want to mainly hit that without swapping out.
> > > > >
> > > > > The interface would be something like this (utilizing the nested-keyed
> > > > > interface we documented earlier):
> > > > >
> > > > > $ echo "200M swappiness=30" > memory.reclaim
> > > >
> > > > What are the anticipated use cases except swappiness == 0 and
> > > > swappiness == system_default?
> > > >
> > > > IMO it's better to allow specifying the type of memory to reclaim,
> > > > e.g. type="file"/"anon"/"slab", it's a way more clear what to expect.
> > >
> > > I imagined swappiness would give user space flexibility to reclaim a
> > > ratio of file vs. anon as it sees fit based on app class or userspace
> > > policy, but I agree that the guarantees of swappiness are weak and we
> > > might want an explicit argument that directly controls the return
> > > value of get_scan_count() or whether or not we call shrink_slab(). My
> > > fear is that this interface may be less flexible, for example if we
> > > only want to avoid reclaiming file pages, but we are fine with anon or
> > > slab.
> > > Maybe in the future we will have a new type of memory to
> > > reclaim, does it get implicitly reclaimed when other types are
> > > specified or not?
> > >
> > > Maybe we can use one argument per type instead? E.g.
> > >     $ echo "200M file=no anon=yes slab=yes" > memory.reclaim
> > >
> > > The default value would be "yes" for all types unless stated
> > > otherwise. This is also leaves room for future extensions (maybe
> > > file=clean to reclaim clean file pages only?). Interested to hear your
> > > thoughts on this!
> >
> > The question to answer is do you want the code which is determining
> > the balance of scanning be a part of the interface?
> >
> > If not, I'd stick with explicitly specifying a type of memory to scan
> > (and the "I don't care" mode, where you simply ask to reclaim X bytes).
> >
> > Otherwise you need to describe how the artificial memory pressure will
> > be distributed over different memory types. And with time it might
> > start being significantly different to what the generic reclaim code does,
> > because the reclaim path is free to do what's better, there are no
> > user-visible guarantees.
> 
> My understanding is that your question is about the swappiness
> argument, and I agree it can get complicated. I am on board with
> explicitly specifying the type(s) to reclaim. I think an interface
> with one argument per type (whitelist/blacklist approach) could be
> more flexible in specifying multiple types per invocation (smaller
> race window between reading usages and writing to memory.reclaim), and
> has room for future extensions (e.g. file=clean). However, if you
> still think a type=file/anon/slab parameter is better we can also go
> with this.

If you allow more than one type, how would you balance between them?
E.g. in your example:
     $ echo "200M file=no anon=yes slab=yes" > memory.reclaim
How much slab and anonymous memory will be reclaimed? 100M and 100M?
Probably not (we don't balance slabs with other types of the memory).
And if not, the interface becomes very vague: all we can guarantee
is that *some* pressure will be applied on both anon and slab.

My point is that the interface should have deterministic behavior
and not rely on the current state of the memory pressure balancing
heuristic. It can likely be done in different ways; I don't have
a strong opinion here.
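
To make the ambiguity concrete, here is a sketch of parsing the
per-type syntax proposed earlier in the thread. The syntax is only a
proposal, the K/M/G suffix handling is an assumption, and
parse_reclaim_request() is a hypothetical helper, not an existing
interface.

```python
UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text: str) -> int:
    """Parse a byte count with an optional K/M/G suffix (assumed format)."""
    suffix = text[-1].upper()
    if suffix in UNITS:
        return int(text[:-1]) * UNITS[suffix]
    return int(text)


def parse_reclaim_request(line: str) -> tuple[int, dict[str, bool]]:
    """Parse '<size> [file=yes|no] [anon=yes|no] [slab=yes|no]'."""
    amount, *pairs = line.split()
    types = {"file": True, "anon": True, "slab": True}  # default: all "yes"
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or key not in types or value not in ("yes", "no"):
            raise ValueError(f"unrecognised argument: {pair!r}")
        types[key] = value == "yes"
    return parse_size(amount), types


nbytes, types = parse_reclaim_request("200M file=no anon=yes slab=yes")
print(nbytes, types)
```

The parsed request says which types are eligible and how many bytes are
wanted in total, but nothing about how those bytes are divided between
anon and slab. That split is exactly the part the interface would leave
undefined.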

Thanks!


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] Add swappiness argument to memory.reclaim
  2022-05-17 20:45           ` Roman Gushchin
@ 2022-05-19  5:17             ` Wei Xu
  -1 siblings, 0 replies; 28+ messages in thread
From: Wei Xu @ 2022-05-19  5:17 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Yosry Ahmed, Johannes Weiner, Michal Hocko, Shakeel Butt,
	Andrew Morton, David Rientjes, Cgroups, Tejun Heo, Linux-MM,
	Yu Zhao, Greg Thelen, Chen Wandun

On Tue, May 17, 2022 at 1:45 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
>
> On Tue, May 17, 2022 at 01:11:13PM -0700, Yosry Ahmed wrote:
> > On Tue, May 17, 2022 at 12:49 PM Roman Gushchin
> > <roman.gushchin@linux.dev> wrote:
> > >
> > > On Tue, May 17, 2022 at 11:13:10AM -0700, Yosry Ahmed wrote:
> > > > On Tue, May 17, 2022 at 9:05 AM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> > > > >
> > > > > On Mon, May 16, 2022 at 03:29:42PM -0700, Yosry Ahmed wrote:
> > > > > > The discussions on the patch series [1] to add memory.reclaim has
> > > > > > shown that it is desirable to add an argument to control the type of
> > > > > > memory being reclaimed by invoked proactive reclaim using
> > > > > > memory.reclaim.
> > > > > >
> > > > > > I am proposing adding a swappiness optional argument to the interface.
> > > > > > If set, it overwrites vm.swappiness and per-memcg swappiness. This
> > > > > > provides a way to enforce user policy on a stateless per-reclaim
> > > > > > basis. We can make policy decisions to perform reclaim differently for
> > > > > > tasks of different app classes based on their individual QoS needs. It
> > > > > > also helps for use cases when particularly page cache is high and we
> > > > > > want to mainly hit that without swapping out.
> > > > > >
> > > > > > The interface would be something like this (utilizing the nested-keyed
> > > > > > interface we documented earlier):
> > > > > >
> > > > > > $ echo "200M swappiness=30" > memory.reclaim
> > > > >
> > > > > What are the anticipated use cases except swappiness == 0 and
> > > > > swappiness == system_default?
> > > > >
> > > > > IMO it's better to allow specifying the type of memory to reclaim,
> > > > > e.g. type="file"/"anon"/"slab", it's a way more clear what to expect.
> > > >
> > > > I imagined swappiness would give user space flexibility to reclaim a
> > > > ratio of file vs. anon as it sees fit based on app class or userspace
> > > > policy, but I agree that the guarantees of swappiness are weak and we
> > > > might want an explicit argument that directly controls the return
> > > > value of get_scan_count() or whether or not we call shrink_slab(). My
> > > > fear is that this interface may be less flexible, for example if we
> > > > only want to avoid reclaiming file pages, but we are fine with anon or
> > > > slab.
> > > > Maybe in the future we will have a new type of memory to
> > > > reclaim, does it get implicitly reclaimed when other types are
> > > > specified or not?
> > > >
> > > > Maybe we can use one argument per type instead? E.g.
> > > >     $ echo "200M file=no anon=yes slab=yes" > memory.reclaim
> > > >
> > > > The default value would be "yes" for all types unless stated
> > > > otherwise. This is also leaves room for future extensions (maybe
> > > > file=clean to reclaim clean file pages only?). Interested to hear your
> > > > thoughts on this!
> > >
> > > The question to answer is do you want the code which is determining
> > > the balance of scanning be a part of the interface?
> > >
> > > If not, I'd stick with explicitly specifying a type of memory to scan
> > > (and the "I don't care" mode, where you simply ask to reclaim X bytes).
> > >
> > > Otherwise you need to describe how the artificial memory pressure will
> > > be distributed over different memory types. And with time it might
> > > start being significantly different to what the generic reclaim code does,
> > > because the reclaim path is free to do what's better, there are no
> > > user-visible guarantees.
> >
> > My understanding is that your question is about the swappiness
> > argument, and I agree it can get complicated. I am on board with
> > explicitly specifying the type(s) to reclaim. I think an interface
> > with one argument per type (whitelist/blacklist approach) could be
> > more flexible in specifying multiple types per invocation (smaller
> > race window between reading usages and writing to memory.reclaim), and
> > has room for future extensions (e.g. file=clean). However, if you
> > still think a type=file/anon/slab parameter is better we can also go
> > with this.
>
> If you allow more than one type, how would you balance between them?
> E.g. in your example:
>      $ echo "200M file=no anon=yes slab=yes" > memory.reclaim
> How much slab and anonymous memory will be reclaimed? 100M and 100M?
> Probably not (we don't balance slabs with other types of the memory).
> And if not, the interface becomes very vague: all we can guarantee
> is that *some* pressure will be applied on both anon and slab.
>
> My point is that the interface should have a deterministic behavior
> and not rely on the current state of the memory pressure balancing
> heuristic. It can be likely done in different ways, I don't have
> a strong opinion here.

I agree that the interface should have clearly defined semantics, and
I also like your proposal of just specifying a page type (e.g.
type=file/anon) to reclaim.

> Thanks!


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] Add swappiness argument to memory.reclaim
@ 2022-05-19  5:44         ` Wei Xu
  0 siblings, 0 replies; 28+ messages in thread
From: Wei Xu @ 2022-05-19  5:44 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Yosry Ahmed, Michal Hocko, Shakeel Butt, Andrew Morton,
	David Rientjes, Roman Gushchin, Cgroups, Tejun Heo, Linux-MM,
	Yu Zhao, Greg Thelen, Chen Wandun

On Tue, May 17, 2022 at 1:06 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> Hi Yosry,
>
> On Tue, May 17, 2022 at 11:06:36AM -0700, Yosry Ahmed wrote:
> > On Mon, May 16, 2022 at 11:56 PM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Mon 16-05-22 15:29:42, Yosry Ahmed wrote:
> > > > The discussions on the patch series [1] to add memory.reclaim has
> > > > shown that it is desirable to add an argument to control the type of
> > > > memory being reclaimed by invoked proactive reclaim using
> > > > memory.reclaim.
> > > >
> > > > I am proposing adding a swappiness optional argument to the interface.
> > > > If set, it overwrites vm.swappiness and per-memcg swappiness. This
> > > > provides a way to enforce user policy on a stateless per-reclaim
> > > > basis. We can make policy decisions to perform reclaim differently for
> > > > tasks of different app classes based on their individual QoS needs. It
> > > > also helps for use cases when particularly page cache is high and we
> > > > want to mainly hit that without swapping out.
> > >
> > > Can you be more specific about the usecase please? Also how do you
> >
> > For example for a class of applications it may be known that
> > reclaiming one type of pages anon/file is more profitable or will
> > incur an overhead, based on userspace knowledge of the nature of the
> > app.
>
> I want to make sure I understand what you're trying to correct for
> with this bias. Could you expand some on what you mean by profitable?
>
> The way the kernel thinks today is that importance of any given page
> is its access frequency times the cost of paging it. swappiness exists
> to recognize differences in the second part: the cost involved in
> swapping a page vs the cost of a file cache miss.
>
> For example, page A is accessed 10 times more frequently than B, but B
> is 10 times more expensive to refault/swapin. Combining that, they
> should be roughly equal reclaim candidates.
>
> This is the same with the seek parameter of slab shrinkers: some
> objects are more expensive to recreate than others. Once corrected for
> that, presence of reference bits can be interpreted on an even level.
>
> While access frequency is clearly a workload property, the cost of
> refaulting is conventionally not - let alone a per-reclaim property!
>
> If I understand you correctly, you're saying that the backing type of
> a piece of memory can say something about the importance of the data
> within. Something that goes beyond the work of recreating it.
>
> Is that true or am I misreading this?
>
> If that's your claim, isn't that, if it happens, mostly incidental?
>
> For example, in our fleet we used to copy executable text into
> anonymous memory to get THP backing. With file THP support in the
> kernel, the text is back in cache. The importance of the memory
> *contents* stayed the same. The backing storage changed, but beyond
> that the anon/file distinction doesn't mean anything.
>
> Another example. Probably one of the most common workload structures
> is text, heap, logging/startup/error handling: hot file, warm anon,
> cold file. How does prioritizing either file or anon apply to this?
>
> Maybe I'm misunderstanding and this IS about per-workload backing
> types? Maybe the per-cgroup swapfiles that you guys are using?
>
> > If most of what an app use for example is anon/tmpfs then it might
> > be better to explicitly ask the kernel to reclaim anon, and to avoid
> > reclaiming file pages in order not to hurt the file cache
> > performance.
>
> Hm.
>
> Reclaim ages those pools based on their size, so a dominant anon set
> should receive more pressure than a small file set. I can see two
> options why this doesn't produce the desired results:
>
> 1) Reclaim is broken and doesn't allocate scan rates right, or
>
> 2) Access frequency x refault cost alone is not a satisfactory
>    predictor for the value of any given page.
>
> Can you see another?
>
> I can sort of see the argument for 2), because it can be workload
> dependent: a 50ms refault in a single-threaded part of the program is
> likely more disruptive than the same refault in an asynchronous worker
> thread. This is a factor we're not really taking into account today.
>
> But I don't think an anon/file bias will capture this coefficient?

It essentially provides the userspace proactive reclaimer an ability
to define its own reclaim policy by adding an argument to specify
which type of pages to reclaim via memory.reclaim.

Even though the page type (file vs anon) doesn't always accurately
reflect the performance impact of a page, the separation of different
types of pages is still meaningful w.r.t. reclaim.

The reclaim costs of anon and file pages are different. With zswap,
anon pages can be reclaimed via memory compression, which doesn't
involve I/Os, but reclaiming dirty file pages needs I/O for writeback.

The access patterns of anon and file pages are also different: Anon
pages are mostly mapped and accessed directly by CPU, whereas file
pages are often accessed via read/write syscalls. A single accessed
(young) bit can carry very different performance weights for different
types of pages.

Because anon/tmpfs pages account for the vast majority of memory usage
in Google data centers and our proactive reclaim algorithm is tuned
only for anon pages, we'd like to have the option to only proactively
reclaim anon pages.

It is not desirable to set the global vm.swappiness to disable file
page reclaim because we still want to use the kernel reclaimer to
reclaim file pages when the proactive reclaimer fails to keep up with
the memory demand.
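
To make the anon-only use case concrete, here is a minimal sketch of what
the userspace side might look like. It assumes the type= argument under
discussion in this thread, which is only a proposal here and not an
existing memory.reclaim key. The function name and its parameters are
hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: ask one cgroup to proactively reclaim anon pages only.
# "type=anon" is the argument proposed in this thread, not a merged interface.
#   $1 = path to the cgroup's memory.reclaim file
#   $2 = amount to reclaim (e.g. 200M)
reclaim_anon_only() {
    reclaim_file=$1
    amount=$2
    if [ ! -w "$reclaim_file" ]; then
        echo "cannot write $reclaim_file" >&2
        return 1
    fi
    # vm.swappiness is left untouched, so regular kernel reclaim can still
    # reclaim file pages when proactive reclaim does not keep up.
    printf '%s type=anon\n' "$amount" > "$reclaim_file"
}
```

A caller would pass the memory.reclaim file of the target cgroup, for
example a file under the cgroup v2 mount point, plus the amount to reclaim.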


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] Add swappiness argument to memory.reclaim
@ 2022-05-19  8:51           ` Michal Hocko
  0 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2022-05-19  8:51 UTC (permalink / raw)
  To: Wei Xu
  Cc: Johannes Weiner, Yosry Ahmed, Shakeel Butt, Andrew Morton,
	David Rientjes, Roman Gushchin, Cgroups, Tejun Heo, Linux-MM,
	Yu Zhao, Greg Thelen, Chen Wandun

On Wed 18-05-22 22:44:13, Wei Xu wrote:
> On Tue, May 17, 2022 at 1:06 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
[...]
> > But I don't think an anon/file bias will capture this coefficient?
> 
> It essentially provides the userspace proactive reclaimer an ability
> to define its own reclaim policy by adding an argument to specify
> which type of pages to reclaim via memory.reclaim.

I am not sure the swappiness is really a proper interface for that.
Historically this tunable has changed behavior several times and the
reclaim algorithm is free to ignore it completely in many cases. If you
want to build a userspace reclaim policy, then it really has to have a
predictable and stable behavior. That would mean that the semantic would
have to be much stronger than the global vm_swappiness.
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] Add swappiness argument to memory.reclaim
@ 2022-05-19 15:29             ` Wei Xu
  0 siblings, 0 replies; 28+ messages in thread
From: Wei Xu @ 2022-05-19 15:29 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Johannes Weiner, Yosry Ahmed, Shakeel Butt, Andrew Morton,
	David Rientjes, Roman Gushchin, Cgroups, Tejun Heo, Linux-MM,
	Yu Zhao, Greg Thelen, Chen Wandun

On Thu, May 19, 2022 at 1:51 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Wed 18-05-22 22:44:13, Wei Xu wrote:
> > On Tue, May 17, 2022 at 1:06 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
> [...]
> > > But I don't think an anon/file bias will capture this coefficient?
> >
> > It essentially provides the userspace proactive reclaimer an ability
> > to define its own reclaim policy by adding an argument to specify
> > which type of pages to reclaim via memory.reclaim.
>
> I am not sure the swappiness is really a proper interface for that.
> Historically this tunable has changed behavior several times and the
> reclaim algorithm is free to ignore it completely in many cases. If you
> want to build a userspace reclaim policy, then it really has to have a
> predictable and stable behavior. That would mean that the semantic would
> have to be much stronger than the global vm_swappiness.

I agree. As I replied to Roman's comments earlier, it is cleaner
to just specify the type of pages to reclaim.

> --
> Michal Hocko
> SUSE Labs


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] Add swappiness argument to memory.reclaim
@ 2022-05-19 18:24             ` Yosry Ahmed
  0 siblings, 0 replies; 28+ messages in thread
From: Yosry Ahmed @ 2022-05-19 18:24 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Wei Xu, Johannes Weiner, Shakeel Butt, Andrew Morton,
	David Rientjes, Roman Gushchin, Cgroups, Tejun Heo, Linux-MM,
	Yu Zhao, Greg Thelen, Chen Wandun

On Thu, May 19, 2022 at 1:51 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Wed 18-05-22 22:44:13, Wei Xu wrote:
> > On Tue, May 17, 2022 at 1:06 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
> [...]
> > > But I don't think an anon/file bias will capture this coefficient?
> >
> > It essentially provides the userspace proactive reclaimer an ability
> > to define its own reclaim policy by adding an argument to specify
> > which type of pages to reclaim via memory.reclaim.
>
> I am not sure the swappiness is really a proper interface for that.
> Historically this tunable has changed behavior several times and the
> reclaim algorithm is free to ignore it completely in many cases. If you
> want to build a userspace reclaim policy, then it really has to have a
> predictable and stable behavior. That would mean that the semantic would
> have to be much stronger than the global vm_swappiness.

Agreed as well. I will work on an interface similar to what Roman
suggested (type=file/anon/slab).
Thanks everyone for participating in this discussion!
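
As a rough illustration of the direction (not a patch), the request
string for such an interface could be built as below. The key name
"type" and the accepted values follow Roman's suggestion. The helper
itself is hypothetical, and it accepts at most one type per request so
that nothing has to be balanced between types.

```shell
#!/bin/sh
# Hypothetical request builder for the proposed "type=" argument.
# Exactly one type (or none) is accepted, which keeps the request
# deterministic: no balancing between types is implied.
#   $1 = amount to reclaim (e.g. 200M)
#   $2 = optional page type: file, anon or slab
build_reclaim_request() {
    amount=$1
    page_type=${2:-}
    case $page_type in
        "")             printf '%s\n' "$amount" ;;   # "I don't care" mode
        file|anon|slab) printf '%s type=%s\n' "$amount" "$page_type" ;;
        *)              echo "unknown type: $page_type" >&2; return 1 ;;
    esac
}
```

The output would then be written to memory.reclaim in the same way as
the "200M swappiness=30" example from the original proposal.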

> --
> Michal Hocko
> SUSE Labs


^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2022-05-19 18:25 UTC | newest]

Thread overview: 28+ messages
2022-05-16 22:29 [RFC] Add swappiness argument to memory.reclaim Yosry Ahmed
2022-05-16 22:29 ` Yosry Ahmed
2022-05-17  6:56 ` Michal Hocko
2022-05-17  6:56   ` Michal Hocko
2022-05-17 18:06   ` Yosry Ahmed
2022-05-17 18:06     ` Yosry Ahmed
2022-05-17 20:06     ` Johannes Weiner
2022-05-17 20:06       ` Johannes Weiner
2022-05-19  5:44       ` Wei Xu
2022-05-19  5:44         ` Wei Xu
2022-05-19  8:51         ` Michal Hocko
2022-05-19  8:51           ` Michal Hocko
2022-05-19 15:29           ` Wei Xu
2022-05-19 15:29             ` Wei Xu
2022-05-19 18:24           ` Yosry Ahmed
2022-05-19 18:24             ` Yosry Ahmed
2022-05-17 16:05 ` Roman Gushchin
2022-05-17 16:05   ` Roman Gushchin
2022-05-17 18:13   ` Yosry Ahmed
2022-05-17 18:13     ` Yosry Ahmed
2022-05-17 19:49     ` Roman Gushchin
2022-05-17 19:49       ` Roman Gushchin
2022-05-17 20:11       ` Yosry Ahmed
2022-05-17 20:11         ` Yosry Ahmed
2022-05-17 20:45         ` Roman Gushchin
2022-05-17 20:45           ` Roman Gushchin
2022-05-19  5:17           ` Wei Xu
2022-05-19  5:17             ` Wei Xu
