Linux-Doc Archive on lore.kernel.org
From: Ivan Teterevkov <ivan.teterevkov@nutanix.com>
To: David Rientjes <rientjes@google.com>
Cc: "corbet@lwn.net" <corbet@lwn.net>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"mchehab+samsung@kernel.org" <mchehab+samsung@kernel.org>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"jpoimboe@redhat.com" <jpoimboe@redhat.com>,
	"pawan.kumar.gupta@linux.intel.com" 
	<pawan.kumar.gupta@linux.intel.com>,
	"jgross@suse.com" <jgross@suse.com>,
	"oneukum@suse.com" <oneukum@suse.com>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: RE: [PATCH] mm/vmscan: add vm_swappiness configuration knobs
Date: Thu, 12 Mar 2020 12:48:22 +0000
Message-ID: <BL0PR02MB5601808F36BE202813E9D562E9FD0@BL0PR02MB5601.namprd02.prod.outlook.com>
In-Reply-To: <alpine.DEB.2.21.2003111227230.171292@chino.kir.corp.google.com>

On Wed, 11 Mar 2020, David Rientjes wrote:

> On Wed, 11 Mar 2020, Ivan Teterevkov wrote:
> 
> > This patch adds a couple of knobs:
> >
> > - The configuration option (CONFIG_VM_SWAPPINESS).
> > - The command line parameter (vm_swappiness).
> >
> > The default value is preserved, but now defined by CONFIG_VM_SWAPPINESS.
> >
> > Historically, the default swappiness is set to the well-known value 60,
> > and this works well for the majority of cases. vm_swappiness is also
> > exposed as a kernel parameter that can be changed at runtime, e.g.
> > with sysctl.
> >
> > This approach may not suit some configurations well, e.g. systemd-based
> > distros, where systemd is put in charge of the cgroup controllers,
> > including the memory one. In such cases, the default swappiness of 60
> > is copied across the cgroup subtrees early at startup, while systemd
> > is arranging the slices for its services and before the sysctl.conf
> > or tmpfiles.d/*.conf changes are applied.
> >
> 
> Seems like something that can be fully handled by an initscript that would
> set the sysctl and then iterate the memcg hierarchy propagating the
> non-default value.  I don't think that's too much of an ask if userspace
> wants to manipulate the swappiness value.
> 

This is exactly what I'm trying to avoid: in some distros there is no way
to apply the configuration early enough. For example, on systemd-based
systems, systemd is the first process to start and it arranges the memcg
hierarchy according to its own configuration, but unfortunately it does
not offer a swappiness knob.

A script could iterate the memcg tree later, but it would race with the
system entity that's put in charge of the memcg, because the configuration
can't be changed atomically. For example, such a script could be iterating
the memcg tree and updating each memory.swappiness while systemd is
creating another slice or scope subtree.
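For illustration, here is roughly what such a script would have to do, sketched in C with nftw(3) rather than shell. The function name set_swappiness_tree() is hypothetical, and the root path it is given (e.g. /sys/fs/cgroup/memory on a cgroup v1 layout) is an assumption:

```c
#define _XOPEN_SOURCE 700 /* for nftw() */
#include <assert.h>
#include <fcntl.h>
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* nftw() has no user-data argument, so the walk state lives in globals. */
static int target_swappiness;
static int updated;

/* Called once per entry; each directory is a memory cgroup. */
static int visit(const char *path, const struct stat *sb, int type,
		 struct FTW *ftwbuf)
{
	char knob[4096], val[16];
	int fd, len, ok;

	(void)sb;
	(void)ftwbuf;
	if (type != FTW_D)
		return 0; /* skip the control files themselves */

	if (snprintf(knob, sizeof(knob), "%s/memory.swappiness", path) >=
	    (int)sizeof(knob))
		return 0; /* path too long: skip this cgroup */

	/* No O_CREAT: only touch cgroups that already expose the knob. */
	fd = open(knob, O_WRONLY | O_TRUNC);
	if (fd < 0)
		return 0; /* no knob, or the cgroup went away: keep walking */

	len = snprintf(val, sizeof(val), "%d\n", target_swappiness);
	assert(len > 0 && len < (int)sizeof(val));
	ok = write(fd, val, len) == len;
	if (close(fd) == 0 && ok)
		updated++;
	return 0; /* continue the walk */
}

/*
 * Hypothetical helper: set memory.swappiness in every cgroup under @root
 * (parents are visited before children). Returns the number of cgroups
 * updated, or -1 on error. Not atomic: cgroups created while the walk
 * runs may be missed.
 */
int set_swappiness_tree(const char *root, int value)
{
	if (value < 0 || value > 100)
		return -1;
	target_swappiness = value;
	updated = 0;
	if (nftw(root, visit, 16, FTW_PHYS) != 0)
		return -1;
	return updated;
}
```

Nothing in the walk is atomic with respect to cgroup creation: POSIX leaves it unspecified whether readdir() returns entries added after the directory was opened, so a slice or scope that systemd creates while the walk is running can be missed and, if it copied its parent's value before the parent was updated, keeps the old one.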

> Or maybe we can be more clever: have memcg->swappiness store -1 by default
> unless it is changed by the user explicitly and then have
> mem_cgroup_swappiness() return vm_swappiness for this value.  If the user
> overwrites it, it's intended.
> 

Does it mean that -1 would become a reference to vm_swappiness or to the
parent's memory.swappiness? It sounds interesting, and if so, it would
address my issues with the swappiness, but it would also change the
existing memcg behaviour: if the referred-to value changed, would every
memory.swappiness backed by -1 change too?
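To make the question concrete, here is a rough userspace model of the suggestion as I read it, where -1 refers to the global vm_swappiness. struct memcg and the memcg_*() helpers are hypothetical simplifications, not the real mm/memcontrol.c code; memcg_init_copy() models the current copy-at-creation behaviour for comparison:

```c
#include <assert.h>
#include <stddef.h>

/* Global default; stands in for vm_swappiness in mm/vmscan.c. */
static int vm_swappiness = 60;

/* Hypothetical, heavily simplified stand-in for struct mem_cgroup. */
struct memcg {
	struct memcg *parent; /* NULL for the root cgroup */
	int swappiness;       /* -1: not set by the user, use vm_swappiness */
};

/* Effective value: the -1 sentinel is resolved at read time. */
static int memcg_swappiness(const struct memcg *memcg)
{
	if (!memcg->parent || memcg->swappiness < 0)
		return vm_swappiness;
	return memcg->swappiness;
}

/* Proposed: a new cgroup starts out as "not set". */
static void memcg_init(struct memcg *memcg, struct memcg *parent)
{
	memcg->parent = parent;
	memcg->swappiness = -1;
}

/* Current behaviour, for comparison: the parent's value is copied. */
static void memcg_init_copy(struct memcg *memcg, struct memcg *parent)
{
	assert(parent != NULL);
	memcg->parent = parent;
	memcg->swappiness = memcg_swappiness(parent);
}

/* A write to memory.swappiness pins an explicit value. */
static void memcg_set_swappiness(struct memcg *memcg, int val)
{
	assert(val >= 0 && val <= 100);
	if (memcg->parent)
		memcg->swappiness = val;
	else
		vm_swappiness = val; /* in this model, the root knob sets the global */
}
```

In this model a cgroup backed by -1 follows later changes of vm_swappiness, while an explicitly written value is kept and a copied value goes stale. A variant that refers to the parent instead would walk up memcg->parent until it finds a value other than -1.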

> So there are a couple options here but I don't think one of them is to add
> a new config option or kernel command line option.
> 

vm_swappiness starts its life in the kernel, so why not initialize it
there with a simple "constructor"?

> > One could run a script to traverse the cgroup trees later and set the
> > desired memory.swappiness individually in each cgroup once the runtime
> > is set up, but this would take some work to implement properly.
> > Instead, why not set the default swappiness as early as possible?
> >
> > Signed-off-by: Ivan Teterevkov <ivan.teterevkov@nutanix.com>
> > ---
> >  .../admin-guide/kernel-parameters.txt         |  4 ++++
> >  mm/Kconfig                                    | 10 ++++++++
> >  mm/vmscan.c                                   | 24 ++++++++++++++++++-
> >  3 files changed, 37 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > index c07815d230bc..5d54a4303522 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -5317,6 +5317,10 @@
> >  			  P	Enable page structure init time poisoning
> >  			  -	Disable all of the above options
> >
> > +	vm_swappiness=	[KNL]
> > +			Sets the default vm_swappiness, 0 to 100. Defaults to
> > +			CONFIG_VM_SWAPPINESS (60 unless changed in Kconfig).
> > +
> >  	vmalloc=nn[KMG]	[KNL,BOOT] Forces the vmalloc area to have an exact
> >  			size of <nn>. This can be used to increase the
> >  			minimum size (128MB on x86). It can also be used to
> > diff --git a/mm/Kconfig b/mm/Kconfig
> > index ab80933be65f..ec59c19e578e 100644
> > --- a/mm/Kconfig
> > +++ b/mm/Kconfig
> > @@ -739,4 +739,14 @@ config ARCH_HAS_HUGEPD
> >  config MAPPING_DIRTY_HELPERS
> >          bool
> >
> > +config VM_SWAPPINESS
> > +	int "Default memory swappiness"
> > +	default 60
> > +	range 0 100
> > +	help
> > +	  Sets the default vm_swappiness. It can be overridden at boot with
> > +	  the vm_swappiness= parameter and changed at runtime, e.g. via sysctl.
> > +
> > +	  Higher value means more swappy. Historically, defaults to 60.
> > +
> >  endmenu
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 876370565455..7d2d3550f698 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -163,7 +163,29 @@ struct scan_control {
> >  /*
> >   * From 0 .. 100.  Higher means more swappy.
> >   */
> > -int vm_swappiness = 60;
> > +int vm_swappiness = CONFIG_VM_SWAPPINESS;
> > +
> > +static int __init swappiness_cmdline(char *str)
> > +{
> > +	int val, err;
> > +
> > +	if (!str)
> > +		return -EINVAL;
> > +
> > +	err = kstrtoint(str, 10, &val);
> > +	if (err)
> > +		return -EINVAL;
> > +
> > +	if (val < 0 || val > 100)
> > +		return -EINVAL;
> > +
> > +	vm_swappiness = val;
> > +
> > +	return 0;
> > +}
> > +
> > +early_param("vm_swappiness", swappiness_cmdline);
> > +
> >  /*
> >   * The total number of pages which are beyond the high watermark within all
> >   * zones.
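The parsing logic of swappiness_cmdline() above can be exercised in userspace. In this sketch, parse_int() is a hypothetical strtol()-based stand-in that only approximates kstrtoint() (optional sign, decimal digits, at most one trailing newline, no leading whitespace), and the initial 60 stands in for CONFIG_VM_SWAPPINESS:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical approximation of kstrtoint(s, 10, res); not kernel code. */
static int parse_int(const char *s, int *res)
{
	char *end;
	long val;

	assert(s && res);
	/* Unlike strtol(), kstrtoint() does not skip leading whitespace. */
	if (!((*s >= '0' && *s <= '9') || *s == '+' || *s == '-'))
		return -EINVAL;

	errno = 0;
	val = strtol(s, &end, 10);
	if (end == s)
		return -EINVAL; /* no digits at all, e.g. "+" */
	if (errno == ERANGE || val < INT_MIN || val > INT_MAX)
		return -ERANGE;
	if (*end == '\n')
		end++; /* a single trailing newline is accepted */
	if (*end)
		return -EINVAL; /* trailing junk, e.g. "60abc" */
	*res = (int)val;
	return 0;
}

/* Stands in for CONFIG_VM_SWAPPINESS. */
static int vm_swappiness = 60;

/* Same checks as the patch's early_param handler. */
static int swappiness_cmdline(const char *str)
{
	int val;

	if (!str)
		return -EINVAL;
	if (parse_int(str, &val))
		return -EINVAL;
	if (val < 0 || val > 100)
		return -EINVAL;
	vm_swappiness = val;
	return 0;
}
```

A rejected value leaves vm_swappiness at its compiled-in default.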

Thread overview: 15+ messages
2020-03-11 17:45 Ivan Teterevkov
2020-03-11 19:31 ` David Rientjes
2020-03-12 12:48   ` Ivan Teterevkov [this message]
2020-03-12 13:36     ` Matthew Wilcox
2020-03-12 14:03       ` Chris Down
2020-03-13 10:49         ` Ivan Teterevkov
2020-03-13 21:50           ` David Rientjes
2020-03-16 16:03             ` Ivan Teterevkov
2020-03-12  9:25 ` Michal Hocko
2020-03-12 12:54   ` Ivan Teterevkov
2020-03-12 13:26     ` Michal Hocko
2020-03-16 14:53       ` Vlastimil Babka
2020-03-16 16:14         ` Ivan Teterevkov
2020-03-17  8:29         ` Michal Hocko
2020-03-17 14:51           ` Vlastimil Babka
