Linux-Doc Archive on lore.kernel.org
From: Ivan Teterevkov <ivan.teterevkov@nutanix.com>
To: Chris Down <chris@chrisdown.name>, Matthew Wilcox <willy@infradead.org>
Cc: David Rientjes <rientjes@google.com>,
	"corbet@lwn.net" <corbet@lwn.net>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"mchehab+samsung@kernel.org" <mchehab+samsung@kernel.org>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"jpoimboe@redhat.com" <jpoimboe@redhat.com>,
	"pawan.kumar.gupta@linux.intel.com" 
	<pawan.kumar.gupta@linux.intel.com>,
	"jgross@suse.com" <jgross@suse.com>,
	"oneukum@suse.com" <oneukum@suse.com>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: RE: [PATCH] mm/vmscan: add vm_swappiness configuration knobs
Date: Fri, 13 Mar 2020 10:49:32 +0000
Message-ID: <BL0PR02MB56011828432D343371088516E9FA0@BL0PR02MB5601.namprd02.prod.outlook.com>
In-Reply-To: <20200312140326.GA1701917@chrisdown.name>

On Thu, 12 Mar 2020, Chris Down wrote:

> Matthew Wilcox writes:
> >On Thu, Mar 12, 2020 at 12:48:22PM +0000, Ivan Teterevkov wrote:
> >> This is exactly what I'm trying to avoid: in some distros there is no
> >> way to tackle the configuration early enough, e.g. in systemd-based
> >> systems the systemd is the process that starts first and arranges
> >> memcg in a way it's configured, but unfortunately, it doesn't offer the
> >> swappiness knob.
> >
> >This sounds like a systemd problem.  Have you talked to the systemd
> >people about fixing it in systemd?
> 
> Hi there ;-)
> 
> In general most of us maintaining cgroups in systemd run with cgroup v2, so this
> isn't a problem we run into in production. The swappiness controls in general
> don't make a whole lot of sense being distributed hierarchically, so they've been
> phased out entirely in cgroup v2.
> 
> If there had been a patch years ago implementing this in systemd we'd probably
> have accepted it, but cgroup v1 is dying and I am really not in favour of adding
> more code to massage its rough edges. We already have enough problems
> generated by it already.
> 
> However, the following kludge in tmpfiles.d should work to solve your immediate
> problem:
> 
> 	w /sys/fs/cgroup/memory/system.slice/memory.swappiness - - - - value
> 
> Taking my systemd hat off and putting my -mm hat on: let's not add more hacky
> APIs at cgroup v1's behest, or we'll be here until we're pushing up the daisies.
> 
> Thanks,
> 
> Chris

The above approach doesn't work for me on el7 with systemd 219 or on
Ubuntu 19.10 with systemd 242, presumably because the systemd-tmpfiles
services start too late: many cgroups already exist by then and keep the
default value of 60. The Ubuntu snippet is at the bottom of this email.

The only approach that seems to work is to set up a service to run:

$ find /sys/fs/cgroup/memory/ -name memory.swappiness | while read -r name; do echo 0 > "${name}"; done

I think this is quite ugly because it can race with systemd, which may
be creating slices while the find is running. One could constrain the
depth and walk the memcg hierarchy from top to bottom, but this still
looks inherently unstable.
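For illustration, the top-down variant could look roughly like the sketch
below. The function name set_swappiness_tree and its arguments are made up
for this example. It assumes that a newly created child group takes its
parent's value at creation time, which is what makes the parents-first
order worthwhile. It only narrows the race window and does not close it:
a group created between two passes, or after the last one, can still be
missed.

```shell
#!/bin/sh
# Sketch only: write VALUE into memory.swappiness for every memcg below ROOT,
# parents first, and repeat until one full pass changes nothing.
# set_swappiness_tree and its arguments are illustrative names.
set_swappiness_tree() {
    root=$1
    value=$2
    max_passes=${3:-5}
    pass=0
    while [ "$pass" -lt "$max_passes" ]; do
        # find lists a directory before its subdirectories, so a parent
        # group is always written before its children.
        changed=$(find "$root" -type d 2>/dev/null | while IFS= read -r dir; do
            knob="$dir/memory.swappiness"
            # Skip groups that vanished or have no such file.
            current=$(cat "$knob" 2>/dev/null) || continue
            [ "$current" = "$value" ] && continue
            { echo "$value" > "$knob"; } 2>/dev/null && echo "$knob"
        done)
        # Nothing needed changing in this pass: the tree looks settled.
        [ -z "$changed" ] && return 0
        pass=$((pass + 1))
    done
    return 1  # still found stale values after max_passes passes
}
```

A service would call it as "set_swappiness_tree /sys/fs/cgroup/memory 0"
and could treat a non-zero return as a sign that groups are still being
created underneath it.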

This is why I ended up with the vm_swappiness patch (which I don't
quite like myself), but it appears to be the only rock-solid option
unless I've missed something obvious. There is no doubt that cgroup v1
is due for replacement and that vm_swappiness is frightening, but both
still have advantages worth using until they are history.

$ systemctl --version
systemd 242 (242)
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid

$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=19.10
DISTRIB_CODENAME=eoan
DISTRIB_DESCRIPTION="Ubuntu 19.10"

$ uname -a
Linux ubuntu 5.6.0-rc5-custom #1 SMP Wed Mar 11 14:59:15 GMT 2020 x86_64 x86_64 x86_64 GNU/Linux

$ tail -1 /etc/sysctl.conf
vm.swappiness=10

$ cat /etc/tmpfiles.d/10-swap.conf
w /sys/fs/cgroup/memory/system.slice/memory.swappiness - - - - 20
w /sys/fs/cgroup/memory/user.slice/memory.swappiness   - - - - 30

$ find /sys/fs/cgroup/memory -name memory.swappiness | while read -r name; do cat "${name}"; done | sort | uniq -c
      1 10
     32 20
      6 30
     21 60

$ find /sys/fs/cgroup/memory -name memory.swappiness | while read -r name; do echo "${name}"; cat "${name}"; done | grep --before-context=1 60
/sys/fs/cgroup/memory/system.slice/systemd-udevd.service/memory.swappiness
60
--
/sys/fs/cgroup/memory/system.slice/sys-fs-fuse-connections.mount/memory.swappiness
60
/sys/fs/cgroup/memory/system.slice/snap-gnome\x2d3\x2d28\x2d1804-116.mount/memory.swappiness
60
/sys/fs/cgroup/memory/system.slice/snap-gnome\x2dlogs-81.mount/memory.swappiness
60
/sys/fs/cgroup/memory/system.slice/sys-kernel-config.mount/memory.swappiness
60
--
/sys/fs/cgroup/memory/system.slice/snap-core-7917.mount/memory.swappiness
60
/sys/fs/cgroup/memory/system.slice/sys-kernel-debug.mount/memory.swappiness
60
--
/sys/fs/cgroup/memory/system.slice/snap-gnome\x2dcharacters-399.mount/memory.swappiness
60
/sys/fs/cgroup/memory/system.slice/swapfile.swap/memory.swappiness
60
--
/sys/fs/cgroup/memory/system.slice/snap-gtk\x2dcommon\x2dthemes-1440.mount/memory.swappiness
60
/sys/fs/cgroup/memory/system.slice/snap-gnome\x2dcharacters-317.mount/memory.swappiness
60
--
/sys/fs/cgroup/memory/system.slice/systemd-journald.service/memory.swappiness
60
--
/sys/fs/cgroup/memory/system.slice/dev-mqueue.mount/memory.swappiness
60
--
/sys/fs/cgroup/memory/system.slice/snap-gtk\x2dcommon\x2dthemes-1353.mount/memory.swappiness
60
/sys/fs/cgroup/memory/system.slice/snap-core-8689.mount/memory.swappiness
60
--
/sys/fs/cgroup/memory/system.slice/snap-gnome\x2d3\x2d28\x2d1804-71.mount/memory.swappiness
60
--
/sys/fs/cgroup/memory/system.slice/snap-core18-1668.mount/memory.swappiness
60
--
/sys/fs/cgroup/memory/system.slice/snap-gnome\x2dcalculator-501.mount/memory.swappiness
60
--
/sys/fs/cgroup/memory/system.slice/dev-hugepages.mount/memory.swappiness
60
--
/sys/fs/cgroup/memory/system.slice/snap-gnome\x2dcalculator-544.mount/memory.swappiness
60
--
/sys/fs/cgroup/memory/system.slice/snap-core18-1223.mount/memory.swappiness
60
