From: Roman Gushchin <guro@fb.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: jingrui <jingrui@huawei.com>, "tj@kernel.org" <tj@kernel.org>,
	Lizefan <lizefan@huawei.com>,
	"mhocko@kernel.org" <mhocko@kernel.org>,
	"vdavydov.dev@gmail.com" <vdavydov.dev@gmail.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"cgroups@vger.kernel.org" <cgroups@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	caihaomin <caihaomin@huawei.com>,
	"Weiwei (N)" <wick.wei@huawei.com>, <guro@cmpxchg.org>
Subject: Re: PROBLEM: cgroup cost too much memory when transfer small files to tmpfs
Date: Tue, 21 Jul 2020 11:49:59 -0700
Message-ID: <20200721184959.GA8266@carbon.DHCP.thefacebook.com>
In-Reply-To: <20200721174126.GA271870@cmpxchg.org>

On Tue, Jul 21, 2020 at 01:41:26PM -0400, Johannes Weiner wrote:
> On Tue, Jul 21, 2020 at 11:19:52AM +0000, jingrui wrote:
> > Cc: Johannes Weiner <hannes@cmpxchg.org> ; Michal Hocko <mhocko@kernel.org>; Vladimir Davydov <vdavydov.dev@gmail.com>
> > 
> > Thanks.
> > 
> > ---
> > PROBLEM: cgroup cost too much memory when transfer small files to tmpfs.
> > 
> > keywords: cgroup PERCPU/memory cost too much.
> > 
> > description:
> > 
> > We send small files from node-A to the tmpfs /tmp directory on node-B using sftp.
> > On node-B, pam_systemd is enabled in the PAM configuration, as shown below.
> > 
> > cat /etc/pam.d/password-auth | grep systemd
> > -session     optional      pam_systemd.so
> > 
> > So when a file is transferred, a systemd session is created, which means a cgroup
> > is created, and the file saved in /tmp is then associated with that cgroup object.
> > After the file is transferred, the session and cgroup directory are removed, but the
> > file in /tmp is still associated with the cgroup object. The PERCPU memory in the
> > cgroup/css object costs a lot (about 0.5MB per cgroup object) on a 200-CPU machine.
> 
> CC Roman who had a patch series to free all this extended (percpu)
> memory upon cgroup deletion:
> 
> https://lore.kernel.org/patchwork/cover/1050508/
> 
> It looks like it never got merged for some reason.

The mentioned patchset can make the problem less noticeable, but can't solve it completely.
It has never been merged, because the dying cgroup problem was mostly solved by other methods:
slab memory reparenting and various reclaim fixes. So there was no more reason to complicate
the code to release the memcg memory early.
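
For anyone who wants to check how many dying cgroups a system is holding on to:
on cgroup v2, the cgroup.stat file reports an nr_dying_descendants counter.
Below is a minimal sketch, assuming a cgroup v2 hierarchy mounted at
/sys/fs/cgroup. The mount point and the helper names are illustrative
assumptions, not something from the report; cgroup.stat does not exist on
cgroup v1.

```python
from pathlib import Path


def parse_cgroup_stat(text: str) -> dict:
    """Parse 'key value' lines in the format used by cgroup v2 cgroup.stat."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only well-formed "name <unsigned integer>" lines.
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def dying_descendants(cgroup_dir: str = "/sys/fs/cgroup") -> int:
    """Return nr_dying_descendants for a cgroup v2 directory (assumed path)."""
    text = Path(cgroup_dir, "cgroup.stat").read_text()
    return parse_cgroup_stat(text).get("nr_dying_descendants", 0)
```

If the counter keeps growing while files are being transferred, removed
cgroups are still being pinned, for example by tmpfs pages charged to them.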

The overhead of creating and destroying a new memory cgroup for the transfer of a small
file will be noticeable anyway. So IMO the solution is to use a single cgroup for all
transfers. I don't know if systemd supports such a mode out of the box, but it shouldn't
be hard to add it.
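
To put rough numbers on the difference, here is a back-of-the-envelope sketch
that uses only the figure from the report (about 0.5MB of percpu memory per
cgroup object on a 200-CPU machine). It assumes the cost scales linearly with
the CPU count and, as a worst case, that every transferred file pins its own
removed cgroup; both are assumptions of the sketch, not measurements.

```python
MIB = 1024 * 1024

# Figures taken from the report: ~0.5MB per cgroup object on 200 CPUs.
REPORTED_BYTES_PER_CGROUP = MIB // 2
REPORTED_CPUS = 200


def percpu_bytes_per_cgroup(cpus: int = REPORTED_CPUS) -> int:
    """Estimate the percpu cost of one cgroup, assuming linear scaling in CPUs."""
    return REPORTED_BYTES_PER_CGROUP * cpus // REPORTED_CPUS


def pinned_bytes(num_cgroups: int, cpus: int = REPORTED_CPUS) -> int:
    """Estimate the percpu memory held by num_cgroups pinned cgroup objects."""
    return num_cgroups * percpu_bytes_per_cgroup(cpus)


if __name__ == "__main__":
    per_transfer = pinned_bytes(10_000)  # one pinned cgroup per transferred file
    shared = pinned_bytes(1)             # a single cgroup for all transfers
    print(f"per-transfer cgroups: {per_transfer / MIB:.1f} MiB")
    print(f"single shared cgroup: {shared / MIB:.1f} MiB")
```

Under these assumptions, 10,000 small files each left pinning their own cgroup
would hold roughly 4.9 GiB of percpu memory, while a single shared cgroup
stays at about 0.5 MiB no matter how many files are transferred.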

But also I wonder if we need a special tmpfs mount option, something like "noaccount".
Not only for this specific case, but also for the case when tmpfs is extensively
shared between multiple cgroups or if it's used to pass some data from one cgroup
to another, or if we care about the performance more than about the accounting;
in other words, for cases where the accounting does more harm than good.

Thanks!
