All of lore.kernel.org
 help / color / mirror / Atom feed
From: Muchun Song <songmuchun@bytedance.com>
To: Roman Gushchin <guro@fb.com>
Cc: viro@zeniv.linux.org.uk, Jan Kara <jack@suse.cz>,
	amir73il@gmail.com, Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	andrii@kernel.org, Martin KaFai Lau <kafai@fb.com>,
	Song Liu <songliubraving@fb.com>, Yonghong Song <yhs@fb.com>,
	John Fastabend <john.fastabend@gmail.com>,
	kpsingh@kernel.org, mingo@redhat.com,
	Peter Zijlstra <peterz@infradead.org>,
	juri.lelli@redhat.com,
	Vincent Guittot <vincent.guittot@linaro.org>,
	dietmar.eggemann@arm.com, Steven Rostedt <rostedt@goodmis.org>,
	Benjamin Segall <bsegall@google.com>,
	mgorman@suse.de, bristot@redhat.com,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Shakeel Butt <shakeelb@google.com>,
	Alex Shi <alex.shi@linux.alibaba.com>,
	Chris Down <chris@chrisdown.name>,
	richard.weiyang@gmail.com, Vlastimil Babka <vbabka@suse.cz>,
	mathieu.desnoyers@efficios.com, posk@google.com,
	Jann Horn <jannh@google.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Daniel Vetter <daniel.vetter@ffwll.ch>,
	longman@redhat.com, Michel Lespinasse <walken@google.com>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Kees Cook <keescook@chromium.org>,
	krisman@collabora.com, esyr@redhat.com,
	Suren Baghdasaryan <surenb@google.com>,
	Marco Elver <elver@google.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Networking <netdev@vger.kernel.org>, bpf <bpf@vger.kernel.org>,
	Cgroups <cgroups@vger.kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	Xiongchun duan <duanxiongchun@bytedance.com>
Subject: Re: [External] Re: [PATCH 0/5] Use obj_cgroup APIs to change kmem pages
Date: Tue, 2 Mar 2021 10:50:04 +0800	[thread overview]
Message-ID: <CAMZfGtUzB1duVS+pSEHvB-g6BSQ25mQMvUjopcADx0v2go3Q0g@mail.gmail.com> (raw)
In-Reply-To: <YD2Q5q2HfKXPnDte@carbon.dhcp.thefacebook.com>

On Tue, Mar 2, 2021 at 9:12 AM Roman Gushchin <guro@fb.com> wrote:
>
> Hi Muchun!
>
> On Mon, Mar 01, 2021 at 02:22:22PM +0800, Muchun Song wrote:
> > Since Roman series "The new cgroup slab memory controller" applied. All
> > slab objects are changed via the new APIs of obj_cgroup. This new APIs
> > introduce a struct obj_cgroup instead of using struct mem_cgroup directly
> > to charge slab objects. It prevents long-living objects from pinning the
> > original memory cgroup in the memory. But there are still some corner
> > objects (e.g. allocations larger than order-1 page on SLUB) which are
> > not charged via the API of obj_cgroup. Those objects (include the pages
> > which are allocated from buddy allocator directly) are charged as kmem
> > pages which still hold a reference to the memory cgroup.
>
> Yes, this is a good idea, large kmallocs should be treated the same
> way as small ones.
>
> >
> > E.g. We know that the kernel stack is charged as kmem pages because the
> > size of the kernel stack can be greater than 2 pages (e.g. 16KB on x86_64
> > or arm64). If we create a thread (suppose the thread stack is charged to
> > memory cgroup A) and then move it from memory cgroup A to memory cgroup
> > B. Because the kernel stack of the thread hold a reference to the memory
> > cgroup A. The thread can pin the memory cgroup A in the memory even if
> > we remove the cgroup A. If we want to see this scenario by using the
> > following script. We can see that the system has added 500 dying cgroups.
> >
> >       #!/bin/bash
> >
> >       cat /proc/cgroups | grep memory
> >
> >       cd /sys/fs/cgroup/memory
> >       echo 1 > memory.move_charge_at_immigrate
> >
> >       for i in range{1..500}
> >       do
> >               mkdir kmem_test
> >               echo $$ > kmem_test/cgroup.procs
> >               sleep 3600 &
> >               echo $$ > cgroup.procs
> >               echo `cat kmem_test/cgroup.procs` > cgroup.procs
> >               rmdir kmem_test
> >       done
> >
> >       cat /proc/cgroups | grep memory
>
> Well, moving processes between cgroups always created a lot of issues
> and corner cases and this one is definitely not the worst. So this problem
> looks a bit artificial, unless I'm missing something. But if it doesn't
> introduce any new performance costs and doesn't make the code more complex,
> I have nothing against.

OK. I just want to show that large kmallocs are charged as kmem pages.
So I constructed this test case.

>
> Btw, can you, please, run the spell-checker on commit logs? There are many
> typos (starting from the title of the series, I guess), which make the patchset
> look less appealing.

Sorry for my poor English. I will do that. Thanks for your suggestions.


>
> Thank you!
>
> >
> > This patchset aims to make those kmem pages drop the reference to memory
> > cgroup by using the APIs of obj_cgroup. Finally, we can see that the number
> > of the dying cgroups will not increase if we run the above test script.
> >
> > Patch 1-3 are using obj_cgroup APIs to charge kmem pages. The remote
> > memory cgroup charing APIs is a mechanism to charge kernel memory to a
> > given memory cgroup. So I also make it use the APIs of obj_cgroup.
> > Patch 4-5 are doing this.
> >
> > Muchun Song (5):
> >   mm: memcontrol: introduce obj_cgroup_{un}charge_page
> >   mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem
> >     page
> >   mm: memcontrol: reparent the kmem pages on cgroup removal
> >   mm: memcontrol: move remote memcg charging APIs to CONFIG_MEMCG_KMEM
> >   mm: memcontrol: use object cgroup for remote memory cgroup charging
> >
> >  fs/buffer.c                          |  10 +-
> >  fs/notify/fanotify/fanotify.c        |   6 +-
> >  fs/notify/fanotify/fanotify_user.c   |   2 +-
> >  fs/notify/group.c                    |   3 +-
> >  fs/notify/inotify/inotify_fsnotify.c |   8 +-
> >  fs/notify/inotify/inotify_user.c     |   2 +-
> >  include/linux/bpf.h                  |   2 +-
> >  include/linux/fsnotify_backend.h     |   2 +-
> >  include/linux/memcontrol.h           | 109 +++++++++++---
> >  include/linux/sched.h                |   6 +-
> >  include/linux/sched/mm.h             |  30 ++--
> >  kernel/bpf/syscall.c                 |  35 ++---
> >  kernel/fork.c                        |   4 +-
> >  mm/memcontrol.c                      | 276 ++++++++++++++++++++++-------------
> >  mm/page_alloc.c                      |   4 +-
> >  15 files changed, 324 insertions(+), 175 deletions(-)
> >
> > --
> > 2.11.0
> >

WARNING: multiple messages have this Message-ID (diff)
From: Muchun Song <songmuchun@bytedance.com>
To: Roman Gushchin <guro@fb.com>
Cc: viro@zeniv.linux.org.uk, Jan Kara <jack@suse.cz>,
	amir73il@gmail.com, Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	andrii@kernel.org, Martin KaFai Lau <kafai@fb.com>,
	Song Liu <songliubraving@fb.com>, Yonghong Song <yhs@fb.com>,
	John Fastabend <john.fastabend@gmail.com>,
	kpsingh@kernel.org, mingo@redhat.com,
	Peter Zijlstra <peterz@infradead.org>,
	juri.lelli@redhat.com,
	Vincent Guittot <vincent.guittot@linaro.org>,
	dietmar.eggemann@arm.com, Steven Rostedt <rostedt@goodmis.org>,
	Benjamin Segall <bsegall@google.com>,
	mgorman@suse.de, bristot@redhat.com,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Shakeel Butt <shakeelb@google>
Subject: Re: [External] Re: [PATCH 0/5] Use obj_cgroup APIs to change kmem pages
Date: Tue, 2 Mar 2021 10:50:04 +0800	[thread overview]
Message-ID: <CAMZfGtUzB1duVS+pSEHvB-g6BSQ25mQMvUjopcADx0v2go3Q0g@mail.gmail.com> (raw)
In-Reply-To: <YD2Q5q2HfKXPnDte@carbon.dhcp.thefacebook.com>

On Tue, Mar 2, 2021 at 9:12 AM Roman Gushchin <guro@fb.com> wrote:
>
> Hi Muchun!
>
> On Mon, Mar 01, 2021 at 02:22:22PM +0800, Muchun Song wrote:
> > Since Roman series "The new cgroup slab memory controller" applied. All
> > slab objects are changed via the new APIs of obj_cgroup. This new APIs
> > introduce a struct obj_cgroup instead of using struct mem_cgroup directly
> > to charge slab objects. It prevents long-living objects from pinning the
> > original memory cgroup in the memory. But there are still some corner
> > objects (e.g. allocations larger than order-1 page on SLUB) which are
> > not charged via the API of obj_cgroup. Those objects (include the pages
> > which are allocated from buddy allocator directly) are charged as kmem
> > pages which still hold a reference to the memory cgroup.
>
> Yes, this is a good idea, large kmallocs should be treated the same
> way as small ones.
>
> >
> > E.g. We know that the kernel stack is charged as kmem pages because the
> > size of the kernel stack can be greater than 2 pages (e.g. 16KB on x86_64
> > or arm64). If we create a thread (suppose the thread stack is charged to
> > memory cgroup A) and then move it from memory cgroup A to memory cgroup
> > B. Because the kernel stack of the thread hold a reference to the memory
> > cgroup A. The thread can pin the memory cgroup A in the memory even if
> > we remove the cgroup A. If we want to see this scenario by using the
> > following script. We can see that the system has added 500 dying cgroups.
> >
> >       #!/bin/bash
> >
> >       cat /proc/cgroups | grep memory
> >
> >       cd /sys/fs/cgroup/memory
> >       echo 1 > memory.move_charge_at_immigrate
> >
> >       for i in range{1..500}
> >       do
> >               mkdir kmem_test
> >               echo $$ > kmem_test/cgroup.procs
> >               sleep 3600 &
> >               echo $$ > cgroup.procs
> >               echo `cat kmem_test/cgroup.procs` > cgroup.procs
> >               rmdir kmem_test
> >       done
> >
> >       cat /proc/cgroups | grep memory
>
> Well, moving processes between cgroups always created a lot of issues
> and corner cases and this one is definitely not the worst. So this problem
> looks a bit artificial, unless I'm missing something. But if it doesn't
> introduce any new performance costs and doesn't make the code more complex,
> I have nothing against.

OK. I just want to show that large kmallocs are charged as kmem pages.
So I constructed this test case.

>
> Btw, can you, please, run the spell-checker on commit logs? There are many
> typos (starting from the title of the series, I guess), which make the patchset
> look less appealing.

Sorry for my poor English. I will do that. Thanks for your suggestions.


>
> Thank you!
>
> >
> > This patchset aims to make those kmem pages drop the reference to memory
> > cgroup by using the APIs of obj_cgroup. Finally, we can see that the number
> > of the dying cgroups will not increase if we run the above test script.
> >
> > Patch 1-3 are using obj_cgroup APIs to charge kmem pages. The remote
> > memory cgroup charing APIs is a mechanism to charge kernel memory to a
> > given memory cgroup. So I also make it use the APIs of obj_cgroup.
> > Patch 4-5 are doing this.
> >
> > Muchun Song (5):
> >   mm: memcontrol: introduce obj_cgroup_{un}charge_page
> >   mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem
> >     page
> >   mm: memcontrol: reparent the kmem pages on cgroup removal
> >   mm: memcontrol: move remote memcg charging APIs to CONFIG_MEMCG_KMEM
> >   mm: memcontrol: use object cgroup for remote memory cgroup charging
> >
> >  fs/buffer.c                          |  10 +-
> >  fs/notify/fanotify/fanotify.c        |   6 +-
> >  fs/notify/fanotify/fanotify_user.c   |   2 +-
> >  fs/notify/group.c                    |   3 +-
> >  fs/notify/inotify/inotify_fsnotify.c |   8 +-
> >  fs/notify/inotify/inotify_user.c     |   2 +-
> >  include/linux/bpf.h                  |   2 +-
> >  include/linux/fsnotify_backend.h     |   2 +-
> >  include/linux/memcontrol.h           | 109 +++++++++++---
> >  include/linux/sched.h                |   6 +-
> >  include/linux/sched/mm.h             |  30 ++--
> >  kernel/bpf/syscall.c                 |  35 ++---
> >  kernel/fork.c                        |   4 +-
> >  mm/memcontrol.c                      | 276 ++++++++++++++++++++++-------------
> >  mm/page_alloc.c                      |   4 +-
> >  15 files changed, 324 insertions(+), 175 deletions(-)
> >
> > --
> > 2.11.0
> >

  reply	other threads:[~2021-03-02  7:57 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-01  6:22 [PATCH 0/5] Use obj_cgroup APIs to change kmem pages Muchun Song
2021-03-01  6:22 ` Muchun Song
2021-03-01  6:22 ` [PATCH 1/5] mm: memcontrol: introduce obj_cgroup_{un}charge_page Muchun Song
2021-03-01  6:22   ` Muchun Song
2021-03-01  6:22 ` [PATCH 2/5] mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem page Muchun Song
2021-03-01  6:22   ` Muchun Song
2021-03-01 18:11   ` Shakeel Butt
2021-03-01 18:11     ` Shakeel Butt
2021-03-01 18:11     ` Shakeel Butt
2021-03-01 19:09     ` Johannes Weiner
2021-03-01 19:09       ` Johannes Weiner
2021-03-02  3:49       ` [External] " Muchun Song
2021-03-02  3:49         ` Muchun Song
2021-03-02  3:49         ` Muchun Song
2021-03-02  3:03     ` Muchun Song
2021-03-02  3:03       ` Muchun Song
2021-03-02  3:03       ` Muchun Song
2021-03-02  3:35       ` Shakeel Butt
2021-03-02  3:35         ` Shakeel Butt
2021-03-02  3:35         ` Shakeel Butt
2021-03-02  3:51         ` Muchun Song
2021-03-02  3:51           ` Muchun Song
2021-03-02  3:51           ` Muchun Song
2021-03-01  6:22 ` [PATCH 3/5] mm: memcontrol: reparent the kmem pages on cgroup removal Muchun Song
2021-03-01  6:22   ` Muchun Song
2021-03-01  6:22 ` [PATCH 4/5] mm: memcontrol: move remote memcg charging APIs to CONFIG_MEMCG_KMEM Muchun Song
2021-03-01  6:22   ` Muchun Song
2021-03-02  1:15   ` Roman Gushchin
2021-03-02  1:15     ` Roman Gushchin
2021-03-02  3:43     ` Shakeel Butt
2021-03-02  3:43       ` Shakeel Butt
2021-03-02  3:58       ` Roman Gushchin
2021-03-02  3:58         ` Roman Gushchin
2021-03-02  4:12     ` [External] " Muchun Song
2021-03-02  4:12       ` Muchun Song
2021-03-02  4:12       ` Muchun Song
2021-03-01  6:22 ` [PATCH 5/5] mm: memcontrol: use object cgroup for remote memory cgroup charging Muchun Song
2021-03-01  6:22   ` Muchun Song
2021-03-02  1:29   ` Roman Gushchin
2021-03-02  4:11     ` [External] " Muchun Song
2021-03-02  4:11       ` Muchun Song
2021-03-02  1:12 ` [PATCH 0/5] Use obj_cgroup APIs to change kmem pages Roman Gushchin
2021-03-02  1:12   ` Roman Gushchin
2021-03-02  2:50   ` Muchun Song [this message]
2021-03-02  2:50     ` [External] " Muchun Song
2021-03-02  2:50     ` Muchun Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAMZfGtUzB1duVS+pSEHvB-g6BSQ25mQMvUjopcADx0v2go3Q0g@mail.gmail.com \
    --to=songmuchun@bytedance.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex.shi@linux.alibaba.com \
    --cc=amir73il@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=cgroups@vger.kernel.org \
    --cc=chris@chrisdown.name \
    --cc=christian.brauner@ubuntu.com \
    --cc=daniel.vetter@ffwll.ch \
    --cc=daniel@iogearbox.net \
    --cc=dietmar.eggemann@arm.com \
    --cc=duanxiongchun@bytedance.com \
    --cc=ebiederm@xmission.com \
    --cc=elver@google.com \
    --cc=esyr@redhat.com \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=jack@suse.cz \
    --cc=jannh@google.com \
    --cc=john.fastabend@gmail.com \
    --cc=juri.lelli@redhat.com \
    --cc=kafai@fb.com \
    --cc=keescook@chromium.org \
    --cc=kpsingh@kernel.org \
    --cc=krisman@collabora.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=longman@redhat.com \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mgorman@suse.de \
    --cc=mhocko@kernel.org \
    --cc=mingo@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=posk@google.com \
    --cc=richard.weiyang@gmail.com \
    --cc=rostedt@goodmis.org \
    --cc=shakeelb@google.com \
    --cc=songliubraving@fb.com \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    --cc=vdavydov.dev@gmail.com \
    --cc=vincent.guittot@linaro.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=walken@google.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.