From: Michal Hocko <mhocko@suse.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Muchun Song <songmuchun@bytedance.com>,
	vdavydov.dev@gmail.com, akpm@linux-foundation.org,
	cgroups@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 4/4] mm: memcontrol: fix swap uncharge on cgroup v2
Date: Tue, 16 Feb 2021 18:19:39 +0100
Message-ID: <YCv+q749iP4J/pWC@dhcp22.suse.cz>
In-Reply-To: <YCv51LgGIWxVjLHT@cmpxchg.org>

On Tue 16-02-21 11:59:00, Johannes Weiner wrote:
> Hello Muchun,
> 
> On Sat, Feb 13, 2021 at 01:01:59AM +0800, Muchun Song wrote:
> > Swap accounting on cgroup v2 charges the actual number of swap
> > entries. After a swap cache page is charged successfully, we
> > currently uncharge the swap counter. That is wrong on cgroup v2,
> > because the swap entry has not been freed.
> 
> The patch makes sense to me. But this code is a bit tricky, so we should
> add more documentation on how it works and what the problem is.
> 
> How about this for the changelog?
> 
> ---
> mm: memcontrol: fix swap undercounting for shared pages in cgroup2
> 
> When shared pages are swapped in partially, we can have some page
> tables referencing the in-memory page and some referencing the swap
> slot. Cgroup1 and cgroup2 handle these overlapping lifetimes slightly
> differently due to the nature of how they account memory and swap:
> 
> Cgroup1 has a unified memory+swap counter that tracks a data page
> regardless of whether it's in-core or swapped out. On swapin, we transfer
> the charge from the swap entry to the newly allocated swapcache page,
> even though the swap entry might stick around for a while. That's why
> we have a mem_cgroup_uncharge_swap() call inside mem_cgroup_charge().
> 
> Cgroup2 tracks memory and swap as separate, independent resources and
> thus has split memory and swap counters. On swapin, we charge the
> newly allocated swapcache page as memory, while the swap slot in turn
> must remain charged to the swap counter as long as it's allocated too.
> 
> The cgroup2 logic was broken by commit 2d1c498072de ("mm: memcontrol:
> make swap tracking an integral part of memory control"), because it
> accidentally removed the do_memsw_account() check in the branch inside
> mem_cgroup_charge() that was supposed to tell the difference between
> the charge transfer in cgroup1 and the separate counters in cgroup2.
> 
> As a result, cgroup2 currently undercounts consumed swap when shared
> pages are partially swapped back in. This in turn allows a cgroup to
> consume more swap than its configured limit intends.
> 
> Add the do_memsw_account() check back to fix this problem.

Yes, this clarifies both the issue and the subtlety of the accounting.
Thanks a lot Johannes! This is a great example of how changelogs should
really look.

> ---
> 
> > Fixes: 2d1c498072de ("mm: memcontrol: make swap tracking an integral part of memory control")
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> 
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> 
> > ---
> >  mm/memcontrol.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index c737c8f05992..be6bc5044150 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -6753,7 +6753,7 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask)
> >  	memcg_check_events(memcg, page);
> >  	local_irq_enable();
> >  
> > -	if (PageSwapCache(page)) {
> > +	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && PageSwapCache(page)) {
> 
> It's more descriptive to use do_memsw_account() here, IMO.
> 
> We should also add a comment. How about this above the branch?
> 
> 	/*
> 	 * Cgroup1's unified memory+swap counter has been charged with the
> 	 * new swapcache page, finish the transfer by uncharging the swap
> 	 * slot. The swap slot would also get uncharged when it dies, but
> 	 * for shared pages it can stick around indefinitely and we'd count
> 	 * the page twice the entire time.
> 	 *
> 	 * Cgroup2 has separate resource counters for memory and swap,
> 	 * so this is a non-issue here. Memory and swap charge lifetimes
> 	 * correspond 1:1 to page and swap slot lifetimes: we charge the
> 	 * page to memory here, and uncharge swap when the slot is freed.
> 	 */

Yes, very helpful.

With the changelog update and the comment
Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

-- 
Michal Hocko
SUSE Labs
