From: Johannes Weiner <hannes@cmpxchg.org>
To: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Shakeel Butt <shakeelb@google.com>,
Michal Hocko <mhocko@suse.com>,
linux-mm@kvack.org, cgroups@vger.kernel.org,
linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 2/2] mm: memcontrol: deprecate swapaccounting=0 mode
Date: Mon, 22 Mar 2021 11:53:41 -0400 [thread overview]
Message-ID: <YFi9hVQM9maWyhWv@cmpxchg.org> (raw)
In-Reply-To: <alpine.LSU.2.11.2103191633350.1043@eggly.anvils>
On Fri, Mar 19, 2021 at 05:37:05PM -0700, Hugh Dickins wrote:
> On Fri, 19 Mar 2021, Johannes Weiner wrote:
>
> > The swapaccounting= commandline option already does very little
> > today. To close a trivial containment failure case, the swap ownership
> > tracking part of the swap controller has recently become mandatory
> > (see commit 2d1c498072de ("mm: memcontrol: make swap tracking an
> > integral part of memory control") for details), which makes up the
> > majority of the work during swapout, swapin, and the swap slot map.
> >
> > The only thing left under this flag is the page_counter operations and
> > the visibility of the swap control files in the first place, which are
> > rather meager savings. There also aren't many scenarios, if any, where
> > controlling the memory of a cgroup while allowing it unlimited access
> > to a global swap space is a workable resource isolation stragegy.
> >
> > On the other hand, there have been several bugs and confusion around
> > the many possible swap controller states (cgroup1 vs cgroup2 behavior,
> > memory accounting without swap accounting, memcg runtime disabled).
> >
> > This puts the maintenance overhead of retaining the toggle above its
> > practical benefits. Deprecate it.
> >
> > Suggested-by: Shakeel Butt <shakeelb@google.com>
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
>
> This crashes, and needs a fix: see below (plus some nits).
>
> But it's a very welcome cleanup: just getting rid of all those
> !cgroup_memory_noswap double negatives is a relief in itself.
>
> It does suggest eliminating CONFIG_MEMCG_SWAP altogether (just
> using #ifdef CONFIG_SWAP instead, in those parts of CONFIG_MEMCG code);
> but you're right that's a separate cleanup, and not nearly so worthwhile
> as this one (I notice CONFIG_MEMCG_SWAP in some of the arch defconfigs,
> and don't know whether whoever removes CONFIG_MEMCG_SWAP would be
> obligated to remove those too).
2d1c498072de ("mm: memcontrol: make swap tracking an integral part of
memory control") made it invisible to the user already, I only kept
the symbol for convenience in the Makefile:
obj-$(CONFIG_MEMCG_SWAP) += swap_cgroup.o
But I guess we could replace it with
ifdef CONFIG_SWAP
obj-$(CONFIG_MEMCG) += swap_cgroup.c
endif
and I agree, everywhere else it's currently used would be nicer
without it. I'll send a separate patch.
> > @@ -99,7 +92,11 @@ static DECLARE_WAIT_QUEUE_HEAD(memcg_cgwb_frn_waitq);
> > /* Whether legacy memory+swap accounting is active */
> > static bool do_memsw_account(void)
> > {
> > - return !cgroup_subsys_on_dfl(memory_cgrp_subsys) && !cgroup_memory_noswap;
> > + /* cgroup2 doesn't do mem+swap accounting */
> > + if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
> > + return false;
> > +
> > + return true;
>
> Nit: I'm not fond of the "if (boolean()) return true; else return false;"
> codestyle, and would prefer the straightforward
>
> return !cgroup_subsys_on_dfl(memory_cgrp_subsys);
>
> but you've chosen otherwise, so, okay.
I prefer a series of branches if a single expression becomes unwieldy,
and individual conditions deserve their own comments.
But it's indeed pretty silly in this case, I'll make it a single line.
> > @@ -7124,7 +7121,7 @@ long mem_cgroup_get_nr_swap_pages(struct mem_cgroup *memcg)
> > {
> > long nr_swap_pages = get_nr_swap_pages();
> >
> > - if (cgroup_memory_noswap || !cgroup_subsys_on_dfl(memory_cgrp_subsys))
> > + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
>
> That needs to check mem_cgroup_disabled() where it used to check
> cgroup_memory_noswap. The convolutions are confusing (which is
> precisely why this is such a welcome cleanup), but without a
> mem_cgroup_disabled() (or NULL memcg) check there, the
> cgroup_disable=memory case oopses on NULLish pointer dereference
> when mem_cgroup_get_nr_swap_pages() is called from get_scan_count().
>
> (This little function has been repeatedly troublesome that way.)
Sigh, yes, this will hopefully be the last instance of this bug...
Thanks for catching that, I'll fix it up.
> > @@ -7141,7 +7138,7 @@ bool mem_cgroup_swap_full(struct page *page)
> >
> > if (vm_swap_full())
> > return true;
> > - if (cgroup_memory_noswap || !cgroup_subsys_on_dfl(memory_cgrp_subsys))
> > + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
>
> Would it now be better to say "if (do_memsw_account())" there?
Yes, I'll change that.
> Or would it be better to eliminate do_memsw_account() altogether,
> and just use !cgroup_subsys_on_dfl(memory_cgrp_subsys) throughout?
I have found do_memsw_account() useful in the past to find those
related pieces. The details elude me know but I remember searching for
this string often during the recent work in this area.
> (Though I don't find "cgroup_subsys_on_dfl" very informative.)
This routinely bothers me as well, but I have not been able to come up
with a good replacement.
memcg_v1()? memcg_legacy_mode()? memory_cgroup_in_bytes()?
> > @@ -7291,27 +7287,13 @@ static struct cftype memsw_files[] = {
> > { }, /* terminate */
> > };
> >
> > -/*
> > - * If mem_cgroup_swap_init() is implemented as a subsys_initcall()
> > - * instead of a core_initcall(), this could mean cgroup_memory_noswap still
> > - * remains set to false even when memcg is disabled via "cgroup_disable=memory"
> > - * boot parameter. This may result in premature OOPS inside
> > - * mem_cgroup_get_nr_swap_pages() function in corner cases.
> > - */
> > static int __init mem_cgroup_swap_init(void)
> > {
> > - /* No memory control -> no swap control */
> > - if (mem_cgroup_disabled())
> > - cgroup_memory_noswap = true;
> > -
> > - if (cgroup_memory_noswap)
> > - return 0;
> > -
>
> Shakeel suggested "if (mem_cgroup_disabled()) return 0;" here,
> and that was the first thing I tried when I got the crash.
It really isn't obviously safe. I'll put it back in.
next prev parent reply other threads:[~2021-03-22 15:54 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-03-19 5:49 [PATCH 1/2] mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled Johannes Weiner
2021-03-19 5:49 ` [PATCH 2/2] mm: memcontrol: deprecate swapaccounting=0 mode Johannes Weiner
2021-03-19 13:49 ` Shakeel Butt
2021-03-19 17:36 ` Johannes Weiner
2021-03-19 18:20 ` Shakeel Butt
2021-03-20 0:37 ` Hugh Dickins
2021-03-22 15:53 ` Johannes Weiner [this message]
2021-03-19 14:00 ` [PATCH 1/2] mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled Shakeel Butt
2021-03-19 23:33 ` Hugh Dickins
2021-03-22 9:41 ` Michal Hocko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YFi9hVQM9maWyhWv@cmpxchg.org \
--to=hannes@cmpxchg.org \
--cc=akpm@linux-foundation.org \
--cc=cgroups@vger.kernel.org \
--cc=hughd@google.com \
--cc=kernel-team@fb.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.com \
--cc=shakeelb@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).