From: Johannes Weiner <hannes@cmpxchg.org>
To: Yosry Ahmed <yosryahmed@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@kernel.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Shakeel Butt <shakeelb@google.com>,
	Muchun Song <muchun.song@linux.dev>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Tejun Heo <tj@kernel.org>, Zefan Li <lizefan.x@bytedance.com>,
	Yu Zhao <yuzhao@google.com>, Luis Chamberlain <mcgrof@kernel.org>,
	Kees Cook <keescook@chromium.org>,
	Iurii Zaikin <yzaikin@google.com>,
	"T.J. Mercier" <tjmercier@google.com>,
	Greg Thelen <gthelen@google.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	cgroups@vger.kernel.org
Subject: Re: [RFC PATCH 0/8] memory recharging for offline memcgs
Date: Thu, 20 Jul 2023 11:35:15 -0400	[thread overview]
Message-ID: <20230720153515.GA1003248@cmpxchg.org> (raw)
In-Reply-To: <20230720070825.992023-1-yosryahmed@google.com>

On Thu, Jul 20, 2023 at 07:08:17AM +0000, Yosry Ahmed wrote:
> This patch series implements the proposal in LSF/MM/BPF 2023 conference
> for reducing offline/zombie memcgs by memory recharging [1]. The main
> difference is that this series focuses on recharging and does not
> include eviction of any memory charged to offline memcgs.
> 
> Two methods of recharging are proposed:
> 
> (a) Recharging of mapped folios.
> 
> When a memcg is offlined, queue an asynchronous worker that will walk
> the lruvec of the offline memcg and try to recharge any mapped folios to
> the memcg of one of the processes mapping the folio. The main assumption
> is that a process mapping the folio is the "rightful" owner of the
> memory.
> 
> Currently, this is only supported for evictable folios, as the
> unevictable lru is imaginary and we cannot iterate the folios on it. A
> separate proposal [2] was made to revive the unevictable lru, which
> would allow recharging of unevictable folios.
> 
> (b) Deferred recharging of folios.
> 
> For folios that are unmapped, or that are mapped but that (a) fails to
> recharge, we rely on deferred recharging. Simply put, any time a folio
> is accessed or dirtied by a userspace process, and that folio is charged
> to an offline memcg, we will try to recharge it to the memcg of the
> process accessing the folio. Again, we assume this process should be the
> "rightful" owner of the memory. This is also done asynchronously to avoid
> slowing down the data access path.
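
To make sure I'm reading the cover letter right, the two mechanisms
boil down to something like the sketch below. This is not the patch
code; the helpers in it (work_to_memcg(), for_each_charged_folio(),
pick_mapping_memcg(), recharge_folio(), queue_recharge_work()) are
made-up stand-ins for whatever the series actually implements:

#include <linux/memcontrol.h>
#include <linux/mm.h>
#include <linux/workqueue.h>

/*
 * Sketch only, not the patch code: the helpers named above are
 * stand-ins, and locking and reference handling are omitted.
 */

/* (a) asynchronous worker queued when the memcg goes offline */
static void recharge_offline_memcg_work(struct work_struct *work)
{
        struct mem_cgroup *memcg = work_to_memcg(work);
        struct folio *folio;

        /* walk the evictable LRU lists of the offline memcg */
        for_each_charged_folio(memcg, folio) {
                struct mem_cgroup *target;

                if (!folio_mapped(folio))
                        continue;
                /* "rightful owner": the memcg of a task mapping the folio */
                target = pick_mapping_memcg(folio);
                if (target)
                        recharge_folio(folio, target);  /* best effort */
        }
}

/* (b) deferred: hook on access/dirtying by a live task */
static void maybe_defer_recharge(struct folio *folio)
{
        struct mem_cgroup *memcg = folio_memcg(folio);

        if (memcg && !mem_cgroup_online(memcg))
                /* asynchronously recharge to the accessing task's memcg */
                queue_recharge_work(folio, current);
}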

I'm super skeptical of this proposal.

Recharging *might* be the most desirable semantics from a user pov,
but only if it applies consistently to the whole memory footprint.
There is no mention of slab allocations such as inodes, dentries,
network buffers, etc., which can be a significant part of a cgroup's
footprint. These are currently reparented. I don't think doing one
thing with half of the memory and a totally different thing with the
other half upon cgroup deletion is going to be acceptable semantics.

It appears this also brings back the reliability issue that caused us
to deprecate charge moving. The recharge path has trylocks, LRU
isolation attempts, and GFP_ATOMIC allocations. These introduce a
variable error rate into the relocation process, which causes pages
that should belong to the same domain to be scattered all over the
place.
It also means that zombie pinning still exists, but it's now even more
influenced by timing and race conditions, and so less predictable.
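
Concretely, every folio moved this way has to clear all of these
best-effort steps, and a miss at any one of them means the folio
silently stays charged to the zombie. A sketch (try_recharge_target()
is a stand-in, not the series' code):

/*
 * Sketch, not the patch code: try_recharge_target() stands in for the
 * actual charge transfer.  The caller is assumed to hold a reference
 * on the folio.
 */
static bool try_relocate_one(struct folio *folio, struct mem_cgroup *target)
{
        if (!folio_trylock(folio))
                return false;           /* lock contended: stays with the zombie */

        if (!folio_isolate_lru(folio)) {
                folio_unlock(folio);
                return false;           /* isolation raced: stays put */
        }

        /* the charge transfer itself allocates, with GFP_ATOMIC */
        if (!try_recharge_target(folio, target, GFP_ATOMIC)) {
                folio_putback_lru(folio);
                folio_unlock(folio);
                return false;           /* allocation failed: stays put */
        }

        folio_putback_lru(folio);
        folio_unlock(folio);
        return true;
}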

There are two issues being conflated here:

a) the problem of zombie cgroups, and

b) who controls resources that outlive the control domain.

For a), reparenting is still the most reasonable proposal. It's
reliable for one, but it also fixes the problem fully within the
established, user-facing semantics: resources that belong to a cgroup
also hierarchically belong to all ancestral groups; if those resources
outlive the last-level control domain, they continue to belong to the
parents. This is how it works today, and this is how it continues to
work with reparenting. The only difference is that those resources no
longer pin a dead cgroup anymore, but instead are physically linked to
the next online ancestor. Since dead cgroups have no effective control
parameters anymore, this is semantically equivalent - it's just a more
memory-efficient implementation of the exact same thing.
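
This is already how the slab side behaves: charged objects point at an
obj_cgroup, and offlining simply re-points that intermediate object at
the parent. Roughly, in simplified form:

/*
 * Simplified: charges point at an obj_cgroup, and offlining re-points
 * that indirection at the parent.  Locking, RCU annotations and the
 * handling of memcg->objcg itself are omitted; see
 * memcg_reparent_objcgs() in mm/memcontrol.c for the real thing.
 */
static void reparent_charges(struct mem_cgroup *memcg)
{
        struct mem_cgroup *parent = parent_mem_cgroup(memcg);
        struct obj_cgroup *objcg;

        if (!parent)
                parent = root_mem_cgroup;

        list_for_each_entry(objcg, &memcg->objcg_list, list) {
                /*
                 * Charged objects keep pointing at their obj_cgroup;
                 * the obj_cgroup now resolves to an online ancestor.
                 */
                objcg->memcg = parent;
        }

        list_splice_init(&memcg->objcg_list, &parent->objcg_list);
}

Muchun's series applies the same objcg indirection to LRU folios, so
reparenting pages is a pointer switch rather than a page-by-page walk,
with no step in it that can fail halfway through.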

b) is a totally separate discussion. We can debate what we want this
behavior to be, but I'd argue strongly that whatever we do here should
apply equally to all resources managed by the controller.

It could also be argued that if you don't want to lose control over a
set of resources, then maybe don't delete their control domain while
they are still alive and in use. For example, when restarting a
workload whose new instance is expected to have largely the same
working set, consider reusing the cgroup instead of making a new one.

For the zombie problem, I think we should merge Muchun's patches
ASAP. They've been proposed several times, they have Roman's reviews
and acks, and they do not change user-facing semantics. There is no
good reason not to merge them.

