linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Shakeel Butt <shakeelb@google.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Michal Hocko" <mhocko@suse.com>, "Roman Gushchin" <guro@fb.com>,
	"Yang Shi" <yang.shi@linux.alibaba.com>,
	"Greg Thelen" <gthelen@google.com>,
	"David Rientjes" <rientjes@google.com>,
	"Michal Koutný" <mkoutny@suse.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Linux MM" <linux-mm@kvack.org>,
	Cgroups <cgroups@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"Andrea Righi" <andrea.righi@canonical.com>,
	"SeongJae Park" <sjpark@amazon.com>
Subject: Re: [PATCH] memcg: introduce per-memcg reclaim interface
Date: Thu, 8 Oct 2020 08:55:57 -0700	[thread overview]
Message-ID: <CALvZod5-EtB0jNi9DXTmLSKrUzK2jXRhW8h6+7sqB356k0t1+g@mail.gmail.com> (raw)
In-Reply-To: <20201008145336.GA163830@cmpxchg.org>

On Thu, Oct 8, 2020 at 7:55 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Tue, Oct 06, 2020 at 09:55:43AM -0700, Shakeel Butt wrote:
> > On Thu, Oct 1, 2020 at 7:33 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > >
> > [snip]
> > > > >    So instead of asking users for a target size whose suitability
> > > > >    heavily depends on the kernel's LRU implementation, the readahead
> > > > >    code, the IO device's capability and general load, why not directly
> > > > >    ask the user for a pressure level that the workload is comfortable
> > > > >    with and which captures all of the above factors implicitly? Then
> > > > >    let the kernel do this feedback loop from a per-cgroup worker.
> > > >
> > > > I am assuming here by pressure level you are referring to the PSI like
> > > > interface e.g. allowing the users to tell about their jobs that X
> > > > amount of stalls in a fixed time window is tolerable.
> > >
> > > Right, essentially the same parameters that psi poll() would take.
> >
> > I thought a bit more on the semantics of the psi usage for the
> > proactive reclaim.
> >
> > Suppose I have a top level cgroup A on which I want to enable
> > proactive reclaim. Which memory psi events should the proactive
> > reclaim should consider?
> >
> > The simplest would be the memory.psi at 'A'. However memory.psi is
> > hierarchical and I would not really want the pressure due limits in
> > children of 'A' to impact the proactive reclaim.
>
> I don't think pressure from limits down the tree can be separated out,
> generally. All events are accounted recursively as well. Of course, we
> remember the reclaim level for evicted entries - but if there is
> reclaim triggered at A and A/B concurrently, the distribution of who
> ends up reclaiming the physical pages in A/B is pretty arbitrary/racy.
>
> If A/B decides to do its own proactive reclaim with the sublimit, and
> ends up consuming the pressure budget assigned to proactive reclaim in
> A, there isn't much that can be done.
>
> It's also possible that proactive reclaim in A keeps A/B from hitting
> its limit in the first place.
>
> I have to say, the configuration doesn't really strike me as sensible,
> though. Limits make sense for doing fixed partitioning: A gets 4G, A/B
> gets 2G out of that. But if you do proactive reclaim on A you're
> essentially saying A as a whole is auto-sizing dynamically based on
> its memory access pattern. I'm not sure what it means to then start
> doing fixed partitions in the sublevel.
>

Think of the scenario where there is an infrastructure owner and the
large number of job owners. The aim of the infra owner is to reduce
cost by stuffing as many jobs as possible on the same machine while
job owners want consistent performance.

The job owners usually have meta jobs i.e. a set of small jobs that
run on the same machines and they manage these sub-jobs themselves.

The infra owner wants to do proactive reclaim to trim the current jobs
without impacting their performance and more importantly to have
enough memory to land new jobs (We have learned the hard way that
depending on global reclaim for memory overcommit is really bad for
isolation).

In the above scenario the configuration you mentioned might not be
sensible is really possible. This is exactly what we have in prod. You
can also get the idea why I am asking for flexibility for the cost of
proactive reclaim.


  reply	other threads:[~2020-10-08 15:56 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-09 21:57 [PATCH] memcg: introduce per-memcg reclaim interface Shakeel Butt
2020-09-10  6:36 ` SeongJae Park
2020-09-10 16:10   ` Shakeel Butt
2020-09-10 16:34     ` SeongJae Park
2020-09-21 16:30 ` Michal Hocko
2020-09-21 17:50   ` Shakeel Butt
2020-09-22 11:49     ` Michal Hocko
2020-09-22 15:54       ` Shakeel Butt
2020-09-22 16:55         ` Michal Hocko
2020-09-22 18:10           ` Shakeel Butt
2020-09-22 18:31             ` Michal Hocko
2020-09-22 18:56               ` Shakeel Butt
2020-09-22 19:08             ` Michal Hocko
2020-09-22 20:02               ` Yang Shi
2020-09-22 22:38               ` Shakeel Butt
2020-09-28 21:02 ` Johannes Weiner
2020-09-29 15:04   ` Michal Hocko
2020-09-29 21:53     ` Johannes Weiner
2020-09-30 15:45       ` Shakeel Butt
2020-10-01 14:31         ` Johannes Weiner
2020-10-06 16:55           ` Shakeel Butt
2020-10-08 14:53             ` Johannes Weiner
2020-10-08 15:55               ` Shakeel Butt [this message]
2020-10-08 21:09                 ` Johannes Weiner
2020-09-30 15:26   ` Shakeel Butt
2020-10-01 15:10     ` Johannes Weiner
2020-10-05 21:59       ` Shakeel Butt
2020-10-08 15:14         ` Johannes Weiner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CALvZod5-EtB0jNi9DXTmLSKrUzK2jXRhW8h6+7sqB356k0t1+g@mail.gmail.com \
    --to=shakeelb@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=andrea.righi@canonical.com \
    --cc=cgroups@vger.kernel.org \
    --cc=gthelen@google.com \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=mkoutny@suse.com \
    --cc=rientjes@google.com \
    --cc=sjpark@amazon.com \
    --cc=yang.shi@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).