From: Michal Hocko <mhocko@kernel.org>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Roman Gushchin <guro@fb.com>, Tejun Heo <tj@kernel.org>,
	linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH v2 3/3] mm: memcontrol: recursive memory.low protection
Date: Thu, 30 Jan 2020 18:00:20 +0100
Message-ID: <20200130170020.GZ24244@dhcp22.suse.cz>
In-Reply-To: <20191219200718.15696-4-hannes@cmpxchg.org>

On Thu 19-12-19 15:07:18, Johannes Weiner wrote:
> Right now, the effective protection of any given cgroup is capped by
> its own explicit memory.low setting, regardless of what the parent
> says. The reasons for this are mostly historical and ease of
> implementation: to make delegation of memory.low safe, effective
> protection is the min() of all memory.low up the tree.
> 
> Unfortunately, this limitation makes it impossible to protect an
> entire subtree from another without forcing the user to make explicit
> protection allocations all the way to the leaf cgroups - something
> that is highly undesirable in real life scenarios.
> 
> Consider memory in a data center host. At the cgroup top level, we
> have a distinction between system management software and the actual
> workload the system is executing. Both branches are further subdivided
> into individual services, job components etc.
> 
> We want to protect the workload as a whole from the system management
> software, but that doesn't mean we want to protect and prioritize
> individual workload wrt each other. Their memory demand can vary over
> time, and we'd want the VM to simply cache the hottest data within the
> workload subtree. Yet, the current memory.low limitations force us to
> allocate a fixed amount of protection to each workload component in
> order to get protection from system management software in
> general. This results in very inefficient resource distribution.
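
To make the quoted limitation concrete, here is a minimal sketch that takes
the description above literally: effective protection is the min() of all
memory.low values on the path down to a cgroup. The function name and the
numbers are assumptions for illustration only; the real kernel calculation
also distributes protection in proportion to usage, which is not modeled
here.

```python
# Simplified model of the behaviour described in the quoted text:
# effective protection = min() of memory.low along the path to the cgroup.
# effective_low() and the values below are hypothetical, not kernel code.
GIB = 1 << 30


def effective_low(path_lows):
    """path_lows: memory.low from the top-level ancestor down to the cgroup."""
    return min(path_lows)


# The workload subtree is given 10G, but its services keep the default of 0.
workload_low = 10 * GIB
service_low = 0
print(effective_low([workload_low, service_low]) // GIB)

# Only an explicit per-service allocation yields any protection.
print(effective_low([workload_low, 4 * GIB]) // GIB)
```

In this model a leaf left at its default ends up with no protection at all,
which is why explicit allocations are needed all the way down.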

I do agree that configuring the reclaim protection is not an easy task,
especially in a deeper reclaim hierarchy. systemd tends to create deep
and commonly shared subtrees, so in practice a protected workload really
has to be put directly into a new first-level cgroup AFAICT. And that is
the simpler example. Just imagine you want to protect a certain user
slice.

You seem to be facing a different problem though, IIUC. You know how much
memory you want to protect and you do not have to care about the cgroup
hierarchy above, but you do not know/care how to distribute that
protection among the workloads running under it. I agree that this is a
reasonable usecase.

Both of those problems, however, show that we have a more general
configurability problem for both leaf and intermediate nodes. They are
both a result of the strong requirements imposed by delegation, as you
have noted above. I am wondering whether we have simply been too rigid
here.

Delegation points are certainly a security boundary and they should
be treated like that, but do we really need strong containment when
the reclaim protection is under the admin's full control? Does the admin
really have to reconfigure a large part of the hierarchy to protect a
particular subtree?

I do not have a great answer on how to implement this unfortunately. The
best I could come up with was to add a "$inherited_protection" magic
value to distinguish from an explicit >=0 protection. What's the
difference? $inherited_protection would be a default and it would always
refer to the closest explicit protection up the hierarchy (with 0 as a
default if there is none defined).

        A
       / \
      B   C (low=10G)
         / \
        D   E (low=5G)

A, B don't get any protection (low=0). C gets protection (10G) and
distributes the pressure to D, E when in excess. D inherits (low=10G)
and E overrides the protection to 5G.
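
The lookup rule in the example above can be sketched as follows. This is
only an illustration of the idea, not a proposed implementation: INHERIT,
TREE and resolve_low() are hypothetical names, and the sketch resolves only
the configured value per cgroup. How D and E would share C's 10G under
pressure is not modeled.

```python
# Sketch of the "$inherited_protection" lookup: a cgroup without an explicit
# value uses the closest explicit protection up the hierarchy, or 0 if none.
# INHERIT, TREE and resolve_low() are hypothetical names for illustration.
GIB = 1 << 30
INHERIT = None  # stands in for the "$inherited_protection" magic value

# name -> (parent, configured low), matching the A/B/C/D/E diagram
TREE = {
    "A": (None, INHERIT),
    "B": ("A", INHERIT),
    "C": ("A", 10 * GIB),
    "D": ("C", INHERIT),
    "E": ("C", 5 * GIB),
}


def resolve_low(name, tree=TREE):
    """Walk towards the root until an explicit value is found."""
    while name is not None:
        parent, low = tree[name]
        if low is not INHERIT:
            return low
        name = parent
    return 0  # no explicit protection anywhere up the hierarchy


for cg in TREE:
    print(cg, resolve_low(cg) // GIB)
```

With the hierarchy from the diagram this resolves to 0 for A and B, 10G for
C and D, and 5G for E.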

That would help both usecases AFAICS, while delegation should still be
possible (configure the delegation point with an explicit value). I have
very likely not thought that through completely. Does that sound like a
completely insane idea?

Or do you think that the two usecases are simply impossible to handle
at the same time?
[...]
-- 
Michal Hocko
SUSE Labs


