From: Michal Hocko <mhocko@kernel.org>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Roman Gushchin <guro@fb.com>, Tejun Heo <tj@kernel.org>,
linux-mm@kvack.org, cgroups@vger.kernel.org,
linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH v2 2/3] mm: memcontrol: clean up and document effective low/min calculations
Date: Thu, 30 Jan 2020 13:54:55 +0100 [thread overview]
Message-ID: <20200130125455.GV24244@dhcp22.suse.cz> (raw)
In-Reply-To: <20191219200718.15696-3-hannes@cmpxchg.org>
On Thu 19-12-19 15:07:17, Johannes Weiner wrote:
> The effective protection of any given cgroup is a somewhat complicated
> construct that depends on the ancestor's configuration, siblings'
> configurations, as well as current memory utilization in all these
> groups. It's done this way to satisfy hierarchical delegation
> requirements while also making the configuration semantics flexible
> and expressive in complex real life scenarios.
>
> Unfortunately, all the rules and requirements are sparsely documented,
> and the code is a little too clever in merging different scenarios
> into a single min() expression. This makes it hard to reason about the
> implementation and avoid breaking semantics when making changes to it.
>
> This patch documents each semantic rule individually and splits out
> the handling of the overcommit case from the regular case.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
It took me a while to double check that the new code is indeed
equivalent but the result is easier to grasp indeed.
Acked-by: Michal Hocko <mhocko@suse.com>
> ---
> mm/memcontrol.c | 190 ++++++++++++++++++++++++++----------------------
> 1 file changed, 104 insertions(+), 86 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 874a0b00f89b..9c771c4d6339 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6204,65 +6204,55 @@ struct cgroup_subsys memory_cgrp_subsys = {
> .early_init = 0,
> };
>
> -/**
> - * mem_cgroup_protected - check if memory consumption is in the normal range
> - * @root: the top ancestor of the sub-tree being checked
> - * @memcg: the memory cgroup to check
> - *
> - * WARNING: This function is not stateless! It can only be used as part
> - * of a top-down tree iteration, not for isolated queries.
> - *
> - * Returns one of the following:
> - * MEMCG_PROT_NONE: cgroup memory is not protected
> - * MEMCG_PROT_LOW: cgroup memory is protected as long there is
> - * an unprotected supply of reclaimable memory from other cgroups.
> - * MEMCG_PROT_MIN: cgroup memory is protected
> - *
> - * @root is exclusive; it is never protected when looked at directly
> +/*
> + * This function calculates an individual cgroup's effective
> + * protection which is derived from its own memory.min/low, its
> + * parent's and siblings' settings, as well as the actual memory
> + * distribution in the tree.
> *
> - * To provide a proper hierarchical behavior, effective memory.min/low values
> - * are used. Below is the description of how effective memory.low is calculated.
> - * Effective memory.min values is calculated in the same way.
> + * The following rules apply to the effective protection values:
> *
> - * Effective memory.low is always equal or less than the original memory.low.
> - * If there is no memory.low overcommittment (which is always true for
> - * top-level memory cgroups), these two values are equal.
> - * Otherwise, it's a part of parent's effective memory.low,
> - * calculated as a cgroup's memory.low usage divided by sum of sibling's
> - * memory.low usages, where memory.low usage is the size of actually
> - * protected memory.
> + * 1. At the first level of reclaim, effective protection is equal to
> + * the declared protection in memory.min and memory.low.
> *
> - * low_usage
> - * elow = min( memory.low, parent->elow * ------------------ ),
> - * siblings_low_usage
> + * 2. To enable safe delegation of the protection configuration, at
> + * subsequent levels the effective protection is capped to the
> + * parent's effective protection.
> *
> - * low_usage = min(memory.low, memory.current)
> + * 3. To make complex and dynamic subtrees easier to configure, the
> + * user is allowed to overcommit the declared protection at a given
> + * level. If that is the case, the parent's effective protection is
> + * distributed to the children in proportion to how much protection
> + * they have declared and how much of it they are utilizing.
> *
> + * This makes distribution proportional, but also work-conserving:
> + * if one cgroup claims much more protection than it uses memory,
> + * the unused remainder is available to its siblings.
> *
> - * Such definition of the effective memory.low provides the expected
> - * hierarchical behavior: parent's memory.low value is limiting
> - * children, unprotected memory is reclaimed first and cgroups,
> - * which are not using their guarantee do not affect actual memory
> - * distribution.
> + * Consider the following example tree:
> *
> - * For example, if there are memcgs A, A/B, A/C, A/D and A/E:
> + * A A/memory.low = 2G, A/memory.current = 6G
> + * //\\
> + * BC DE B/memory.low = 3G B/memory.current = 2G
> + * C/memory.low = 1G C/memory.current = 2G
> + * D/memory.low = 0 D/memory.current = 2G
> + * E/memory.low = 10G E/memory.current = 0
> *
> - * A A/memory.low = 2G, A/memory.current = 6G
> - * //\\
> - * BC DE B/memory.low = 3G B/memory.current = 2G
> - * C/memory.low = 1G C/memory.current = 2G
> - * D/memory.low = 0 D/memory.current = 2G
> - * E/memory.low = 10G E/memory.current = 0
> + * and memory pressure is applied, the following memory
> + * distribution is expected (approximately*):
> *
> - * and the memory pressure is applied, the following memory distribution
> - * is expected (approximately):
> + * A/memory.current = 2G
> + * B/memory.current = 1.3G
> + * C/memory.current = 0.6G
> + * D/memory.current = 0
> + * E/memory.current = 0
> *
> - * A/memory.current = 2G
> + * *assuming equal allocation rate and reclaimability
> *
> - * B/memory.current = 1.3G
> - * C/memory.current = 0.6G
> - * D/memory.current = 0
> - * E/memory.current = 0
> + * 4. Conversely, when the declared protection is undercommitted at a
> + * given level, the distribution of the larger parental protection
> + * budget is NOT proportional. A cgroup's protection from a sibling
> + * is capped to its own memory.min/low setting.
> *
> * These calculations require constant tracking of the actual low usages
> * (see propagate_protected_usage()), as well as recursive calculation of
> @@ -6272,12 +6262,63 @@ struct cgroup_subsys memory_cgrp_subsys = {
> * for next usage. This part is intentionally racy, but it's ok,
> * as memory.low is a best-effort mechanism.
> */
> +static unsigned long effective_protection(unsigned long usage,
> + unsigned long setting,
> + unsigned long parent_effective,
> + unsigned long siblings_protected)
> +{
> + unsigned long protected;
> +
> + protected = min(usage, setting);
> + /*
> + * If all cgroups at this level combined claim and use more
> + * protection then what the parent affords them, distribute
> + * shares in proportion to utilization.
> + *
> + * We are using actual utilization rather than the statically
> + * claimed protection in order to be work-conserving: claimed
> + * but unused protection is available to siblings that would
> + * otherwise get a smaller chunk than what they claimed.
> + */
> + if (siblings_protected > parent_effective)
> + return protected * parent_effective / siblings_protected;
> +
> + /*
> + * Ok, utilized protection of all children is within what the
> + * parent affords them, so we know whatever this child claims
> + * and utilizes is effectively protected.
> + *
> + * If there is unprotected usage beyond this value, reclaim
> + * will apply pressure in proportion to that amount.
> + *
> + * If there is unutilized protection, the cgroup will be fully
> + * shielded from reclaim, but we do return a smaller value for
> + * protection than what the group could enjoy in theory. This
> + * is okay. With the overcommit distribution above, effective
> + * protection is always dependent on how memory is actually
> + * consumed among the siblings anyway.
> + */
> + return protected;
> +}
> +
> +/**
> + * mem_cgroup_protected - check if memory consumption is in the normal range
> + * @root: the top ancestor of the sub-tree being checked
> + * @memcg: the memory cgroup to check
> + *
> + * WARNING: This function is not stateless! It can only be used as part
> + * of a top-down tree iteration, not for isolated queries.
> + *
> + * Returns one of the following:
> + * MEMCG_PROT_NONE: cgroup memory is not protected
> + * MEMCG_PROT_LOW: cgroup memory is protected as long there is
> + * an unprotected supply of reclaimable memory from other cgroups.
> + * MEMCG_PROT_MIN: cgroup memory is protected
> + */
> enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
> struct mem_cgroup *memcg)
> {
> struct mem_cgroup *parent;
> - unsigned long emin, parent_emin;
> - unsigned long elow, parent_elow;
> unsigned long usage;
>
> if (mem_cgroup_disabled())
> @@ -6292,52 +6333,29 @@ enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
> if (!usage)
> return MEMCG_PROT_NONE;
>
> - emin = memcg->memory.min;
> - elow = memcg->memory.low;
> -
> parent = parent_mem_cgroup(memcg);
> /* No parent means a non-hierarchical mode on v1 memcg */
> if (!parent)
> return MEMCG_PROT_NONE;
>
> - if (parent == root)
> - goto exit;
> -
> - parent_emin = READ_ONCE(parent->memory.emin);
> - emin = min(emin, parent_emin);
> - if (emin && parent_emin) {
> - unsigned long min_usage, siblings_min_usage;
> -
> - min_usage = min(usage, memcg->memory.min);
> - siblings_min_usage = atomic_long_read(
> - &parent->memory.children_min_usage);
> -
> - if (min_usage && siblings_min_usage)
> - emin = min(emin, parent_emin * min_usage /
> - siblings_min_usage);
> + if (parent == root) {
> + memcg->memory.emin = memcg->memory.min;
> + memcg->memory.elow = memcg->memory.low;
> + goto out;
> }
>
> - parent_elow = READ_ONCE(parent->memory.elow);
> - elow = min(elow, parent_elow);
> - if (elow && parent_elow) {
> - unsigned long low_usage, siblings_low_usage;
> -
> - low_usage = min(usage, memcg->memory.low);
> - siblings_low_usage = atomic_long_read(
> - &parent->memory.children_low_usage);
> + memcg->memory.emin = effective_protection(usage,
> + memcg->memory.min, READ_ONCE(parent->memory.emin),
> + atomic_long_read(&parent->memory.children_min_usage));
>
> - if (low_usage && siblings_low_usage)
> - elow = min(elow, parent_elow * low_usage /
> - siblings_low_usage);
> - }
> -
> -exit:
> - memcg->memory.emin = emin;
> - memcg->memory.elow = elow;
> + memcg->memory.elow = effective_protection(usage,
> + memcg->memory.low, READ_ONCE(parent->memory.elow),
> + atomic_long_read(&parent->memory.children_low_usage));
>
> - if (usage <= emin)
> +out:
> + if (usage <= memcg->memory.emin)
> return MEMCG_PROT_MIN;
> - else if (usage <= elow)
> + else if (usage <= memcg->memory.elow)
> return MEMCG_PROT_LOW;
> else
> return MEMCG_PROT_NONE;
> --
> 2.24.1
>
--
Michal Hocko
SUSE Labs
next prev parent reply other threads:[~2020-01-30 12:55 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-12-19 20:07 [PATCH v2 0/3] mm: memcontrol: recursive memory protection Johannes Weiner
2019-12-19 20:07 ` [PATCH v2 1/3] mm: memcontrol: fix memory.low proportional distribution Johannes Weiner
2020-01-30 11:49 ` Michal Hocko
2020-02-03 21:21 ` Johannes Weiner
2020-02-03 21:38 ` Roman Gushchin
2019-12-19 20:07 ` [PATCH v2 2/3] mm: memcontrol: clean up and document effective low/min calculations Johannes Weiner
2020-01-30 12:54 ` Michal Hocko [this message]
2020-02-21 17:10 ` Michal Koutný
2020-02-25 18:40 ` Johannes Weiner
2020-02-26 16:46 ` Michal Koutný
2020-02-26 19:40 ` Johannes Weiner
2019-12-19 20:07 ` [PATCH v2 3/3] mm: memcontrol: recursive memory.low protection Johannes Weiner
2020-01-30 17:00 ` Michal Hocko
2020-02-03 21:52 ` Johannes Weiner
2020-02-10 15:21 ` Johannes Weiner
2020-02-11 16:47 ` Michal Hocko
2020-02-12 17:08 ` Johannes Weiner
2020-02-13 7:40 ` Michal Hocko
2020-02-13 13:23 ` Johannes Weiner
2020-02-13 15:46 ` Michal Hocko
2020-02-13 17:41 ` Johannes Weiner
2020-02-13 17:58 ` Johannes Weiner
2020-02-14 7:59 ` Michal Hocko
2020-02-13 13:53 ` Tejun Heo
2020-02-13 15:47 ` Michal Hocko
2020-02-13 15:52 ` Tejun Heo
2020-02-13 16:36 ` Michal Hocko
2020-02-13 16:57 ` Tejun Heo
2020-02-14 7:15 ` Michal Hocko
2020-02-14 13:57 ` Tejun Heo
2020-02-14 15:13 ` Michal Hocko
2020-02-14 15:40 ` Tejun Heo
2020-02-14 16:53 ` Johannes Weiner
2020-02-14 17:17 ` Tejun Heo
2020-02-17 8:41 ` Michal Hocko
2020-02-18 19:52 ` Johannes Weiner
2020-02-21 10:11 ` Michal Hocko
2020-02-21 15:43 ` Johannes Weiner
2020-02-25 12:20 ` Michal Hocko
2020-02-25 18:17 ` Johannes Weiner
2020-02-26 17:56 ` Michal Hocko
2020-02-21 17:12 ` Michal Koutný
2020-02-21 18:58 ` Johannes Weiner
2020-02-25 13:37 ` Michal Koutný
2020-02-25 15:03 ` Johannes Weiner
2020-02-26 13:22 ` Michal Koutný
2020-02-26 15:05 ` Johannes Weiner
2020-02-27 13:35 ` Michal Koutný
2020-02-27 15:06 ` Johannes Weiner
2019-12-19 20:22 ` [PATCH v2 0/3] mm: memcontrol: recursive memory protection Tejun Heo
2019-12-20 4:06 ` Roman Gushchin
2019-12-20 4:29 ` Chris Down
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200130125455.GV24244@dhcp22.suse.cz \
--to=mhocko@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=cgroups@vger.kernel.org \
--cc=guro@fb.com \
--cc=hannes@cmpxchg.org \
--cc=kernel-team@fb.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).