linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Roman Gushchin <guro@fb.com>, Tejun Heo <tj@kernel.org>,
	linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH v2 2/3] mm: memcontrol: clean up and document effective low/min calculations
Date: Thu, 30 Jan 2020 13:54:55 +0100	[thread overview]
Message-ID: <20200130125455.GV24244@dhcp22.suse.cz> (raw)
In-Reply-To: <20191219200718.15696-3-hannes@cmpxchg.org>

On Thu 19-12-19 15:07:17, Johannes Weiner wrote:
> The effective protection of any given cgroup is a somewhat complicated
> construct that depends on the ancestor's configuration, siblings'
> configurations, as well as current memory utilization in all these
> groups. It's done this way to satisfy hierarchical delegation
> requirements while also making the configuration semantics flexible
> and expressive in complex real life scenarios.
> 
> Unfortunately, all the rules and requirements are sparsely documented,
> and the code is a little too clever in merging different scenarios
> into a single min() expression. This makes it hard to reason about the
> implementation and avoid breaking semantics when making changes to it.
> 
> This patch documents each semantic rule individually and splits out
> the handling of the overcommit case from the regular case.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

It took me a while to double check that the new code is indeed
equivalent but the result is easier to grasp indeed.

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/memcontrol.c | 190 ++++++++++++++++++++++++++----------------------
>  1 file changed, 104 insertions(+), 86 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 874a0b00f89b..9c771c4d6339 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6204,65 +6204,55 @@ struct cgroup_subsys memory_cgrp_subsys = {
>  	.early_init = 0,
>  };
>  
> -/**
> - * mem_cgroup_protected - check if memory consumption is in the normal range
> - * @root: the top ancestor of the sub-tree being checked
> - * @memcg: the memory cgroup to check
> - *
> - * WARNING: This function is not stateless! It can only be used as part
> - *          of a top-down tree iteration, not for isolated queries.
> - *
> - * Returns one of the following:
> - *   MEMCG_PROT_NONE: cgroup memory is not protected
> - *   MEMCG_PROT_LOW: cgroup memory is protected as long there is
> - *     an unprotected supply of reclaimable memory from other cgroups.
> - *   MEMCG_PROT_MIN: cgroup memory is protected
> - *
> - * @root is exclusive; it is never protected when looked at directly
> +/*
> + * This function calculates an individual cgroup's effective
> + * protection which is derived from its own memory.min/low, its
> + * parent's and siblings' settings, as well as the actual memory
> + * distribution in the tree.
>   *
> - * To provide a proper hierarchical behavior, effective memory.min/low values
> - * are used. Below is the description of how effective memory.low is calculated.
> - * Effective memory.min values is calculated in the same way.
> + * The following rules apply to the effective protection values:
>   *
> - * Effective memory.low is always equal or less than the original memory.low.
> - * If there is no memory.low overcommittment (which is always true for
> - * top-level memory cgroups), these two values are equal.
> - * Otherwise, it's a part of parent's effective memory.low,
> - * calculated as a cgroup's memory.low usage divided by sum of sibling's
> - * memory.low usages, where memory.low usage is the size of actually
> - * protected memory.
> + * 1. At the first level of reclaim, effective protection is equal to
> + *    the declared protection in memory.min and memory.low.
>   *
> - *                                             low_usage
> - * elow = min( memory.low, parent->elow * ------------------ ),
> - *                                        siblings_low_usage
> + * 2. To enable safe delegation of the protection configuration, at
> + *    subsequent levels the effective protection is capped to the
> + *    parent's effective protection.
>   *
> - * low_usage = min(memory.low, memory.current)
> + * 3. To make complex and dynamic subtrees easier to configure, the
> + *    user is allowed to overcommit the declared protection at a given
> + *    level. If that is the case, the parent's effective protection is
> + *    distributed to the children in proportion to how much protection
> + *    they have declared and how much of it they are utilizing.
>   *
> + *    This makes distribution proportional, but also work-conserving:
> + *    if one cgroup claims much more protection than it uses memory,
> + *    the unused remainder is available to its siblings.
>   *
> - * Such definition of the effective memory.low provides the expected
> - * hierarchical behavior: parent's memory.low value is limiting
> - * children, unprotected memory is reclaimed first and cgroups,
> - * which are not using their guarantee do not affect actual memory
> - * distribution.
> + *    Consider the following example tree:
>   *
> - * For example, if there are memcgs A, A/B, A/C, A/D and A/E:
> + *        A      A/memory.low = 2G, A/memory.current = 6G
> + *       //\\
> + *      BC  DE   B/memory.low = 3G  B/memory.current = 2G
> + *               C/memory.low = 1G  C/memory.current = 2G
> + *               D/memory.low = 0   D/memory.current = 2G
> + *               E/memory.low = 10G E/memory.current = 0
>   *
> - *     A      A/memory.low = 2G, A/memory.current = 6G
> - *    //\\
> - *   BC  DE   B/memory.low = 3G  B/memory.current = 2G
> - *            C/memory.low = 1G  C/memory.current = 2G
> - *            D/memory.low = 0   D/memory.current = 2G
> - *            E/memory.low = 10G E/memory.current = 0
> + *    and memory pressure is applied, the following memory
> + *    distribution is expected (approximately*):
>   *
> - * and the memory pressure is applied, the following memory distribution
> - * is expected (approximately):
> + *      A/memory.current = 2G
> + *      B/memory.current = 1.3G
> + *      C/memory.current = 0.6G
> + *      D/memory.current = 0
> + *      E/memory.current = 0
>   *
> - *     A/memory.current = 2G
> + *    *assuming equal allocation rate and reclaimability
>   *
> - *     B/memory.current = 1.3G
> - *     C/memory.current = 0.6G
> - *     D/memory.current = 0
> - *     E/memory.current = 0
> + * 4. Conversely, when the declared protection is undercommitted at a
> + *    given level, the distribution of the larger parental protection
> + *    budget is NOT proportional. A cgroup's protection from a sibling
> + *    is capped to its own memory.min/low setting.
>   *
>   * These calculations require constant tracking of the actual low usages
>   * (see propagate_protected_usage()), as well as recursive calculation of
> @@ -6272,12 +6262,63 @@ struct cgroup_subsys memory_cgrp_subsys = {
>   * for next usage. This part is intentionally racy, but it's ok,
>   * as memory.low is a best-effort mechanism.
>   */
> +static unsigned long effective_protection(unsigned long usage,
> +					  unsigned long setting,
> +					  unsigned long parent_effective,
> +					  unsigned long siblings_protected)
> +{
> +	unsigned long protected;
> +
> +	protected = min(usage, setting);
> +	/*
> +	 * If all cgroups at this level combined claim and use more
> +	 * protection then what the parent affords them, distribute
> +	 * shares in proportion to utilization.
> +	 *
> +	 * We are using actual utilization rather than the statically
> +	 * claimed protection in order to be work-conserving: claimed
> +	 * but unused protection is available to siblings that would
> +	 * otherwise get a smaller chunk than what they claimed.
> +	 */
> +	if (siblings_protected > parent_effective)
> +		return protected * parent_effective / siblings_protected;
> +
> +	/*
> +	 * Ok, utilized protection of all children is within what the
> +	 * parent affords them, so we know whatever this child claims
> +	 * and utilizes is effectively protected.
> +	 *
> +	 * If there is unprotected usage beyond this value, reclaim
> +	 * will apply pressure in proportion to that amount.
> +	 *
> +	 * If there is unutilized protection, the cgroup will be fully
> +	 * shielded from reclaim, but we do return a smaller value for
> +	 * protection than what the group could enjoy in theory. This
> +	 * is okay. With the overcommit distribution above, effective
> +	 * protection is always dependent on how memory is actually
> +	 * consumed among the siblings anyway.
> +	 */
> +	return protected;
> +}
> +
> +/**
> + * mem_cgroup_protected - check if memory consumption is in the normal range
> + * @root: the top ancestor of the sub-tree being checked
> + * @memcg: the memory cgroup to check
> + *
> + * WARNING: This function is not stateless! It can only be used as part
> + *          of a top-down tree iteration, not for isolated queries.
> + *
> + * Returns one of the following:
> + *   MEMCG_PROT_NONE: cgroup memory is not protected
> + *   MEMCG_PROT_LOW: cgroup memory is protected as long there is
> + *     an unprotected supply of reclaimable memory from other cgroups.
> + *   MEMCG_PROT_MIN: cgroup memory is protected
> + */
>  enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
>  						struct mem_cgroup *memcg)
>  {
>  	struct mem_cgroup *parent;
> -	unsigned long emin, parent_emin;
> -	unsigned long elow, parent_elow;
>  	unsigned long usage;
>  
>  	if (mem_cgroup_disabled())
> @@ -6292,52 +6333,29 @@ enum mem_cgroup_protection mem_cgroup_protected(struct mem_cgroup *root,
>  	if (!usage)
>  		return MEMCG_PROT_NONE;
>  
> -	emin = memcg->memory.min;
> -	elow = memcg->memory.low;
> -
>  	parent = parent_mem_cgroup(memcg);
>  	/* No parent means a non-hierarchical mode on v1 memcg */
>  	if (!parent)
>  		return MEMCG_PROT_NONE;
>  
> -	if (parent == root)
> -		goto exit;
> -
> -	parent_emin = READ_ONCE(parent->memory.emin);
> -	emin = min(emin, parent_emin);
> -	if (emin && parent_emin) {
> -		unsigned long min_usage, siblings_min_usage;
> -
> -		min_usage = min(usage, memcg->memory.min);
> -		siblings_min_usage = atomic_long_read(
> -			&parent->memory.children_min_usage);
> -
> -		if (min_usage && siblings_min_usage)
> -			emin = min(emin, parent_emin * min_usage /
> -				   siblings_min_usage);
> +	if (parent == root) {
> +		memcg->memory.emin = memcg->memory.min;
> +		memcg->memory.elow = memcg->memory.low;
> +		goto out;
>  	}
>  
> -	parent_elow = READ_ONCE(parent->memory.elow);
> -	elow = min(elow, parent_elow);
> -	if (elow && parent_elow) {
> -		unsigned long low_usage, siblings_low_usage;
> -
> -		low_usage = min(usage, memcg->memory.low);
> -		siblings_low_usage = atomic_long_read(
> -			&parent->memory.children_low_usage);
> +	memcg->memory.emin = effective_protection(usage,
> +			memcg->memory.min, READ_ONCE(parent->memory.emin),
> +			atomic_long_read(&parent->memory.children_min_usage));
>  
> -		if (low_usage && siblings_low_usage)
> -			elow = min(elow, parent_elow * low_usage /
> -				   siblings_low_usage);
> -	}
> -
> -exit:
> -	memcg->memory.emin = emin;
> -	memcg->memory.elow = elow;
> +	memcg->memory.elow = effective_protection(usage,
> +			memcg->memory.low, READ_ONCE(parent->memory.elow),
> +			atomic_long_read(&parent->memory.children_low_usage));
>  
> -	if (usage <= emin)
> +out:
> +	if (usage <= memcg->memory.emin)
>  		return MEMCG_PROT_MIN;
> -	else if (usage <= elow)
> +	else if (usage <= memcg->memory.elow)
>  		return MEMCG_PROT_LOW;
>  	else
>  		return MEMCG_PROT_NONE;
> -- 
> 2.24.1
> 

-- 
Michal Hocko
SUSE Labs

  reply	other threads:[~2020-01-30 12:55 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-19 20:07 [PATCH v2 0/3] mm: memcontrol: recursive memory protection Johannes Weiner
2019-12-19 20:07 ` [PATCH v2 1/3] mm: memcontrol: fix memory.low proportional distribution Johannes Weiner
2020-01-30 11:49   ` Michal Hocko
2020-02-03 21:21     ` Johannes Weiner
2020-02-03 21:38       ` Roman Gushchin
2019-12-19 20:07 ` [PATCH v2 2/3] mm: memcontrol: clean up and document effective low/min calculations Johannes Weiner
2020-01-30 12:54   ` Michal Hocko [this message]
2020-02-21 17:10   ` Michal Koutný
2020-02-25 18:40     ` Johannes Weiner
2020-02-26 16:46       ` Michal Koutný
2020-02-26 19:40         ` Johannes Weiner
2019-12-19 20:07 ` [PATCH v2 3/3] mm: memcontrol: recursive memory.low protection Johannes Weiner
2020-01-30 17:00   ` Michal Hocko
2020-02-03 21:52     ` Johannes Weiner
2020-02-10 15:21       ` Johannes Weiner
2020-02-11 16:47       ` Michal Hocko
2020-02-12 17:08         ` Johannes Weiner
2020-02-13  7:40           ` Michal Hocko
2020-02-13 13:23             ` Johannes Weiner
2020-02-13 15:46               ` Michal Hocko
2020-02-13 17:41                 ` Johannes Weiner
2020-02-13 17:58                   ` Johannes Weiner
2020-02-14  7:59                     ` Michal Hocko
2020-02-13 13:53             ` Tejun Heo
2020-02-13 15:47               ` Michal Hocko
2020-02-13 15:52                 ` Tejun Heo
2020-02-13 16:36                   ` Michal Hocko
2020-02-13 16:57                     ` Tejun Heo
2020-02-14  7:15                       ` Michal Hocko
2020-02-14 13:57                         ` Tejun Heo
2020-02-14 15:13                           ` Michal Hocko
2020-02-14 15:40                             ` Tejun Heo
2020-02-14 16:53                             ` Johannes Weiner
2020-02-14 17:17                               ` Tejun Heo
2020-02-17  8:41                               ` Michal Hocko
2020-02-18 19:52                                 ` Johannes Weiner
2020-02-21 10:11                                   ` Michal Hocko
2020-02-21 15:43                                     ` Johannes Weiner
2020-02-25 12:20                                       ` Michal Hocko
2020-02-25 18:17                                         ` Johannes Weiner
2020-02-26 17:56                                           ` Michal Hocko
2020-02-21 17:12   ` Michal Koutný
2020-02-21 18:58     ` Johannes Weiner
2020-02-25 13:37       ` Michal Koutný
2020-02-25 15:03         ` Johannes Weiner
2020-02-26 13:22           ` Michal Koutný
2020-02-26 15:05             ` Johannes Weiner
2020-02-27 13:35               ` Michal Koutný
2020-02-27 15:06                 ` Johannes Weiner
2019-12-19 20:22 ` [PATCH v2 0/3] mm: memcontrol: recursive memory protection Tejun Heo
2019-12-20  4:06 ` Roman Gushchin
2019-12-20  4:29 ` Chris Down

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200130125455.GV24244@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=kernel-team@fb.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).