* Re: + mm-consider-subtrees-in-memoryevents.patch added to -mm tree
       [not found] <20190212224542.ZW63a%akpm@linux-foundation.org>
@ 2019-02-13 12:47 ` Michal Hocko
  2019-05-16 17:56   ` Johannes Weiner
  2019-05-17 13:00   ` Shakeel Butt
  0 siblings, 2 replies; 12+ messages in thread
From: Michal Hocko @ 2019-02-13 12:47 UTC (permalink / raw)
  To: akpm
  Cc: mm-commits, tj, hannes, guro, dennis, chris, cgroups mailinglist,
	linux-mm

On Tue 12-02-19 14:45:42, Andrew Morton wrote:
[...]
> From: Chris Down <chris@chrisdown.name>
> Subject: mm, memcg: consider subtrees in memory.events
> 
> memory.stat and other files already consider subtrees in their output, and
> we should too in order to not present an inconsistent interface.
> 
> The current situation is fairly confusing, because people interacting with
> cgroups expect hierarchical behaviour in the vein of memory.stat,
> cgroup.events, and other files.  For example, this causes confusion when
> debugging reclaim events under low, as currently these always read "0" at
> non-leaf memcg nodes, which frequently causes people to misdiagnose breach
> behaviour.  The same confusion applies to other counters in this file when
> debugging issues.
> 
> Aggregation is done at write time instead of at read-time since these
> counters aren't hot (unlike memory.stat which is per-page, so it does it
> at read time), and it makes sense to bundle this with the file
> notifications.
> 
> After this patch, events are propagated up the hierarchy:
> 
>     [root@ktst ~]# cat /sys/fs/cgroup/system.slice/memory.events
>     low 0
>     high 0
>     max 0
>     oom 0
>     oom_kill 0
>     [root@ktst ~]# systemd-run -p MemoryMax=1 true
>     Running as unit: run-r251162a189fb4562b9dabfdc9b0422f5.service
>     [root@ktst ~]# cat /sys/fs/cgroup/system.slice/memory.events
>     low 0
>     high 0
>     max 7
>     oom 1
>     oom_kill 1
> 
> As this is a change in behaviour, this can be reverted to the old
> behaviour by mounting with the `memory_localevents' flag set.  However, we
> use the new behaviour by default as there's a lack of evidence that there
> are any current users of memory.events that would find this change
> undesirable.
> 
> Link: http://lkml.kernel.org/r/20190208224419.GA24772@chrisdown.name
> Signed-off-by: Chris Down <chris@chrisdown.name>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Roman Gushchin <guro@fb.com>
> Cc: Dennis Zhou <dennis@kernel.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

FTR: As I've already said here [1] I can live with this change as long
as there is a larger consensus among cgroup v2 users. So let's give this
some more time before merging to see whether there is such a consensus.

[1] http://lkml.kernel.org/r/20190201102515.GK11599@dhcp22.suse.cz

> ---
> 
>  Documentation/admin-guide/cgroup-v2.rst |    9 +++++++++
>  include/linux/cgroup-defs.h             |    5 +++++
>  include/linux/memcontrol.h              |   10 ++++++++--
>  kernel/cgroup/cgroup.c                  |   16 ++++++++++++++--
>  4 files changed, 36 insertions(+), 4 deletions(-)
> 
> --- a/Documentation/admin-guide/cgroup-v2.rst~mm-consider-subtrees-in-memoryevents
> +++ a/Documentation/admin-guide/cgroup-v2.rst
> @@ -177,6 +177,15 @@ cgroup v2 currently supports the followi
>  	ignored on non-init namespace mounts.  Please refer to the
>  	Delegation section for details.
>  
> +  memory_localevents
> +
> +        Only populate memory.events with data for the current cgroup,
> +        and not any subtrees. This is legacy behaviour, the default
> +        behaviour without this option is to include subtree counts.
> +        This option is system wide and can only be set on mount or
> +        modified through remount from the init namespace. The mount
> +        option is ignored on non-init namespace mounts.
> +
>  
>  Organizing Processes and Threads
>  --------------------------------
> --- a/include/linux/cgroup-defs.h~mm-consider-subtrees-in-memoryevents
> +++ a/include/linux/cgroup-defs.h
> @@ -83,6 +83,11 @@ enum {
>  	 * Enable cpuset controller in v1 cgroup to use v2 behavior.
>  	 */
>  	CGRP_ROOT_CPUSET_V2_MODE = (1 << 4),
> +
> +	/*
> +	 * Enable legacy local memory.events.
> +	 */
> +	CGRP_ROOT_MEMORY_LOCAL_EVENTS = (1 << 5),
>  };
>  
>  /* cftype->flags */
> --- a/include/linux/memcontrol.h~mm-consider-subtrees-in-memoryevents
> +++ a/include/linux/memcontrol.h
> @@ -789,8 +789,14 @@ static inline void count_memcg_event_mm(
>  static inline void memcg_memory_event(struct mem_cgroup *memcg,
>  				      enum memcg_memory_event event)
>  {
> -	atomic_long_inc(&memcg->memory_events[event]);
> -	cgroup_file_notify(&memcg->events_file);
> +	do {
> +		atomic_long_inc(&memcg->memory_events[event]);
> +		cgroup_file_notify(&memcg->events_file);
> +
> +		if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS)
> +			break;
> +	} while ((memcg = parent_mem_cgroup(memcg)) &&
> +		 !mem_cgroup_is_root(memcg));
>  }
>  
>  static inline void memcg_memory_event_mm(struct mm_struct *mm,
> --- a/kernel/cgroup/cgroup.c~mm-consider-subtrees-in-memoryevents
> +++ a/kernel/cgroup/cgroup.c
> @@ -1775,11 +1775,13 @@ int cgroup_show_path(struct seq_file *sf
>  
>  enum cgroup2_param {
>  	Opt_nsdelegate,
> +	Opt_memory_localevents,
>  	nr__cgroup2_params
>  };
>  
>  static const struct fs_parameter_spec cgroup2_param_specs[] = {
> -	fsparam_flag  ("nsdelegate",		Opt_nsdelegate),
> +	fsparam_flag("nsdelegate",		Opt_nsdelegate),
> +	fsparam_flag("memory_localevents",	Opt_memory_localevents),
>  	{}
>  };
>  
> @@ -1802,6 +1804,9 @@ static int cgroup2_parse_param(struct fs
>  	case Opt_nsdelegate:
>  		ctx->flags |= CGRP_ROOT_NS_DELEGATE;
>  		return 0;
> +	case Opt_memory_localevents:
> +		ctx->flags |= CGRP_ROOT_MEMORY_LOCAL_EVENTS;
> +		return 0;
>  	}
>  	return -EINVAL;
>  }
> @@ -1813,6 +1818,11 @@ static void apply_cgroup_root_flags(unsi
>  			cgrp_dfl_root.flags |= CGRP_ROOT_NS_DELEGATE;
>  		else
>  			cgrp_dfl_root.flags &= ~CGRP_ROOT_NS_DELEGATE;
> +
> +		if (root_flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS)
> +			cgrp_dfl_root.flags |= CGRP_ROOT_MEMORY_LOCAL_EVENTS;
> +		else
> +			cgrp_dfl_root.flags &= ~CGRP_ROOT_MEMORY_LOCAL_EVENTS;
>  	}
>  }
>  
> @@ -1820,6 +1830,8 @@ static int cgroup_show_options(struct se
>  {
>  	if (cgrp_dfl_root.flags & CGRP_ROOT_NS_DELEGATE)
>  		seq_puts(seq, ",nsdelegate");
> +	if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS)
> +		seq_puts(seq, ",memory_localevents");
>  	return 0;
>  }
>  
> @@ -6207,7 +6219,7 @@ static struct kobj_attribute cgroup_dele
>  static ssize_t features_show(struct kobject *kobj, struct kobj_attribute *attr,
>  			     char *buf)
>  {
> -	return snprintf(buf, PAGE_SIZE, "nsdelegate\n");
> +	return snprintf(buf, PAGE_SIZE, "nsdelegate\nmemory_localevents\n");
>  }
>  static struct kobj_attribute cgroup_features_attr = __ATTR_RO(features);
>  
> _
> 
> Patches currently in -mm which might be from chris@chrisdown.name are
> 
> mm-create-mem_cgroup_from_seq.patch
> mm-extract-memcg-maxable-seq_file-logic-to-seq_show_memcg_tunable.patch
> mm-proportional-memorylowmin-reclaim.patch
> mm-proportional-memorylowmin-reclaim-fix.patch
> mm-memcontrol-expose-thp-events-on-a-per-memcg-basis.patch
> mm-memcontrol-expose-thp-events-on-a-per-memcg-basis-fix-2.patch
> mm-make-memoryemin-the-baseline-for-utilisation-determination.patch
> mm-rename-ambiguously-named-memorystat-counters-and-functions.patch
> mm-consider-subtrees-in-memoryevents.patch

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: + mm-consider-subtrees-in-memoryevents.patch added to -mm tree
  2019-02-13 12:47 ` + mm-consider-subtrees-in-memoryevents.patch added to -mm tree Michal Hocko
@ 2019-05-16 17:56   ` Johannes Weiner
  2019-05-16 18:10     ` Michal Hocko
  2019-05-17 13:00   ` Shakeel Butt
  1 sibling, 1 reply; 12+ messages in thread
From: Johannes Weiner @ 2019-05-16 17:56 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, mm-commits, tj, guro, dennis, chris, cgroups mailinglist, linux-mm

On Wed, Feb 13, 2019 at 01:47:29PM +0100, Michal Hocko wrote:
> On Tue 12-02-19 14:45:42, Andrew Morton wrote:
> [...]
> > From: Chris Down <chris@chrisdown.name>
> > Subject: mm, memcg: consider subtrees in memory.events
> > 
> > [...]
> 
> FTR: As I've already said here [1] I can live with this change as long
> as there is a larger consensus among cgroup v2 users. So let's give this
> some more time before merging to see whether there is such a consensus.
> 
> [1] http://lkml.kernel.org/r/20190201102515.GK11599@dhcp22.suse.cz

It's been three months without any objections. Can we merge this for
v5.2 please? We still have users complaining about this inconsistent
behavior (the last one was yesterday) and we'd rather not carry any
out of tree patches.



* Re: + mm-consider-subtrees-in-memoryevents.patch added to -mm tree
  2019-05-16 17:56   ` Johannes Weiner
@ 2019-05-16 18:10     ` Michal Hocko
  2019-05-16 19:39       ` Johannes Weiner
  0 siblings, 1 reply; 12+ messages in thread
From: Michal Hocko @ 2019-05-16 18:10 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: akpm, mm-commits, tj, guro, dennis, chris, cgroups mailinglist, linux-mm

On Thu 16-05-19 13:56:55, Johannes Weiner wrote:
> On Wed, Feb 13, 2019 at 01:47:29PM +0100, Michal Hocko wrote:
> > On Tue 12-02-19 14:45:42, Andrew Morton wrote:
> > [...]
> > > From: Chris Down <chris@chrisdown.name>
> > > Subject: mm, memcg: consider subtrees in memory.events
> > > 
> > > [...]
> > 
> > FTR: As I've already said here [1] I can live with this change as long
> > as there is a larger consensus among cgroup v2 users. So let's give this
> > some more time before merging to see whether there is such a consensus.
> > 
> > [1] http://lkml.kernel.org/r/20190201102515.GK11599@dhcp22.suse.cz
> 
> It's been three months without any objections.

It's been three months without any _feedback_ from anybody. It might
very well be true that people just do not read these emails or do not
care one way or another.

> Can we merge this for
> v5.2 please? We still have users complaining about this inconsistent
> behavior (the last one was yesterday) and we'd rather not carry any
> out of tree patches.

Could you point me to those complains or is this something internal?
-- 
Michal Hocko
SUSE Labs



* Re: + mm-consider-subtrees-in-memoryevents.patch added to -mm tree
  2019-05-16 18:10     ` Michal Hocko
@ 2019-05-16 19:39       ` Johannes Weiner
  2019-05-17 12:33         ` Michal Hocko
  0 siblings, 1 reply; 12+ messages in thread
From: Johannes Weiner @ 2019-05-16 19:39 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, mm-commits, tj, guro, dennis, chris, cgroups mailinglist, linux-mm

On Thu, May 16, 2019 at 08:10:42PM +0200, Michal Hocko wrote:
> On Thu 16-05-19 13:56:55, Johannes Weiner wrote:
> > On Wed, Feb 13, 2019 at 01:47:29PM +0100, Michal Hocko wrote:
> > > On Tue 12-02-19 14:45:42, Andrew Morton wrote:
> > > [...]
> > > > From: Chris Down <chris@chrisdown.name>
> > > > Subject: mm, memcg: consider subtrees in memory.events
> > > > 
> > > > [...]
> > > 
> > > FTR: As I've already said here [1] I can live with this change as long
> > > as there is a larger consensus among cgroup v2 users. So let's give this
> > > some more time before merging to see whether there is such a consensus.
> > > 
> > > [1] http://lkml.kernel.org/r/20190201102515.GK11599@dhcp22.suse.cz
> > 
> > It's been three months without any objections.
> 
> It's been three months without any _feedback_ from anybody. It might
> very well be true that people just do not read these emails or do not
> care one way or another.

This is exactly the type of stuff that Mel was talking about at LSFMM
not even two weeks ago. How one objection, however absurd, can cause
"controversy" and block an effort to address a mistake we have made in
the past that is now actively causing problems for real users.

And now after stalling this fix for three months to wait for unlikely
objections, you're moving the goal post. This is frustrating.

Nobody else is speaking up because the current user base is very small
and because the idea that anybody has developed against and is relying
on the current problematic behavior is completely contrived. In
reality, the behavior surprises people and causes production issues.

> > Can we merge this for
> > v5.2 please? We still have users complaining about this inconsistent
> > behavior (the last one was yesterday) and we'd rather not carry any
> > out of tree patches.
> 
> Could you point me to those complains or is this something internal?

It's something internal, unfortunately, or I'd link to it.

In this report yesterday, the user missed OOM kills that occurred in
nested subgroups of individual job components. They monitor the entire
job status and health at the top-level "job" cgroup: total memory
usage, VM activity and trends from memory.stat, pressure for cpu, io,
memory etc. All of these are recursive. They assumed they could
monitor memory.events likewise and were left in the assumption that
everything was fine when in reality there was OOM killing going on in
one of the leaves.

Such negative surprises really suck. But what's worse is that now that
they are aware of it, there is still no good solution for them because
periodically polling the entire subtree for events in leaves is not
practical. There could be a lot of cgroups, which is why we put so
much effort recently into improving the hierarchical stat aggregation.

I'd really like to get this fixed, and preferably in a way that does
not deviate from upstream, and does not force the same downtimes and
wasted engineering hours on everybody who is going to switch to
cgroup2 in the next couple of years.



* Re: + mm-consider-subtrees-in-memoryevents.patch added to -mm tree
  2019-05-16 19:39       ` Johannes Weiner
@ 2019-05-17 12:33         ` Michal Hocko
  2019-05-17 13:00           ` Shakeel Butt
  2019-05-18  1:33           ` Johannes Weiner
  0 siblings, 2 replies; 12+ messages in thread
From: Michal Hocko @ 2019-05-17 12:33 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: akpm, mm-commits, tj, guro, dennis, chris, cgroups mailinglist, linux-mm

On Thu 16-05-19 15:39:43, Johannes Weiner wrote:
> On Thu, May 16, 2019 at 08:10:42PM +0200, Michal Hocko wrote:
> > On Thu 16-05-19 13:56:55, Johannes Weiner wrote:
> > > On Wed, Feb 13, 2019 at 01:47:29PM +0100, Michal Hocko wrote:
[...]
> > > > FTR: As I've already said here [1] I can live with this change as long
> > > > as there is a larger consensus among cgroup v2 users. So let's give this
> > > > some more time before merging to see whether there is such a consensus.
> > > > 
> > > > [1] http://lkml.kernel.org/r/20190201102515.GK11599@dhcp22.suse.cz
> > > 
> > > It's been three months without any objections.
> > 
> > It's been three months without any _feedback_ from anybody. It might
> > very well be true that people just do not read these emails or do not
> > care one way or another.
> 
> This is exactly the type of stuff that Mel was talking about at LSFMM
> not even two weeks ago. How one objection, however absurd, can cause
> "controversy" and block an effort to address a mistake we have made in
> the past that is now actively causing problems for real users.
> 
> And now after stalling this fix for three months to wait for unlikely
> objections, you're moving the goal post. This is frustrating.

I see your frustration but I find the above wording really unfair. Let me
remind you that this is a considerable user-visible change in the
semantics, and that always has to be evaluated carefully. A change that
would clearly regress anybody who relies on the current semantics. This
is not an internal implementation detail kind of thing.

I have suggested an option for the new behavior to be opt-in which
would be a regression safe option. You keep insisting that we absolutely
have to have hierarchical reporting by default for consistency reasons.
I do understand that argument but when I weigh consistency vs. potential
regression risk I rather go a conservative way. This is a traditional
way how we deal with semantic changes like this. There are always
exceptions possible and that is why I wanted to hear from other users of
cgroup v2, even from those who are not directly affected now.

If you feel so strongly about this topic and the suggested opt-in is an
absolute no-go then you are free to override my opinion here. I haven't
Nacked this patch.

> Nobody else is speaking up because the current user base is very small
> and because the idea that anybody has developed against and is relying
> on the current problematic behavior is completely contrived. In
> reality, the behavior surprises people and causes production issues.

I strongly suspect users usually do not follow discussions on our
mailing lists. They only come up later when something breaks and that
is too late. I do realize that this makes the above call for a wider
consensus harder but a lack of upstream bug reports also suggests that
people do not care or simply haven't noticed any issues due to the way
they use said interface (maybe deeper hierarchies are not that
common).

> > > Can we merge this for
> > > v5.2 please? We still have users complaining about this inconsistent
> > > behavior (the last one was yesterday) and we'd rather not carry any
> > > out of tree patches.
> > 
> > Could you point me to those complains or is this something internal?
> 
> It's something internal, unfortunately, or I'd link to it.
> 
> In this report yesterday, the user missed OOM kills that occured in
> nested subgroups of individual job components. They monitor the entire
> job status and health at the top-level "job" cgroup: total memory
> usage, VM activity and trends from memory.stat, pressure for cpu, io,
> memory etc. All of these are recursive. They assumed they could
> monitor memory.events likewise and were left in the assumption that
> everything was fine when in reality there was OOM killing going on in
> one of the leaves.

This kind of argument has been already mentioned during the discussion
and I understand it.
-- 
Michal Hocko
SUSE Labs



* Re: + mm-consider-subtrees-in-memoryevents.patch added to -mm tree
  2019-02-13 12:47 ` + mm-consider-subtrees-in-memoryevents.patch added to -mm tree Michal Hocko
  2019-05-16 17:56   ` Johannes Weiner
@ 2019-05-17 13:00   ` Shakeel Butt
  2019-05-17 19:04     ` Johannes Weiner
  1 sibling, 1 reply; 12+ messages in thread
From: Shakeel Butt @ 2019-05-17 13:00 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, mm-commits, Tejun Heo, Johannes Weiner,
	Roman Gushchin, Dennis Zhou, Chris Down, cgroups mailinglist,
	Linux MM

On Wed, Feb 13, 2019 at 4:47 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Tue 12-02-19 14:45:42, Andrew Morton wrote:
> [...]
> > From: Chris Down <chris@chrisdown.name>
> > Subject: mm, memcg: consider subtrees in memory.events
> >
> > memory.stat and other files already consider subtrees in their output, and
> > we should too in order to not present an inconsistent interface.
> >
> > The current situation is fairly confusing, because people interacting with
> > cgroups expect hierarchical behaviour in the vein of memory.stat,
> > cgroup.events, and other files.  For example, this causes confusion when
> > debugging reclaim events under low, as currently these always read "0" at
> > non-leaf memcg nodes, which frequently causes people to misdiagnose breach
> > behaviour.  The same confusion applies to other counters in this file when
> > debugging issues.
> >
> > Aggregation is done at write time instead of at read-time since these
> > counters aren't hot (unlike memory.stat which is per-page, so it does it
> > at read time), and it makes sense to bundle this with the file

I think the above statement (memory.stat read-time aggregation) needs
to be fixed due to the latest changes.

> > notifications.
> >
> > After this patch, events are propagated up the hierarchy:
> >
> >     [root@ktst ~]# cat /sys/fs/cgroup/system.slice/memory.events
> >     low 0
> >     high 0
> >     max 0
> >     oom 0
> >     oom_kill 0
> >     [root@ktst ~]# systemd-run -p MemoryMax=1 true
> >     Running as unit: run-r251162a189fb4562b9dabfdc9b0422f5.service
> >     [root@ktst ~]# cat /sys/fs/cgroup/system.slice/memory.events
> >     low 0
> >     high 0
> >     max 7
> >     oom 1
> >     oom_kill 1
> >
> > As this is a change in behaviour, this can be reverted to the old
> > behaviour by mounting with the `memory_localevents' flag set.  However, we
> > use the new behaviour by default as there's a lack of evidence that there
> > are any current users of memory.events that would find this change
> > undesirable.
> >
> > Link: http://lkml.kernel.org/r/20190208224419.GA24772@chrisdown.name
> > Signed-off-by: Chris Down <chris@chrisdown.name>
> > Acked-by: Johannes Weiner <hannes@cmpxchg.org>

Reviewed-by: Shakeel Butt <shakeelb@google.com>

However can we please have memory.events.local merged along with this one?

> > [...]


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: + mm-consider-subtrees-in-memoryevents.patch added to -mm tree
  2019-05-17 12:33         ` Michal Hocko
@ 2019-05-17 13:00           ` Shakeel Butt
  2019-05-22  5:30             ` Suren Baghdasaryan
  2019-05-18  1:33           ` Johannes Weiner
  1 sibling, 1 reply; 12+ messages in thread
From: Shakeel Butt @ 2019-05-17 13:00 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Johannes Weiner, Andrew Morton, mm-commits, Tejun Heo,
	Roman Gushchin, Dennis Zhou, Chris Down, cgroups mailinglist,
	Linux MM

On Fri, May 17, 2019 at 5:33 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Thu 16-05-19 15:39:43, Johannes Weiner wrote:
> > On Thu, May 16, 2019 at 08:10:42PM +0200, Michal Hocko wrote:
> > > On Thu 16-05-19 13:56:55, Johannes Weiner wrote:
> > > > On Wed, Feb 13, 2019 at 01:47:29PM +0100, Michal Hocko wrote:
> [...]
> > > > > FTR: As I've already said here [1] I can live with this change as long
> > > > > as there is a larger consensus among cgroup v2 users. So let's give this
> > > > > some more time before merging to see whether there is such a consensus.
> > > > >
> > > > > [1] http://lkml.kernel.org/r/20190201102515.GK11599@dhcp22.suse.cz
> > > >
> > > > It's been three months without any objections.
> > >
> > > It's been three months without any _feedback_ from anybody. It might
> > > very well be true that people just do not read these emails or do not
> > > care one way or another.
> >
> > This is exactly the type of stuff that Mel was talking about at LSFMM
> > not even two weeks ago. How one objection, however absurd, can cause
> > "controversy" and block an effort to address a mistake we have made in
> > the past that is now actively causing problems for real users.
> >
> > And now after stalling this fix for three months to wait for unlikely
> > objections, you're moving the goal post. This is frustrating.
>
> I see your frustration but I find the above wording really unfair. Let me
> remind you that this is a considerable user-visible change in the
> semantics, and that always has to be evaluated carefully. A change like
> this would clearly regress anybody who relies on the current semantics.
> This is not an internal-implementation-detail kind of thing.
>
> I have suggested an option for the new behavior to be opt-in, which
> would be a regression-safe option. You keep insisting that we absolutely
> have to have hierarchical reporting by default for consistency reasons.
> I do understand that argument, but when I weigh consistency vs. potential
> regression risk, I would rather go the conservative way. This is the
> traditional way we deal with semantic changes like this. There are always
> exceptions possible and that is why I wanted to hear from other users of
> cgroup v2, even from those who are not directly affected now.
>
> If you feel so strongly about this topic and the suggested opt-in is an
> absolute no-go then you are free to override my opinion here. I haven't
> Nacked this patch.
>
> > Nobody else is speaking up because the current user base is very small
> > and because the idea that anybody has developed against and is relying
> > on the current problematic behavior is completely contrived. In
> > reality, the behavior surprises people and causes production issues.
>
> I strongly suspect users usually do not follow discussions on our
> mailing lists. They only come up later when something breaks and that
> is too late. I do realize that this makes the above call for a wider
> consensus harder but a lack of upstream bug reports also suggests that
> people do not care or simply haven't noticed any issues due to the way
> they use the said interface (maybe deeper hierarchies are not that
> common).
>

I suspect that FB is the only one using cgroup v2 in production and
other (data center) users are still evaluating/exploring. Also, IMHO
the cgroup v2 users are on the bleeding edge: as new cgroup v2
features and controllers are added, the users either switch to the
latest kernel or backport. That might be the reason no one objected.
Also, none of the distributions has defaulted to v2 yet, so there are
not many transparent v2 users yet.

Shakeel



* Re: + mm-consider-subtrees-in-memoryevents.patch added to -mm tree
  2019-05-17 13:00   ` Shakeel Butt
@ 2019-05-17 19:04     ` Johannes Weiner
  0 siblings, 0 replies; 12+ messages in thread
From: Johannes Weiner @ 2019-05-17 19:04 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Michal Hocko, Andrew Morton, mm-commits, Tejun Heo,
	Roman Gushchin, Dennis Zhou, Chris Down, cgroups mailinglist,
	Linux MM

On Fri, May 17, 2019 at 06:00:11AM -0700, Shakeel Butt wrote:
> On Wed, Feb 13, 2019 at 4:47 AM Michal Hocko <mhocko@kernel.org> wrote:
> > > notifications.
> > >
> > > After this patch, events are propagated up the hierarchy:
> > >
> > >     [root@ktst ~]# cat /sys/fs/cgroup/system.slice/memory.events
> > >     low 0
> > >     high 0
> > >     max 0
> > >     oom 0
> > >     oom_kill 0
> > >     [root@ktst ~]# systemd-run -p MemoryMax=1 true
> > >     Running as unit: run-r251162a189fb4562b9dabfdc9b0422f5.service
> > >     [root@ktst ~]# cat /sys/fs/cgroup/system.slice/memory.events
> > >     low 0
> > >     high 0
> > >     max 7
> > >     oom 1
> > >     oom_kill 1
> > >
> > > As this is a change in behaviour, this can be reverted to the old
> > > behaviour by mounting with the `memory_localevents' flag set.  However, we
> > > use the new behaviour by default as there's a lack of evidence that there
> > > are any current users of memory.events that would find this change
> > > undesirable.
> > >
> > > Link: http://lkml.kernel.org/r/20190208224419.GA24772@chrisdown.name
> > > Signed-off-by: Chris Down <chris@chrisdown.name>
> > > Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> 
> Reviewed-by: Shakeel Butt <shakeelb@google.com>

Thanks, Shakeel.

> However can we please have memory.events.local merged along with this one?

Could I ask you to send a patch for this? It's not really about the
code - that should be trivial. Rather it's about laying out the exact
usecase for that, which is harder for me/Chris/FB since we don't have
one. I imagine simliar arguments could be made for memory.stat.local,
memory.pressure.local etc. since they're also reporting events and
behavior manifesting in different levels of the cgroup subtree?



* Re: + mm-consider-subtrees-in-memoryevents.patch added to -mm tree
  2019-05-17 12:33         ` Michal Hocko
  2019-05-17 13:00           ` Shakeel Butt
@ 2019-05-18  1:33           ` Johannes Weiner
  2019-05-22  2:23             ` Andrew Morton
  1 sibling, 1 reply; 12+ messages in thread
From: Johannes Weiner @ 2019-05-18  1:33 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, mm-commits, tj, guro, dennis, chris, cgroups mailinglist, linux-mm

On Fri, May 17, 2019 at 02:33:10PM +0200, Michal Hocko wrote:
> On Thu 16-05-19 15:39:43, Johannes Weiner wrote:
> > On Thu, May 16, 2019 at 08:10:42PM +0200, Michal Hocko wrote:
> > > On Thu 16-05-19 13:56:55, Johannes Weiner wrote:
> > > > On Wed, Feb 13, 2019 at 01:47:29PM +0100, Michal Hocko wrote:
> [...]
> > > > > FTR: As I've already said here [1] I can live with this change as long
> > > > > as there is a larger consensus among cgroup v2 users. So let's give this
> > > > > some more time before merging to see whether there is such a consensus.
> > > > > 
> > > > > [1] http://lkml.kernel.org/r/20190201102515.GK11599@dhcp22.suse.cz
> > > > 
> > > > It's been three months without any objections.
> > > 
> > > It's been three months without any _feedback_ from anybody. It might
> > > very well be true that people just do not read these emails or do not
> > > care one way or another.
> > 
> > This is exactly the type of stuff that Mel was talking about at LSFMM
> > not even two weeks ago. How one objection, however absurd, can cause
> > "controversy" and block an effort to address a mistake we have made in
> > the past that is now actively causing problems for real users.
> > 
> > And now after stalling this fix for three months to wait for unlikely
> > objections, you're moving the goal post. This is frustrating.
> 
> I see your frustration but I find the above wording really unfair. Let me
> remind you that this is a considerable user-visible change in the
> semantics, and that always has to be evaluated carefully. A change like
> this would clearly regress anybody who relies on the current semantics.
> This is not an internal-implementation-detail kind of thing.
> 
> I have suggested an option for the new behavior to be opt-in, which
> would be a regression-safe option. You keep insisting that we absolutely
> have to have hierarchical reporting by default for consistency reasons.
> I do understand that argument, but when I weigh consistency vs. potential
> regression risk, I would rather go the conservative way. This is the
> traditional way we deal with semantic changes like this. There are always
> exceptions possible and that is why I wanted to hear from other users of
> cgroup v2, even from those who are not directly affected now.

I have acknowledged this concern in previous discussions. But the rule
is "don't break userspace", not "never change behavior". We do allow
the latter when it's highly unlikely that anyone would mind and the
new behavior is a much better default for current and future users.

Let me try to make the case for exactly this:

- Adoption data suggests that cgroup2 isn't really used yet. RHEL8 was
  just released with cgroup1 per default. Fedora is currently debating
  a switch. None of the other distros default to cgroup2. There is an
  article on the lwn frontpage *right now* about Docker planning on
  switching to cgroup2 in the near future. Kubernetes is on
  cgroup1. Android is on cgroup1. Shakeel agrees that Facebook is
  probably the only serious user of cgroup2 right now. The cloud and
  all mainstream container software is still on cgroup1.

- Using this particular part of the interface is a fairly advanced
  step in the cgroup2 adoption process. We've been using cgroup2 for a
  while and we've only now started running into this memory.events
  problem as we're enhancing our monitoring and automation
  infrastructure. If we're the only serious deployment, and we just
  started noticing it, what's the chance of regressing someone else?

- Violating expectations costs users time and money either way, but
  the status quo is much more costly: somebody who expects these
  events to be local could see events that did occur at an
  unexpectedly higher level of the tree. But somebody who expects
  these events to be hierarchical will miss real events entirely!

  Now, for an alarm and monitoring infrastructure, what is worse: to
  see occurring OOM kills reported at a tree level you don't expect?
  Or to *miss* occurring OOM kills that you're trying to look out for?

  Automatic remediation might not be as clear-cut, but for us, and I
  would assume many others, an unnecessary service restart or failover
  would have a shorter downtime than missing a restart after a kill.

- The status quo is more likely to violate expectations, given how the
  cgroup2 interface as a whole is designed.

  We have seen this in practice: memory.current is hierarchical,
  memory.stat is hierarchical, memory.pressure is hierarchical - users
  expect memory.events to be hierarchical. This misunderstanding has
  already cost us time and money.

  Chances are that, even if there were other users of memory.events,
  they're using the interface incorrectly and haven't noticed yet,
  rather than relying on the inconsistency.

  It's not a hypothetical, we have seen this with our fleet customers.

So combining what we know about

1. the current adoption rate
2. likely user expectations
3. the failure mode of missing pressure and OOM kill signals

means that going with the conservative option and not fixing this
inconsistency puts pretty much all users that will ever use this
interface at the risk of pain, outages and wasted engineering hours.

Making the fix available but opt-in has the same result for everybody
that isn't following this thread/patch along.

All that to protect an unlikely existing cgroup2 user from something
they are even less likely to have noticed, let alone rely on.

This sounds like a terrible trade-off to me. I don't think it's a
close call in this case.

I understand that we have to think harder and be more careful with
changes like this. The bar *should* be high. But in this case, there
doesn't seem to be a real risk of regression for anybody, while the
risk of the status quo causing problems is high and happening. These
circumstances should be part of the decision process.



* Re: + mm-consider-subtrees-in-memoryevents.patch added to -mm tree
  2019-05-18  1:33           ` Johannes Weiner
@ 2019-05-22  2:23             ` Andrew Morton
  2019-05-22 15:44               ` Johannes Weiner
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2019-05-22  2:23 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Michal Hocko, mm-commits, tj, guro, dennis, chris,
	cgroups mailinglist, linux-mm

On Fri, 17 May 2019 21:33:48 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:

> - Adoption data suggests that cgroup2 isn't really used yet. RHEL8 was
>   just released with cgroup1 per default. Fedora is currently debating
>   a switch. None of the other distros default to cgroup2. There is an
>   article on the lwn frontpage *right now* about Docker planning on
>   switching to cgroup2 in the near future. Kubernetes is on
>   cgroup1. Android is on cgroup1. Shakeel agrees that Facebook is
>   probably the only serious user of cgroup2 right now. The cloud and
>   all mainstream container software is still on cgroup1.

I'm thinking we need a cc:stable so these forthcoming distros are more
likely to pick up the new behaviour?

Generally, your arguments sound good to me - I don't see evidence that
anyone is using cgroup2 in a manner which is serious enough to be
seriously affected by this change.



* Re: + mm-consider-subtrees-in-memoryevents.patch added to -mm tree
  2019-05-17 13:00           ` Shakeel Butt
@ 2019-05-22  5:30             ` Suren Baghdasaryan
  0 siblings, 0 replies; 12+ messages in thread
From: Suren Baghdasaryan @ 2019-05-22  5:30 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Michal Hocko, Johannes Weiner, Andrew Morton, mm-commits,
	Tejun Heo, Roman Gushchin, Dennis Zhou, Chris Down,
	cgroups mailinglist, Linux MM

On Fri, May 17, 2019 at 6:00 AM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Fri, May 17, 2019 at 5:33 AM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Thu 16-05-19 15:39:43, Johannes Weiner wrote:
> > > On Thu, May 16, 2019 at 08:10:42PM +0200, Michal Hocko wrote:
> > > > On Thu 16-05-19 13:56:55, Johannes Weiner wrote:
> > > > > On Wed, Feb 13, 2019 at 01:47:29PM +0100, Michal Hocko wrote:
> > [...]
> > > > > > FTR: As I've already said here [1] I can live with this change as long
> > > > > > as there is a larger consensus among cgroup v2 users. So let's give this
> > > > > > some more time before merging to see whether there is such a consensus.
> > > > > >
> > > > > > [1] http://lkml.kernel.org/r/20190201102515.GK11599@dhcp22.suse.cz
> > > > >
> > > > > It's been three months without any objections.
> > > >
> > > > It's been three months without any _feedback_ from anybody. It might
> > > > very well be true that people just do not read these emails or do not
> > > > care one way or another.
> > >
> > > This is exactly the type of stuff that Mel was talking about at LSFMM
> > > not even two weeks ago. How one objection, however absurd, can cause
> > > "controversy" and block an effort to address a mistake we have made in
> > > the past that is now actively causing problems for real users.
> > >
> > > And now after stalling this fix for three months to wait for unlikely
> > > objections, you're moving the goal post. This is frustrating.
> >
> > I see your frustration but I find the above wording really unfair. Let me
> > remind you that this is a considerable user-visible change in the
> > semantics, and that always has to be evaluated carefully. A change like
> > this would clearly regress anybody who relies on the current semantics.
> > This is not an internal-implementation-detail kind of thing.
> >
> > I have suggested an option for the new behavior to be opt-in, which
> > would be a regression-safe option. You keep insisting that we absolutely
> > have to have hierarchical reporting by default for consistency reasons.
> > I do understand that argument, but when I weigh consistency vs. potential
> > regression risk, I would rather go the conservative way. This is the
> > traditional way we deal with semantic changes like this. There are always
> > exceptions possible and that is why I wanted to hear from other users of
> > cgroup v2, even from those who are not directly affected now.
> >
> > If you feel so strongly about this topic and the suggested opt-in is an
> > absolute no-go then you are free to override my opinion here. I haven't
> > Nacked this patch.
> >
> > > Nobody else is speaking up because the current user base is very small
> > > and because the idea that anybody has developed against and is relying
> > > on the current problematic behavior is completely contrived. In
> > > reality, the behavior surprises people and causes production issues.
> >
> > I strongly suspect users usually do not follow discussions on our
> > mailing lists. They only come up later when something breaks and that
> > is too late. I do realize that this makes the above call for a wider
> > consensus harder but a lack of upstream bug reports also suggests that
> > people do not care or simply haven't noticed any issues due to the way
> > they use the said interface (maybe deeper hierarchies are not that
> > common).
> >
>
> I suspect that FB is the only one using cgroup v2 in production and
> other (data center) users are still evaluating/exploring. Also, IMHO
> the cgroup v2 users are on the bleeding edge: as new cgroup v2
> features and controllers are added, the users either switch to the
> latest kernel or backport. That might be the reason no one objected.
> Also, none of the distributions has defaulted to v2 yet, so there are
> not many transparent v2 users yet.

In Android we are not using cgroups v2 yet (and that's why I was
refraining from commenting earlier); however, when I was evaluating
them for future use I was disappointed that events do not propagate up
the hierarchy. One usecase that I was considering is to get a
notification when an OOM kill happens. With cgroups v2 we would be
forced to use a per-app hierarchy to avoid process migrations between
memcgs when an app changes its state (background/foreground). With
such a setup we would end up with many leaf cgroups. Polling each
individual leaf cgroup's memory.events file to detect OOM occurrences
would require lots of extra FDs registered with an epoll(). Having the
ability to poll a common parent cgroup to detect that one of the
leaves generated an OOM event would be way more frugal.
I realize this does not constitute a real-life usecase, but hopefully
possible usecases can provide some value too.
Thanks,
Suren.

> Shakeel
>



* Re: + mm-consider-subtrees-in-memoryevents.patch added to -mm tree
  2019-05-22  2:23             ` Andrew Morton
@ 2019-05-22 15:44               ` Johannes Weiner
  0 siblings, 0 replies; 12+ messages in thread
From: Johannes Weiner @ 2019-05-22 15:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, mm-commits, tj, guro, dennis, chris,
	cgroups mailinglist, linux-mm

On Tue, May 21, 2019 at 07:23:51PM -0700, Andrew Morton wrote:
> On Fri, 17 May 2019 21:33:48 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:
> 
> > - Adoption data suggests that cgroup2 isn't really used yet. RHEL8 was
> >   just released with cgroup1 per default. Fedora is currently debating
> >   a switch. None of the other distros default to cgroup2. There is an
> >   article on the lwn frontpage *right now* about Docker planning on
> >   switching to cgroup2 in the near future. Kubernetes is on
> >   cgroup1. Android is on cgroup1. Shakeel agrees that Facebook is
> >   probably the only serious user of cgroup2 right now. The cloud and
> >   all mainstream container software is still on cgroup1.
> 
> I'm thinking we need a cc:stable so these forthcoming distros are more
> likely to pick up the new behaviour?

Yup, makes sense to me. Thank you!



end of thread, other threads:[~2019-05-22 15:44 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20190212224542.ZW63a%akpm@linux-foundation.org>
2019-02-13 12:47 ` + mm-consider-subtrees-in-memoryevents.patch added to -mm tree Michal Hocko
2019-05-16 17:56   ` Johannes Weiner
2019-05-16 18:10     ` Michal Hocko
2019-05-16 19:39       ` Johannes Weiner
2019-05-17 12:33         ` Michal Hocko
2019-05-17 13:00           ` Shakeel Butt
2019-05-22  5:30             ` Suren Baghdasaryan
2019-05-18  1:33           ` Johannes Weiner
2019-05-22  2:23             ` Andrew Morton
2019-05-22 15:44               ` Johannes Weiner
2019-05-17 13:00   ` Shakeel Butt
2019-05-17 19:04     ` Johannes Weiner
