* [PATCH v2] mm, memcg: fix inconsistent oom event behavior
@ 2020-04-14  1:59 Yafang Shao
  2020-04-14 15:22 ` Michal Hocko
  0 siblings, 1 reply; 11+ messages in thread
From: Yafang Shao @ 2020-04-14  1:59 UTC (permalink / raw)
  To: shakeelb, chris, hannes, mhocko, vdavydov.dev, akpm
  Cc: linux-mm, Yafang Shao, stable

A recent commit 9852ae3fe529 ("mm, memcg: consider subtrees in
memory.events") changed the behavior of memcg events so that
memory.events now accounts for subtrees. But the oom_kill event is a
special one, as it is used in both cgroup1 and cgroup2. In cgroup1, it is
displayed in memory.oom_control. The file memory.oom_control exists in both
the root memcg and non-root memcgs, unlike memory.events, which exists only
in non-root memcgs. That commit is okay for cgroup2, but it is not okay for
cgroup1, as it causes inconsistent behavior between the root memcg and
non-root memcgs.

Here's an example of why this behavior is inconsistent in cgroup1.
     root memcg
     /
  memcg foo
   /
memcg bar

Suppose there's an oom_kill in memcg bar, then the oom_kill counts will be

     root memcg : memory.oom_control(oom_kill)  0
     /
  memcg foo : memory.oom_control(oom_kill)  1
   /
memcg bar : memory.oom_control(oom_kill)  1

For a non-root memcg, memory.oom_control(oom_kill) includes its
descendants' oom_kill events, but for the root memcg it does not.
That means memory.oom_control(oom_kill) has different meanings in
different memcgs, which is inconsistent: the user has to know whether
a memcg is the root or not in order to interpret the value.

If we can't fully support it in cgroup1, for example by adding
memory.events.local to cgroup1 as well, then let's not touch
its original behavior. So let's restore the original behavior for cgroup1.

Fixes: 9852ae3fe529 ("mm, memcg: consider subtrees in memory.events")
Cc: Chris Down <chris@chrisdown.name>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: stable@vger.kernel.org
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/linux/memcontrol.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 8c340e6b347f..a0ae080a67d1 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -798,7 +798,8 @@ static inline void memcg_memory_event(struct mem_cgroup *memcg,
 		atomic_long_inc(&memcg->memory_events[event]);
 		cgroup_file_notify(&memcg->events_file);
 
-		if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS)
+		if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS ||
+		    !cgroup_subsys_on_dfl(memory_cgrp_subsys))
 			break;
 	} while ((memcg = parent_mem_cgroup(memcg)) &&
 		 !mem_cgroup_is_root(memcg));
-- 
2.18.2
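
The propagation described in the changelog can be reproduced with a small
userspace model. This is only a sketch that mirrors the shape of the loop in
memcg_memory_event(); struct toy_memcg, toy_memory_event() and the
"patched" switch are hypothetical names made up for illustration, not kernel
code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct mem_cgroup. */
struct toy_memcg {
	struct toy_memcg *parent;	/* NULL for the root memcg */
	long oom_kill;			/* stand-in for the oom_kill event counter */
};

/*
 * Count the event locally, then walk up towards the root unless a
 * "local only" condition stops the walk. As in the kernel loop, the walk
 * stops before it reaches the root, so the root is only counted when the
 * event happens in the root itself.
 *
 * local_events: models CGRP_ROOT_MEMORY_LOCAL_EVENTS being set
 * on_dfl:       true models cgroup2, false models cgroup1
 * patched:      true adds the !cgroup_subsys_on_dfl() check from the patch
 */
static void toy_memory_event(struct toy_memcg *memcg, bool local_events,
			     bool on_dfl, bool patched)
{
	do {
		memcg->oom_kill++;

		if (local_events || (patched && !on_dfl))
			break;
	} while ((memcg = memcg->parent) && memcg->parent != NULL);
}
```

With a root/foo/bar chain on cgroup1 and an event in bar, the unpatched model
leaves the counters at root 0, foo 1, bar 1 (the inconsistent case from the
changelog), while the patched model leaves them at root 0, foo 0, bar 1.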


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] mm, memcg: fix inconsistent oom event behavior
  2020-04-14  1:59 [PATCH v2] mm, memcg: fix inconsistent oom event behavior Yafang Shao
@ 2020-04-14 15:22 ` Michal Hocko
  2020-04-14 15:57     ` Yafang Shao
  0 siblings, 1 reply; 11+ messages in thread
From: Michal Hocko @ 2020-04-14 15:22 UTC (permalink / raw)
  To: Yafang Shao; +Cc: shakeelb, chris, hannes, vdavydov.dev, akpm, linux-mm, stable

On Mon 13-04-20 21:59:52, Yafang Shao wrote:
> A recent commit 9852ae3fe529 ("mm, memcg: consider subtrees in
> memory.events") changes the behavior of memcg events, which will
> consider subtrees in memory.events. But oom_kill event is a special one
> as it is used in both cgroup1 and cgroup2. In cgroup1, it is displayed
> in memory.oom_control. The file memory.oom_control is in both root memcg
> and non root memcg, that is different with memory.event as it only in
> non-root memcg. That commit is okay for cgroup2, but it is not okay for
> cgroup1 as it will cause inconsistent behavior between root memcg and
> non-root memcg.
> 
> Here's an example on why this behavior is inconsistent in cgroup1.
>      root memcg
>      /
>   memcg foo
>    /
> memcg bar
> 
> Suppose there's an oom_kill in memcg bar, then the oom_kill counts will be
> 
>      root memcg : memory.oom_control(oom_kill)  0
>      /
>   memcg foo : memory.oom_control(oom_kill)  1
>    /
> memcg bar : memory.oom_control(oom_kill)  1
> 
> For the non-root memcg, its memory.oom_control(oom_kill) includes its
> descendants' oom_kill, but for root memcg, it doesn't include its
> descendants' oom_kill. That means, memory.oom_control(oom_kill) has
> different meanings in different memcgs. That is inconsistent. Then the user
> has to know whether the memcg is root or not.
> 
> If we can't fully support it in cgroup1, for example by adding
> memory.events.local into cgroup1 as well, then let's don't touch
> its original behavior. So let's recover the original behavior for cgroup1.

The localevents option was mostly a cgroup v2 feature. I do not think there
was an intention to have side effects on the legacy hierarchy. I thought
that would be the case, but apparently it is not. Would it make more
sense to set CGRP_ROOT_MEMORY_LOCAL_EVENTS for the legacy hierarchy by
default rather than special-casing it somewhere quite deep in the code?

> Fixes: 9852ae3fe529 ("mm, memcg: consider subtrees in memory.events")
> Cc: Chris Down <chris@chrisdown.name>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: stable@vger.kernel.org
> Reviewed-by: Shakeel Butt <shakeelb@google.com>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>  include/linux/memcontrol.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 8c340e6b347f..a0ae080a67d1 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -798,7 +798,8 @@ static inline void memcg_memory_event(struct mem_cgroup *memcg,
>  		atomic_long_inc(&memcg->memory_events[event]);
>  		cgroup_file_notify(&memcg->events_file);
>  
> -		if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS)
> +		if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS ||
> +		    !cgroup_subsys_on_dfl(memory_cgrp_subsys))
>  			break;
>  	} while ((memcg = parent_mem_cgroup(memcg)) &&
>  		 !mem_cgroup_is_root(memcg));
> -- 
> 2.18.2

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] mm, memcg: fix inconsistent oom event behavior
  2020-04-14 15:22 ` Michal Hocko
@ 2020-04-14 15:57     ` Yafang Shao
  0 siblings, 0 replies; 11+ messages in thread
From: Yafang Shao @ 2020-04-14 15:57 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Shakeel Butt, Chris Down, Johannes Weiner, Vladimir Davydov,
	Andrew Morton, Linux MM, stable

On Tue, Apr 14, 2020 at 11:23 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Mon 13-04-20 21:59:52, Yafang Shao wrote:
> > A recent commit 9852ae3fe529 ("mm, memcg: consider subtrees in
> > memory.events") changes the behavior of memcg events, which will
> > consider subtrees in memory.events. But oom_kill event is a special one
> > as it is used in both cgroup1 and cgroup2. In cgroup1, it is displayed
> > in memory.oom_control. The file memory.oom_control is in both root memcg
> > and non root memcg, that is different with memory.event as it only in
> > non-root memcg. That commit is okay for cgroup2, but it is not okay for
> > cgroup1 as it will cause inconsistent behavior between root memcg and
> > non-root memcg.
> >
> > Here's an example on why this behavior is inconsistent in cgroup1.
> >      root memcg
> >      /
> >   memcg foo
> >    /
> > memcg bar
> >
> > Suppose there's an oom_kill in memcg bar, then the oom_kill counts will be
> >
> >      root memcg : memory.oom_control(oom_kill)  0
> >      /
> >   memcg foo : memory.oom_control(oom_kill)  1
> >    /
> > memcg bar : memory.oom_control(oom_kill)  1
> >
> > For the non-root memcg, its memory.oom_control(oom_kill) includes its
> > descendants' oom_kill, but for root memcg, it doesn't include its
> > descendants' oom_kill. That means, memory.oom_control(oom_kill) has
> > different meanings in different memcgs. That is inconsistent. Then the user
> > has to know whether the memcg is root or not.
> >
> > If we can't fully support it in cgroup1, for example by adding
> > memory.events.local into cgroup1 as well, then let's don't touch
> > its original behavior. So let's recover the original behavior for cgroup1.
>
> The localevents option was mostly a cgroup v2 feature. I do not think there was
> an intention to have side effects on the legacy hierarchy. I thought
> this would be the case but it is not apparently. Would it make more
> sense to have CGRP_ROOT_MEMORY_LOCAL_EVENTS for legacy hierarchy by
> default rather than special casing it somewhere quite deep in the code?
>

I had thought about setting CGRP_ROOT_MEMORY_LOCAL_EVENTS by default
for cgroup1, but I was not sure whether we should also expose
memory_localevents in cgroup1_show_options().

> > Fixes: 9852ae3fe529 ("mm, memcg: consider subtrees in memory.events")
> > Cc: Chris Down <chris@chrisdown.name>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Cc: stable@vger.kernel.org
> > Reviewed-by: Shakeel Butt <shakeelb@google.com>
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > ---
> >  include/linux/memcontrol.h | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index 8c340e6b347f..a0ae080a67d1 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -798,7 +798,8 @@ static inline void memcg_memory_event(struct mem_cgroup *memcg,
> >               atomic_long_inc(&memcg->memory_events[event]);
> >               cgroup_file_notify(&memcg->events_file);
> >
> > -             if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS)
> > +             if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS ||
> > +                 !cgroup_subsys_on_dfl(memory_cgrp_subsys))
> >                       break;
> >       } while ((memcg = parent_mem_cgroup(memcg)) &&
> >                !mem_cgroup_is_root(memcg));
> > --
> > 2.18.2
>
> --
> Michal Hocko
> SUSE Labs



Thanks
Yafang

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] mm, memcg: fix inconsistent oom event behavior
  2020-04-22 12:54 ` Johannes Weiner
  2020-04-22 12:58   ` Yafang Shao
@ 2020-04-22 13:15   ` Michal Hocko
  1 sibling, 0 replies; 11+ messages in thread
From: Michal Hocko @ 2020-04-22 13:15 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: Yafang Shao, akpm, linux-mm, Chris Down, Shakeel Butt

On Wed 22-04-20 08:54:26, Johannes Weiner wrote:
> On Wed, Apr 22, 2020 at 07:06:43AM -0400, Yafang Shao wrote:
> > A recent commit 9852ae3fe529 ("mm, memcg: consider subtrees in
> > memory.events") changes the behavior of memcg events, which will
> > consider subtrees in memory.events. But oom_kill event is a special one
> > as it is used in both cgroup1 and cgroup2. In cgroup1, it is displayed
> > in memory.oom_control. The file memory.oom_control is in both root memcg
> > and non root memcg, that is different with memory.event as it only in
> > non-root memcg. That commit is okay for cgroup2, but it is not okay for
> > cgroup1 as it will cause inconsistent behavior between root memcg and
> > non-root memcg.
> > 
> > Here's an example on why this behavior is inconsistent in cgroup1.
> >      root memcg
> >      /
> >   memcg foo
> >    /
> > memcg bar
> > 
> > Suppose there's an oom_kill in memcg bar, then the oom_kill counts will be
> > 
> >      root memcg : memory.oom_control(oom_kill)  0
> >      /
> >   memcg foo : memory.oom_control(oom_kill)  1
> >    /
> > memcg bar : memory.oom_control(oom_kill)  1
> > 
> > For the non-root memcg, its memory.oom_control(oom_kill) includes its
> > descendants' oom_kill, but for root memcg, it doesn't include its
> > descendants' oom_kill. That means, memory.oom_control(oom_kill) has
> > different meanings in different memcgs. That is inconsistent. Then the user
> > has to know whether the memcg is root or not.
> > 
> > If we can't fully support it in cgroup1, for example by adding
> > memory.events.local into cgroup1 as well, then let's don't touch
> > its original behavior.
> > 
> > Setting CGRP_ROOT_MEMORY_LOCAL_EVENTS for legacy hierarchy by
> > default rather than special casing it somewhere quite deep in the code
> > would be better, per discussion with Michal.
> > 
> > Fixes: 9852ae3fe529 ("mm, memcg: consider subtrees in memory.events")
> > Cc: Chris Down <chris@chrisdown.name>
> > Cc: Shakeel Butt <shakeelb@google.com>
> > Cc: Michal Hocko <mhocko@kernel.org>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > ---
> >  mm/memcontrol.c | 14 ++++++++++++--
> >  1 file changed, 12 insertions(+), 2 deletions(-)
> > 
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 5beea03dd58a..0f7381bddcee 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -5940,10 +5940,20 @@ static void mem_cgroup_bind(struct cgroup_subsys_state *root_css)
> >  	 * guarantees that @root doesn't have any children, so turning it
> >  	 * on for the root memcg is enough.
> >  	 */
> > -	if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
> > +	if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
> >  		root_mem_cgroup->use_hierarchy = true;
> > -	else
> > +	} else {
> >  		root_mem_cgroup->use_hierarchy = false;
> > +		/*
> > +		 * Set CGRP_ROOT_MEMORY_LOCAL_EVENTS for legacy hierarchy
> > +		 * by default to avoid inconsistent oom_kill behavior
> > +		 * between root memcg and non-root memcg.
> > +		 * Regarding default hierarchy, as this flag will be set
> > +		 * or cleared later, we don't need to process it in this
> > +		 * function.
> > +		 */
> > +		cgrp_dfl_root.flags |= CGRP_ROOT_MEMORY_LOCAL_EVENTS;
> 
> That will cause problems for people trying to remount. From
> cgroup1_reconfigure():
> 
> 	/* Don't allow flags or name to change at remount */
> 	if ((ctx->flags ^ root->flags) ||
> 	    (ctx->name && strcmp(ctx->name, root->name))) {
> 		errorfc(fc, "option or name mismatch, new: 0x%x \"%s\", old: 0x%x \"%s\"",
> 		       ctx->flags, ctx->name ?: "", root->flags, root->name);
> 		ret = -EINVAL;
> 		goto out_unlock;
> 	}

OK, I was not aware of this restriction. Under these circumstances
special casing in memcg_memory_event is the right approach.

> These flags belong to the user, they're read-only to the cgroup
> implementation. Let's not mess with them from a controller.
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 4b868e5a687f..e831a90b5506 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -773,6 +773,8 @@ static inline void memcg_memory_event(struct mem_cgroup *memcg,
>  		atomic_long_inc(&memcg->memory_events[event]);
>  		cgroup_file_notify(&memcg->events_file);
>  
> +		if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
> +			break;
>  		if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS)
>  			break;
>  	} while ((memcg = parent_mem_cgroup(memcg)) &&

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] mm, memcg: fix inconsistent oom event behavior
  2020-04-22 13:02     ` Chris Down
@ 2020-04-22 13:15       ` Yafang Shao
  0 siblings, 0 replies; 11+ messages in thread
From: Yafang Shao @ 2020-04-22 13:15 UTC (permalink / raw)
  To: Chris Down
  Cc: Johannes Weiner, Andrew Morton, Michal Hocko, Linux MM, Shakeel Butt

On Wed, Apr 22, 2020 at 9:02 PM Chris Down <chris@chrisdown.name> wrote:
>
> Yafang Shao writes:
> >That is what I did in the previous version, see also
> >https://lore.kernel.org/linux-mm/20200414015952.3590-1-laoar.shao@gmail.com/
>
> Your v1 patch was significantly less complicated and self-contained, and I
> would ack it (whereas I wouldn't ack this because it really complicates matters
> for localevents). My only questions were around quantifying the issue in the
> changelog :-)

User tools that parse memory.oom_control will be affected. I
thought I had explained that clearly in the changelog :)
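
As an illustration of the kind of tool meant here, the following is a minimal
sketch of a parser that reads the oom_kill value out of the text of a cgroup1
memory.oom_control file. The helper name parse_oom_kill() is hypothetical, and
the sketch assumes the file consists of "key value" lines such as
"oom_kill_disable 0", "under_oom 0" and "oom_kill 3".

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the value on the "oom_kill" line of the given
 * memory.oom_control text, or -1 if no such line is present.
 */
static long parse_oom_kill(const char *text)
{
	const char *line = text;

	while (line && *line) {
		long value;

		/*
		 * "oom_kill_disable 0" does not match: after the literal
		 * "oom_kill" the next input is "_disable", not a number.
		 */
		if (sscanf(line, "oom_kill %ld", &value) == 1)
			return value;

		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

A tool like this that compares the value across memcgs would see a parent
count that includes descendants in non-root memcgs but not in the root memcg,
which is the inconsistency described in the changelog.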

-- 
Thanks
Yafang


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] mm, memcg: fix inconsistent oom event behavior
  2020-04-22 12:58   ` Yafang Shao
@ 2020-04-22 13:02     ` Chris Down
  2020-04-22 13:15       ` Yafang Shao
  0 siblings, 1 reply; 11+ messages in thread
From: Chris Down @ 2020-04-22 13:02 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Johannes Weiner, Andrew Morton, Michal Hocko, Linux MM, Shakeel Butt

Yafang Shao writes:
>That is what I did in the previous version, see also
>https://lore.kernel.org/linux-mm/20200414015952.3590-1-laoar.shao@gmail.com/

Your v1 patch was significantly less complicated and self-contained, and I 
would ack it (whereas I wouldn't ack this because it really complicates matters 
for localevents). My only questions were around quantifying the issue in the 
changelog :-)


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] mm, memcg: fix inconsistent oom event behavior
  2020-04-22 12:54 ` Johannes Weiner
@ 2020-04-22 12:58   ` Yafang Shao
  2020-04-22 13:02     ` Chris Down
  2020-04-22 13:15   ` Michal Hocko
  1 sibling, 1 reply; 11+ messages in thread
From: Yafang Shao @ 2020-04-22 12:58 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Michal Hocko, Linux MM, Chris Down, Shakeel Butt

On Wed, Apr 22, 2020 at 8:54 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Wed, Apr 22, 2020 at 07:06:43AM -0400, Yafang Shao wrote:
> > A recent commit 9852ae3fe529 ("mm, memcg: consider subtrees in
> > memory.events") changes the behavior of memcg events, which will
> > consider subtrees in memory.events. But oom_kill event is a special one
> > as it is used in both cgroup1 and cgroup2. In cgroup1, it is displayed
> > in memory.oom_control. The file memory.oom_control is in both root memcg
> > and non root memcg, that is different with memory.event as it only in
> > non-root memcg. That commit is okay for cgroup2, but it is not okay for
> > cgroup1 as it will cause inconsistent behavior between root memcg and
> > non-root memcg.
> >
> > Here's an example on why this behavior is inconsistent in cgroup1.
> >      root memcg
> >      /
> >   memcg foo
> >    /
> > memcg bar
> >
> > Suppose there's an oom_kill in memcg bar, then the oom_kill counts will be
> >
> >      root memcg : memory.oom_control(oom_kill)  0
> >      /
> >   memcg foo : memory.oom_control(oom_kill)  1
> >    /
> > memcg bar : memory.oom_control(oom_kill)  1
> >
> > For the non-root memcg, its memory.oom_control(oom_kill) includes its
> > descendants' oom_kill, but for root memcg, it doesn't include its
> > descendants' oom_kill. That means, memory.oom_control(oom_kill) has
> > different meanings in different memcgs. That is inconsistent. Then the user
> > has to know whether the memcg is root or not.
> >
> > If we can't fully support it in cgroup1, for example by adding
> > memory.events.local into cgroup1 as well, then let's don't touch
> > its original behavior.
> >
> > Setting CGRP_ROOT_MEMORY_LOCAL_EVENTS for legacy hierarchy by
> > default rather than special casing it somewhere quite deep in the code
> > would be better, per discussion with Michal.
> >
> > Fixes: 9852ae3fe529 ("mm, memcg: consider subtrees in memory.events")
> > Cc: Chris Down <chris@chrisdown.name>
> > Cc: Shakeel Butt <shakeelb@google.com>
> > Cc: Michal Hocko <mhocko@kernel.org>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > ---
> >  mm/memcontrol.c | 14 ++++++++++++--
> >  1 file changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 5beea03dd58a..0f7381bddcee 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -5940,10 +5940,20 @@ static void mem_cgroup_bind(struct cgroup_subsys_state *root_css)
> >        * guarantees that @root doesn't have any children, so turning it
> >        * on for the root memcg is enough.
> >        */
> > -     if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
> > +     if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
> >               root_mem_cgroup->use_hierarchy = true;
> > -     else
> > +     } else {
> >               root_mem_cgroup->use_hierarchy = false;
> > +             /*
> > +              * Set CGRP_ROOT_MEMORY_LOCAL_EVENTS for legacy hierarchy
> > +              * by default to avoid inconsistent oom_kill behavior
> > +              * between root memcg and non-root memcg.
> > +              * Regarding default hierarchy, as this flag will be set
> > +              * or cleared later, we don't need to process it in this
> > +              * function.
> > +              */
> > +             cgrp_dfl_root.flags |= CGRP_ROOT_MEMORY_LOCAL_EVENTS;
>
> That will cause problems for people trying to remount. From
> cgroup1_reconfigure():
>
>         /* Don't allow flags or name to change at remount */
>         if ((ctx->flags ^ root->flags) ||
>             (ctx->name && strcmp(ctx->name, root->name))) {
>                 errorfc(fc, "option or name mismatch, new: 0x%x \"%s\", old: 0x%x \"%s\"",
>                        ctx->flags, ctx->name ?: "", root->flags, root->name);
>                 ret = -EINVAL;
>                 goto out_unlock;
>         }
>
> These flags belong to the user, they're read-only to the cgroup
> implementation. Let's not mess with them from a controller.
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 4b868e5a687f..e831a90b5506 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -773,6 +773,8 @@ static inline void memcg_memory_event(struct mem_cgroup *memcg,
>                 atomic_long_inc(&memcg->memory_events[event]);
>                 cgroup_file_notify(&memcg->events_file);
>
> +               if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
> +                       break;
>                 if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS)
>                         break;
>         } while ((memcg = parent_mem_cgroup(memcg)) &&


Hi Johannes,

That is what I did in the previous version, see also
https://lore.kernel.org/linux-mm/20200414015952.3590-1-laoar.shao@gmail.com/

-- 
Thanks
Yafang


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] mm, memcg: fix inconsistent oom event behavior
  2020-04-22 11:06 Yafang Shao
  2020-04-22 11:51 ` Michal Hocko
@ 2020-04-22 12:54 ` Johannes Weiner
  2020-04-22 12:58   ` Yafang Shao
  2020-04-22 13:15   ` Michal Hocko
  1 sibling, 2 replies; 11+ messages in thread
From: Johannes Weiner @ 2020-04-22 12:54 UTC (permalink / raw)
  To: Yafang Shao; +Cc: akpm, mhocko, linux-mm, Chris Down, Shakeel Butt

On Wed, Apr 22, 2020 at 07:06:43AM -0400, Yafang Shao wrote:
> A recent commit 9852ae3fe529 ("mm, memcg: consider subtrees in
> memory.events") changes the behavior of memcg events, which will
> consider subtrees in memory.events. But oom_kill event is a special one
> as it is used in both cgroup1 and cgroup2. In cgroup1, it is displayed
> in memory.oom_control. The file memory.oom_control is in both root memcg
> and non root memcg, that is different with memory.event as it only in
> non-root memcg. That commit is okay for cgroup2, but it is not okay for
> cgroup1 as it will cause inconsistent behavior between root memcg and
> non-root memcg.
> 
> Here's an example on why this behavior is inconsistent in cgroup1.
>      root memcg
>      /
>   memcg foo
>    /
> memcg bar
> 
> Suppose there's an oom_kill in memcg bar, then the oom_kill counts will be
> 
>      root memcg : memory.oom_control(oom_kill)  0
>      /
>   memcg foo : memory.oom_control(oom_kill)  1
>    /
> memcg bar : memory.oom_control(oom_kill)  1
> 
> For the non-root memcg, its memory.oom_control(oom_kill) includes its
> descendants' oom_kill, but for root memcg, it doesn't include its
> descendants' oom_kill. That means, memory.oom_control(oom_kill) has
> different meanings in different memcgs. That is inconsistent. Then the user
> has to know whether the memcg is root or not.
> 
> If we can't fully support it in cgroup1, for example by adding
> memory.events.local into cgroup1 as well, then let's don't touch
> its original behavior.
> 
> Setting CGRP_ROOT_MEMORY_LOCAL_EVENTS for legacy hierarchy by
> default rather than special casing it somewhere quite deep in the code
> would be better, per discussion with Michal.
> 
> Fixes: 9852ae3fe529 ("mm, memcg: consider subtrees in memory.events")
> Cc: Chris Down <chris@chrisdown.name>
> Cc: Shakeel Butt <shakeelb@google.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>  mm/memcontrol.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 5beea03dd58a..0f7381bddcee 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5940,10 +5940,20 @@ static void mem_cgroup_bind(struct cgroup_subsys_state *root_css)
>  	 * guarantees that @root doesn't have any children, so turning it
>  	 * on for the root memcg is enough.
>  	 */
> -	if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
> +	if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
>  		root_mem_cgroup->use_hierarchy = true;
> -	else
> +	} else {
>  		root_mem_cgroup->use_hierarchy = false;
> +		/*
> +		 * Set CGRP_ROOT_MEMORY_LOCAL_EVENTS for legacy hierarchy
> +		 * by default to avoid inconsistent oom_kill behavior
> +		 * between root memcg and non-root memcg.
> +		 * Regarding default hierarchy, as this flag will be set
> +		 * or cleared later, we don't need to process it in this
> +		 * function.
> +		 */
> +		cgrp_dfl_root.flags |= CGRP_ROOT_MEMORY_LOCAL_EVENTS;

That will cause problems for people trying to remount. From
cgroup1_reconfigure():

	/* Don't allow flags or name to change at remount */
	if ((ctx->flags ^ root->flags) ||
	    (ctx->name && strcmp(ctx->name, root->name))) {
		errorfc(fc, "option or name mismatch, new: 0x%x \"%s\", old: 0x%x \"%s\"",
		       ctx->flags, ctx->name ?: "", root->flags, root->name);
		ret = -EINVAL;
		goto out_unlock;
	}

These flags belong to the user, they're read-only to the cgroup
implementation. Let's not mess with them from a controller.

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 4b868e5a687f..e831a90b5506 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -773,6 +773,8 @@ static inline void memcg_memory_event(struct mem_cgroup *memcg,
 		atomic_long_inc(&memcg->memory_events[event]);
 		cgroup_file_notify(&memcg->events_file);
 
+		if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
+			break;
 		if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS)
 			break;
 	} while ((memcg = parent_mem_cgroup(memcg)) &&
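
The remount check quoted above from cgroup1_reconfigure() can be modelled in
userspace to show why a flag set behind the user's back is a problem. This is
a sketch under assumptions: struct toy_root, struct toy_ctx, toy_reconfigure()
and the bit value of TOY_ROOT_MEMORY_LOCAL_EVENTS are hypothetical, and it
assumes the controller-set bit ends up in the flags that the remount request
is compared against, as the reply argues.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical bit value, chosen only for this illustration. */
#define TOY_ROOT_MEMORY_LOCAL_EVENTS	(1U << 5)

struct toy_root {
	unsigned int flags;	/* flags recorded for the mounted hierarchy */
	const char *name;
};

struct toy_ctx {
	unsigned int flags;	/* flags parsed from the remount options */
	const char *name;	/* NULL if no name option was given */
};

/* Same comparison as in the quoted check: any flag difference is an error. */
static int toy_reconfigure(const struct toy_ctx *ctx,
			   const struct toy_root *root)
{
	if ((ctx->flags ^ root->flags) ||
	    (ctx->name && strcmp(ctx->name, root->name)))
		return -EINVAL;
	return 0;
}
```

In this model, a remount with the same options the user originally passed
succeeds while the root flags are untouched, and returns -EINVAL once
TOY_ROOT_MEMORY_LOCAL_EVENTS has been OR-ed into the root flags by something
other than the user.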


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] mm, memcg: fix inconsistent oom event behavior
  2020-04-22 11:06 Yafang Shao
@ 2020-04-22 11:51 ` Michal Hocko
  2020-04-22 12:54 ` Johannes Weiner
  1 sibling, 0 replies; 11+ messages in thread
From: Michal Hocko @ 2020-04-22 11:51 UTC (permalink / raw)
  To: Yafang Shao; +Cc: akpm, linux-mm, Chris Down, Shakeel Butt, Johannes Weiner

On Wed 22-04-20 07:06:43, Yafang Shao wrote:
> A recent commit 9852ae3fe529 ("mm, memcg: consider subtrees in
> memory.events") changes the behavior of memcg events, which will
> consider subtrees in memory.events. But oom_kill event is a special one
> as it is used in both cgroup1 and cgroup2. In cgroup1, it is displayed
> in memory.oom_control. The file memory.oom_control is in both root memcg
> and non root memcg, that is different with memory.event as it only in
> non-root memcg. That commit is okay for cgroup2, but it is not okay for
> cgroup1 as it will cause inconsistent behavior between root memcg and
> non-root memcg.
> 
> Here's an example on why this behavior is inconsistent in cgroup1.
>      root memcg
>      /
>   memcg foo
>    /
> memcg bar
> 
> Suppose there's an oom_kill in memcg bar; the oom_kill counts will then be:
> 
>      root memcg : memory.oom_control(oom_kill)  0
>      /
>   memcg foo : memory.oom_control(oom_kill)  1
>    /
> memcg bar : memory.oom_control(oom_kill)  1
> 
> For a non-root memcg, memory.oom_control(oom_kill) includes its
> descendants' oom_kill, but for the root memcg it does not. That means
> memory.oom_control(oom_kill) has different meanings in different memcgs,
> which is inconsistent: the user has to know whether a memcg is the root
> in order to interpret the counter.
> 
> If we can't fully support it in cgroup1, for example by adding
> memory.events.local into cgroup1 as well, then let's not touch
> its original behavior.
> 
> Setting CGRP_ROOT_MEMORY_LOCAL_EVENTS for legacy hierarchy by
> default rather than special casing it somewhere quite deep in the code
> would be better, per discussion with Michal.

OK, this makes sense to me. Cgroup v1 really had local semantics and
9852ae3fe529 changed that unintentionally. I think it is reasonable to use
CGRP_ROOT_MEMORY_LOCAL_EVENTS, which denotes this mode, but I will
defer to the cgroup maintainers. Maybe there are some other side effects
which I am not aware of that would make this more awkward than a special
case for cgroup v1.

> Fixes: 9852ae3fe529 ("mm, memcg: consider subtrees in memory.events")
> Cc: Chris Down <chris@chrisdown.name>
> Cc: Shakeel Butt <shakeelb@google.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/memcontrol.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 5beea03dd58a..0f7381bddcee 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5940,10 +5940,20 @@ static void mem_cgroup_bind(struct cgroup_subsys_state *root_css)
>  	 * guarantees that @root doesn't have any children, so turning it
>  	 * on for the root memcg is enough.
>  	 */
> -	if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
> +	if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
>  		root_mem_cgroup->use_hierarchy = true;
> -	else
> +	} else {
>  		root_mem_cgroup->use_hierarchy = false;
> +		/*
> +		 * Set CGRP_ROOT_MEMORY_LOCAL_EVENTS for legacy hierarchy
> +		 * by default to avoid inconsistent oom_kill behavior
> +		 * between root memcg and non-root memcg.
> +		 * Regarding default hierarchy, as this flag will be set
> +		 * or cleared later, we don't need to process it in this
> +		 * function.
> +		 */

I do not think the comment has to be so specific about oom events
behavior. I would just go with:
		/*
		 * Cgroup v1 has traditionally had local semantics for
		 * event counters. Cgroup v2 changed that to a
		 * hierarchical behavior. This is expressed by
		 * CGRP_ROOT_MEMORY_LOCAL_EVENTS in the cgroup core.
		 */
> +		cgrp_dfl_root.flags |= CGRP_ROOT_MEMORY_LOCAL_EVENTS;
> +	}
>  }
>  
>  static int seq_puts_memcg_tunable(struct seq_file *m, unsigned long value)
> -- 
> 2.18.2

-- 
Michal Hocko
SUSE Labs



* [PATCH v2] mm, memcg: fix inconsistent oom event behavior
@ 2020-04-22 11:06 Yafang Shao
  2020-04-22 11:51 ` Michal Hocko
  2020-04-22 12:54 ` Johannes Weiner
  0 siblings, 2 replies; 11+ messages in thread
From: Yafang Shao @ 2020-04-22 11:06 UTC
  To: akpm, mhocko
  Cc: linux-mm, Yafang Shao, Chris Down, Shakeel Butt, Johannes Weiner

A recent commit 9852ae3fe529 ("mm, memcg: consider subtrees in
memory.events") changed the behavior of memcg events so that
memory.events now accounts for subtrees. But the oom_kill event is a
special one, as it is used in both cgroup1 and cgroup2. In cgroup1, it is
displayed in memory.oom_control. The file memory.oom_control exists in
both the root memcg and non-root memcgs, unlike memory.events, which
exists only in non-root memcgs. That commit is okay for cgroup2, but not
for cgroup1, as it causes inconsistent behavior between the root memcg
and non-root memcgs.

Here's an example of why this behavior is inconsistent in cgroup1.
     root memcg
     /
  memcg foo
   /
memcg bar

Suppose there's an oom_kill in memcg bar; the oom_kill counts will then be:

     root memcg : memory.oom_control(oom_kill)  0
     /
  memcg foo : memory.oom_control(oom_kill)  1
   /
memcg bar : memory.oom_control(oom_kill)  1

For a non-root memcg, memory.oom_control(oom_kill) includes its
descendants' oom_kill, but for the root memcg it does not. That means
memory.oom_control(oom_kill) has different meanings in different memcgs,
which is inconsistent: the user has to know whether a memcg is the root
in order to interpret the counter.
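
To make the inconsistency concrete, here is a minimal userspace sketch of
the event walk. It is not kernel code: struct memcg, memcg_is_root() and
memcg_oom_kill_event() are hypothetical stand-ins, and the local_events
argument stands in for the CGRP_ROOT_MEMORY_LOCAL_EVENTS check. It assumes
the walk stops before the root memcg, as the counts in the example above
imply.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a memcg; not the kernel structure. */
struct memcg {
	struct memcg *parent;	/* NULL for the root memcg */
	long oom_kill;		/* value shown in memory.oom_control */
};

/* Stand-in for the root check: in this model the root has no parent. */
static bool memcg_is_root(const struct memcg *memcg)
{
	return memcg->parent == NULL;
}

/*
 * Count an oom_kill in the memcg where it happened, then in each
 * ancestor, stopping before the root (assumed, see lead-in).
 * With local_events set, only the originating memcg is counted.
 */
static void memcg_oom_kill_event(struct memcg *memcg, bool local_events)
{
	do {
		memcg->oom_kill++;
		if (local_events)
			break;
	} while ((memcg = memcg->parent) && !memcg_is_root(memcg));
}
```

With root <- foo <- bar and one event in bar, calling
memcg_oom_kill_event(&bar, false) leaves bar and foo at 1 and root at 0,
matching the example. With local_events set to true only bar is
incremented, which is the original cgroup1 behavior.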

If we can't fully support it in cgroup1, for example by adding
memory.events.local into cgroup1 as well, then let's not touch
its original behavior.

Setting CGRP_ROOT_MEMORY_LOCAL_EVENTS for legacy hierarchy by
default rather than special casing it somewhere quite deep in the code
would be better, per discussion with Michal.

Fixes: 9852ae3fe529 ("mm, memcg: consider subtrees in memory.events")
Cc: Chris Down <chris@chrisdown.name>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 mm/memcontrol.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5beea03dd58a..0f7381bddcee 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5940,10 +5940,20 @@ static void mem_cgroup_bind(struct cgroup_subsys_state *root_css)
 	 * guarantees that @root doesn't have any children, so turning it
 	 * on for the root memcg is enough.
 	 */
-	if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
+	if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
 		root_mem_cgroup->use_hierarchy = true;
-	else
+	} else {
 		root_mem_cgroup->use_hierarchy = false;
+		/*
+		 * Set CGRP_ROOT_MEMORY_LOCAL_EVENTS for legacy hierarchy
+		 * by default to avoid inconsistent oom_kill behavior
+		 * between root memcg and non-root memcg.
+		 * Regarding default hierarchy, as this flag will be set
+		 * or cleared later, we don't need to process it in this
+		 * function.
+		 */
+		cgrp_dfl_root.flags |= CGRP_ROOT_MEMORY_LOCAL_EVENTS;
+	}
 }
 
 static int seq_puts_memcg_tunable(struct seq_file *m, unsigned long value)
-- 
2.18.2




Thread overview: 11+ messages
2020-04-14  1:59 [PATCH v2] mm, memcg: fix inconsistent oom event behavior Yafang Shao
2020-04-14 15:22 ` Michal Hocko
2020-04-14 15:57   ` Yafang Shao
2020-04-14 15:57     ` Yafang Shao
2020-04-22 11:06 Yafang Shao
2020-04-22 11:51 ` Michal Hocko
2020-04-22 12:54 ` Johannes Weiner
2020-04-22 12:58   ` Yafang Shao
2020-04-22 13:02     ` Chris Down
2020-04-22 13:15       ` Yafang Shao
2020-04-22 13:15   ` Michal Hocko
