linux-mm.kvack.org archive mirror
* [PATCH v2 0/2]memcg oom: don't try to kill a process if there is no process
@ 2020-05-04  4:26 Yafang Shao
  2020-05-04  4:26 ` [PATCH v2 1/2] mm, memcg: rename mem_cgroup_out_of_memory() Yafang Shao
  2020-05-04  4:26 ` [PATCH v2 2/2] mm, memcg: don't try to kill a process if memcg is not populated Yafang Shao
  0 siblings, 2 replies; 11+ messages in thread
From: Yafang Shao @ 2020-05-04  4:26 UTC (permalink / raw)
  To: akpm, shakeelb; +Cc: hannes, mhocko, guro, gthelen, linux-mm, Yafang Shao

Recently Shakeel reported an issue which also confused me several months
earlier: the kernel still tries to kill a process even when there is no process.
If a memcg is not populated, it is useless to try to kill a process.

mem_cgroup_out_of_memory() is renamed to mem_cgroup_oom_kill() to make its
purpose easier to understand.
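
For reference, the combined effect of the two patches on the memory.max write
path (memory_max_write()) is roughly the following, excerpted from the diffs in
the individual patches:

		memcg_memory_event(memcg, MEMCG_OOM);

		/* patch 2: an unpopulated memcg has nothing to kill, so stop here */
		if (!cgroup_is_populated(memcg->css.cgroup))
			break;

		/* patch 1: mem_cgroup_out_of_memory() renamed to mem_cgroup_oom_kill() */
		if (!mem_cgroup_oom_kill(memcg, GFP_KERNEL, 0))
			break;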

Yafang Shao (2):
  mm, memcg: rename mem_cgroup_out_of_memory()
  mm, memcg: don't try to kill a process if memcg is not populated

 mm/memcontrol.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

-- 
2.18.2



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 1/2] mm, memcg: rename mem_cgroup_out_of_memory()
  2020-05-04  4:26 [PATCH v2 0/2]memcg oom: don't try to kill a process if there is no process Yafang Shao
@ 2020-05-04  4:26 ` Yafang Shao
  2020-05-04  7:59   ` Michal Hocko
  2020-05-04 15:50   ` Chris Down
  2020-05-04  4:26 ` [PATCH v2 2/2] mm, memcg: don't try to kill a process if memcg is not populated Yafang Shao
  1 sibling, 2 replies; 11+ messages in thread
From: Yafang Shao @ 2020-05-04  4:26 UTC (permalink / raw)
  To: akpm, shakeelb; +Cc: hannes, mhocko, guro, gthelen, linux-mm, Yafang Shao

Rename mem_cgroup_out_of_memory() to mem_cgroup_oom_kill() to indicate
that this function is used to try to kill a process.
With this change the function names align better with the memcg oom events:
	function		memcg event
	mem_cgroup_oom()	oom
	mem_cgroup_oom_kill()	oom_kill

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Greg Thelen <gthelen@google.com>
---
 mm/memcontrol.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5beea03dd58a..985edce98491 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1570,7 +1570,7 @@ unsigned long mem_cgroup_size(struct mem_cgroup *memcg)
 	return page_counter_read(&memcg->memory);
 }
 
-static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
+static bool mem_cgroup_oom_kill(struct mem_cgroup *memcg, gfp_t gfp_mask,
 				     int order)
 {
 	struct oom_control oc = {
@@ -1799,7 +1799,7 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int
 	 * relying on the oom victim to make a forward progress and we can
 	 * invoke the oom killer here.
 	 *
-	 * Please note that mem_cgroup_out_of_memory might fail to find a
+	 * Please note that mem_cgroup_oom_kill might fail to find a
 	 * victim and then we have to bail out from the charge path.
 	 */
 	if (memcg->oom_kill_disable) {
@@ -1821,7 +1821,7 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int
 		mem_cgroup_oom_notify(memcg);
 
 	mem_cgroup_unmark_under_oom(memcg);
-	if (mem_cgroup_out_of_memory(memcg, mask, order))
+	if (mem_cgroup_oom_kill(memcg, mask, order))
 		ret = OOM_SUCCESS;
 	else
 		ret = OOM_FAILED;
@@ -1879,7 +1879,7 @@ bool mem_cgroup_oom_synchronize(bool handle)
 	if (locked && !memcg->oom_kill_disable) {
 		mem_cgroup_unmark_under_oom(memcg);
 		finish_wait(&memcg_oom_waitq, &owait.wait);
-		mem_cgroup_out_of_memory(memcg, current->memcg_oom_gfp_mask,
+		mem_cgroup_oom_kill(memcg, current->memcg_oom_gfp_mask,
 					 current->memcg_oom_order);
 	} else {
 		schedule();
@@ -6102,7 +6102,7 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
 		}
 
 		memcg_memory_event(memcg, MEMCG_OOM);
-		if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0))
+		if (!mem_cgroup_oom_kill(memcg, GFP_KERNEL, 0))
 			break;
 	}
 
-- 
2.18.2



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 2/2] mm, memcg: don't try to kill a process if memcg is not populated
  2020-05-04  4:26 [PATCH v2 0/2]memcg oom: don't try to kill a process if there is no process Yafang Shao
  2020-05-04  4:26 ` [PATCH v2 1/2] mm, memcg: rename mem_cgroup_out_of_memory() Yafang Shao
@ 2020-05-04  4:26 ` Yafang Shao
  2020-05-04  8:18   ` Michal Hocko
  1 sibling, 1 reply; 11+ messages in thread
From: Yafang Shao @ 2020-05-04  4:26 UTC (permalink / raw)
  To: akpm, shakeelb; +Cc: hannes, mhocko, guro, gthelen, linux-mm, Yafang Shao

Recently Shakeel reported an issue which also confused me several months
earlier. Below is his report:
Lowering memory.max can trigger an oom-kill if the reclaim does not
succeed. However, if the oom-killer does not find a process to kill, it
dumps a lot of warnings.
Deleting a memcg does not reclaim memory from it and the memory can
linger until there is memory pressure. One normal way to proactively
reclaim such memory is to set memory.max to 0 just before deleting the
memcg. However, if some of the memcg's memory is pinned by others, this
operation can trigger an oom-kill without any process and thus can log a
lot of unneeded warnings. So, ignore all such warnings from memory.max.

A better way to avoid this issue is not to try to kill a process at all if
the memcg is not populated.
Note that OOM is different from OOM kill. OOM is the state in which the
system or memcg is out of memory, while an OOM kill is the result of a
process inside that memcg being killed while the memcg is in the OOM state.
That is also why there are both a MEMCG_OOM event and a MEMCG_OOM_KILL
event. If we already know there is nothing to kill, i.e. the memcg is not
populated, there is no need to try.

This is also why setting memory.max to 0 is better than setting memory.high
to 0 before deletion: the reason is remote charging. High reclaim does not
work for a remote memcg, so the usage can grow until it hits max or global
pressure.
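
As a concrete illustration of the workflow described above, below is a minimal
userspace sketch. It is only a sketch: the cgroup v2 mount point and the memcg
name are hypothetical, and error handling is mostly omitted. It sets memory.max
to 0 to force reclaim, reads memory.events (which carries both the oom and the
oom_kill counters mentioned above), and finally removes the memcg directory.

	/*
	 * Minimal sketch of "set memory.max to 0 just before deleting the
	 * memcg".  The memcg path below is hypothetical.
	 */
	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>

	#define MEMCG "/sys/fs/cgroup/example"	/* hypothetical memcg path */

	static void write_file(const char *path, const char *val)
	{
		int fd = open(path, O_WRONLY);

		if (fd < 0)
			return;
		write(fd, val, strlen(val));
		close(fd);
	}

	int main(void)
	{
		char buf[4096];
		ssize_t n;
		int fd;

		/* Force reclaim of any lingering charges before deletion. */
		write_file(MEMCG "/memory.max", "0");

		/*
		 * memory.events carries both counters: "oom" increments when
		 * the memcg hits an OOM situation, "oom_kill" only when a task
		 * was actually killed.  With this patch an unpopulated memcg
		 * can see "oom" grow while "oom_kill" stays unchanged.
		 */
		fd = open(MEMCG "/memory.events", O_RDONLY);
		if (fd >= 0) {
			n = read(fd, buf, sizeof(buf) - 1);
			if (n > 0) {
				buf[n] = '\0';
				fputs(buf, stdout);
			}
			close(fd);
		}

		/* Finally remove the (hopefully now empty) memcg. */
		rmdir(MEMCG);
		return 0;
	}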

[shakeelb@google.com: improve commit log]
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Greg Thelen <gthelen@google.com>
---
 mm/memcontrol.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 985edce98491..29afe3df9d98 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6102,6 +6102,10 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
 		}
 
 		memcg_memory_event(memcg, MEMCG_OOM);
+
+		if (!cgroup_is_populated(memcg->css.cgroup))
+			break;
+
 		if (!mem_cgroup_oom_kill(memcg, GFP_KERNEL, 0))
 			break;
 	}
-- 
2.18.2



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 1/2] mm, memcg: rename mem_cgroup_out_of_memory()
  2020-05-04  4:26 ` [PATCH v2 1/2] mm, memcg: rename mem_cgroup_out_of_memory() Yafang Shao
@ 2020-05-04  7:59   ` Michal Hocko
  2020-05-04 15:50   ` Chris Down
  1 sibling, 0 replies; 11+ messages in thread
From: Michal Hocko @ 2020-05-04  7:59 UTC (permalink / raw)
  To: Yafang Shao; +Cc: akpm, shakeelb, hannes, guro, gthelen, linux-mm

On Mon 04-05-20 00:26:20, Yafang Shao wrote:
> Rename mem_cgroup_out_of_memory() to mem_cgroup_oom_kill() to indicate
> that this function is used to try to kill a process.
> With this change it will cooperate better with the oom events.
> 	function		memcg event
> 	mem_cgroup_oom()	oom
> 	mem_cgroup_oom_kill()	oom_kill
> 
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> Reviewed-by: Shakeel Butt <shakeelb@google.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Roman Gushchin <guro@fb.com>
> Cc: Greg Thelen <gthelen@google.com>

Well, I do not have a strong opinion on this. I believe the idea behind
the naming was that this is a memcg counterpart for out_of_memory.
So to be honest I do not consider this an improvement.

> ---
>  mm/memcontrol.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 5beea03dd58a..985edce98491 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1570,7 +1570,7 @@ unsigned long mem_cgroup_size(struct mem_cgroup *memcg)
>  	return page_counter_read(&memcg->memory);
>  }
>  
> -static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
> +static bool mem_cgroup_oom_kill(struct mem_cgroup *memcg, gfp_t gfp_mask,
>  				     int order)
>  {
>  	struct oom_control oc = {
> @@ -1799,7 +1799,7 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int
>  	 * relying on the oom victim to make a forward progress and we can
>  	 * invoke the oom killer here.
>  	 *
> -	 * Please note that mem_cgroup_out_of_memory might fail to find a
> +	 * Please note that mem_cgroup_oom_kill might fail to find a
>  	 * victim and then we have to bail out from the charge path.
>  	 */
>  	if (memcg->oom_kill_disable) {
> @@ -1821,7 +1821,7 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int
>  		mem_cgroup_oom_notify(memcg);
>  
>  	mem_cgroup_unmark_under_oom(memcg);
> -	if (mem_cgroup_out_of_memory(memcg, mask, order))
> +	if (mem_cgroup_oom_kill(memcg, mask, order))
>  		ret = OOM_SUCCESS;
>  	else
>  		ret = OOM_FAILED;
> @@ -1879,7 +1879,7 @@ bool mem_cgroup_oom_synchronize(bool handle)
>  	if (locked && !memcg->oom_kill_disable) {
>  		mem_cgroup_unmark_under_oom(memcg);
>  		finish_wait(&memcg_oom_waitq, &owait.wait);
> -		mem_cgroup_out_of_memory(memcg, current->memcg_oom_gfp_mask,
> +		mem_cgroup_oom_kill(memcg, current->memcg_oom_gfp_mask,
>  					 current->memcg_oom_order);
>  	} else {
>  		schedule();
> @@ -6102,7 +6102,7 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
>  		}
>  
>  		memcg_memory_event(memcg, MEMCG_OOM);
> -		if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0))
> +		if (!mem_cgroup_oom_kill(memcg, GFP_KERNEL, 0))
>  			break;
>  	}
>  
> -- 
> 2.18.2

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/2] mm, memcg: don't try to kill a process if memcg is not populated
  2020-05-04  4:26 ` [PATCH v2 2/2] mm, memcg: don't try to kill a process if memcg is not populated Yafang Shao
@ 2020-05-04  8:18   ` Michal Hocko
  2020-05-04 12:34     ` Yafang Shao
  0 siblings, 1 reply; 11+ messages in thread
From: Michal Hocko @ 2020-05-04  8:18 UTC (permalink / raw)
  To: Yafang Shao; +Cc: akpm, shakeelb, hannes, guro, gthelen, linux-mm

[It would be really great if a newer version was posted only after there
was a wider consensus on the approach.]

On Mon 04-05-20 00:26:21, Yafang Shao wrote:
> Recently Shakeel reported a issue which also confused me several months
> earlier. Bellow is his report -
> Lowering memory.max can trigger an oom-kill if the reclaim does not
> succeed. However if oom-killer does not find a process for killing, it
> dumps a lot of warnings.
> Deleting a memcg does not reclaim memory from it and the memory can
> linger till there is a memory pressure. One normal way to proactively
> reclaim such memory is to set memory.max to 0 just before deleting the
> memcg. However if some of the memcg's memory is pinned by others, this
> operation can trigger an oom-kill without any process and thus can log a
> lot of un-needed warnings. So, ignore all such warnings from memory.max.
> 
> A better way to avoid this issue is to avoid trying to kill a process if
> memcg is not populated.
> Note that OOM is different from OOM kill. OOM is a status that the
> system or memcg is out of memory, while OOM kill is a result that a
> process inside this memcg is killed when this memcg is in OOM status.

Agreed.

> That is the same reason why there're both MEMCG_OOM event and
> MEMCG_OOM_KILL event. If we have already known that there's nothing to
> kill, i.e. the memcg is not populated, then we don't need a try.

OK, but you are not explaining why a silent failure is really better
than no oom report under oom situation. With your patch, there is
no failure reported to the user and there is also no sign that there
might be a problem that memcg leaves memory behind that is not bound to
any (killable) process. This could be an important information.

Besides that I really do not see any actual problem that this would be
fixing. Reducing the hard limit is an operation which might trigger the
oom killer and leave an oom report behind. Having an OOM without any
tasks is pretty much a corner case and making it silent just makes
it harder to debug.

> Basically why setting memory.max to 0 is better than setting memory.high to
> 0 before deletion. The reason is remote charging. High reclaim does not
> work for remote memcg and the usage can go till max or global pressure.
> 
> [shakeelb@google.com: improve commit log]
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> Reviewed-by: Shakeel Butt <shakeelb@google.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Roman Gushchin <guro@fb.com>
> Cc: Greg Thelen <gthelen@google.com>
> ---
>  mm/memcontrol.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 985edce98491..29afe3df9d98 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6102,6 +6102,10 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
>  		}
>  
>  		memcg_memory_event(memcg, MEMCG_OOM);
> +
> +		if (!cgroup_is_populated(memcg->css.cgroup))
> +			break;
> +
>  		if (!mem_cgroup_oom_kill(memcg, GFP_KERNEL, 0))
>  			break;
>  	}
> -- 
> 2.18.2

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/2] mm, memcg: don't try to kill a process if memcg is not populated
  2020-05-04  8:18   ` Michal Hocko
@ 2020-05-04 12:34     ` Yafang Shao
  2020-05-04 12:46       ` Michal Hocko
  0 siblings, 1 reply; 11+ messages in thread
From: Yafang Shao @ 2020-05-04 12:34 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Shakeel Butt, Johannes Weiner, Roman Gushchin,
	Greg Thelen, Linux MM

On Mon, May 4, 2020 at 4:18 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> [It would be really great if a newer version was posted only after there
> was a wider consensus on the approach.]
>
> On Mon 04-05-20 00:26:21, Yafang Shao wrote:
> > Recently Shakeel reported a issue which also confused me several months
> > earlier. Bellow is his report -
> > Lowering memory.max can trigger an oom-kill if the reclaim does not
> > succeed. However if oom-killer does not find a process for killing, it
> > dumps a lot of warnings.
> > Deleting a memcg does not reclaim memory from it and the memory can
> > linger till there is a memory pressure. One normal way to proactively
> > reclaim such memory is to set memory.max to 0 just before deleting the
> > memcg. However if some of the memcg's memory is pinned by others, this
> > operation can trigger an oom-kill without any process and thus can log a
> > lot of un-needed warnings. So, ignore all such warnings from memory.max.
> >
> > A better way to avoid this issue is to avoid trying to kill a process if
> > memcg is not populated.
> > Note that OOM is different from OOM kill. OOM is a status that the
> > system or memcg is out of memory, while OOM kill is a result that a
> > process inside this memcg is killed when this memcg is in OOM status.
>
> Agreed.
>
> > That is the same reason why there're both MEMCG_OOM event and
> > MEMCG_OOM_KILL event. If we have already known that there's nothing to
> > kill, i.e. the memcg is not populated, then we don't need a try.
>
> OK, but you are not explaining why a silent failure is really better
> than no oom report under oom situation. With your patch, there is
> no failure reported to the user and there is also no sign that there
> might be a problem that memcg leaves memory behind that is not bound to
> any (killable) process. This could be an important information.
>

That is not a silent failure. An oom event will still be reported.
The user can get this event from memory.events or memory.events.local
if they really care about it.
Especially when the admin sets memory.max to 0 to drop all the caches,
many oom logs are just noise. Besides that, there are side effects;
for example, too many oom logs printed to a slow console may cause a
latency spike.


> Besides that I really do not see any actual problem that this would be
> fixing.

It avoids printing too many oom logs.

> Reducing the hard limit is an operation which might trigger the
> oom killer and leave an oom report behind. Having an OOM without any
> tasks is pretty much a corner case and making it silent just makes
> it harder to debug.
>

This can only happen when the admin reduces memory.max, and the admin
should know how to check the result, for example via memory.events.

> > Basically why setting memory.max to 0 is better than setting memory.high to
> > 0 before deletion. The reason is remote charging. High reclaim does not
> > work for remote memcg and the usage can go till max or global pressure.
> >
> > [shakeelb@google.com: improve commit log]
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > Reviewed-by: Shakeel Butt <shakeelb@google.com>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Cc: Michal Hocko <mhocko@kernel.org>
> > Cc: Roman Gushchin <guro@fb.com>
> > Cc: Greg Thelen <gthelen@google.com>
> > ---
> >  mm/memcontrol.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 985edce98491..29afe3df9d98 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -6102,6 +6102,10 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
> >               }
> >
> >               memcg_memory_event(memcg, MEMCG_OOM);
> > +
> > +             if (!cgroup_is_populated(memcg->css.cgroup))
> > +                     break;
> > +
> >               if (!mem_cgroup_oom_kill(memcg, GFP_KERNEL, 0))
> >                       break;
> >       }
> > --
> > 2.18.2
>
> --
> Michal Hocko
> SUSE Labs



-- 
Thanks
Yafang


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/2] mm, memcg: don't try to kill a process if memcg is not populated
  2020-05-04 12:34     ` Yafang Shao
@ 2020-05-04 12:46       ` Michal Hocko
  2020-05-04 15:24         ` Yafang Shao
  0 siblings, 1 reply; 11+ messages in thread
From: Michal Hocko @ 2020-05-04 12:46 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Andrew Morton, Shakeel Butt, Johannes Weiner, Roman Gushchin,
	Greg Thelen, Linux MM

On Mon 04-05-20 20:34:01, Yafang Shao wrote:
> On Mon, May 4, 2020 at 4:18 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > [It would be really great if a newer version was posted only after there
> > was a wider consensus on the approach.]
> >
> > On Mon 04-05-20 00:26:21, Yafang Shao wrote:
> > > Recently Shakeel reported a issue which also confused me several months
> > > earlier. Bellow is his report -
> > > Lowering memory.max can trigger an oom-kill if the reclaim does not
> > > succeed. However if oom-killer does not find a process for killing, it
> > > dumps a lot of warnings.
> > > Deleting a memcg does not reclaim memory from it and the memory can
> > > linger till there is a memory pressure. One normal way to proactively
> > > reclaim such memory is to set memory.max to 0 just before deleting the
> > > memcg. However if some of the memcg's memory is pinned by others, this
> > > operation can trigger an oom-kill without any process and thus can log a
> > > lot of un-needed warnings. So, ignore all such warnings from memory.max.
> > >
> > > A better way to avoid this issue is to avoid trying to kill a process if
> > > memcg is not populated.
> > > Note that OOM is different from OOM kill. OOM is a status that the
> > > system or memcg is out of memory, while OOM kill is a result that a
> > > process inside this memcg is killed when this memcg is in OOM status.
> >
> > Agreed.
> >
> > > That is the same reason why there're both MEMCG_OOM event and
> > > MEMCG_OOM_KILL event. If we have already known that there's nothing to
> > > kill, i.e. the memcg is not populated, then we don't need a try.
> >
> > OK, but you are not explaining why a silent failure is really better
> > than no oom report under oom situation. With your patch, there is
> > no failure reported to the user and there is also no sign that there
> > might be a problem that memcg leaves memory behind that is not bound to
> > any (killable) process. This could be an important information.
> >
> 
> That is not a silent failure. An oom event will be reported.
> The user can get this event by memory.events or memory.events.local if
> he really care about it.

You are right. The oom situation will be reported (somehow), but there
may be several reasons why no task has been killed, and there is no way
to report that there were no eligible tasks.

> Especially when the admin set memory.max to 0 to drop all the caches,
> many oom logs are a noise, besides that there are some side effect,
> for example two many oom logs printed to a slow console may cause some
> latency spike.

But the oom situation and the oom report are simply something an admin
has to expect, especially when the hard limit is set to 0. With kmem
accounting there is no guarantee that the target will be met.
> 
> 
> > Besides that I really do not see any actual problem that this would be
> > fixing.
> 
> Avoid printing two many oom logs.

There is only a single oom report printed, so I disagree that this is
really a proper justification.

Unless you can come up with a better justification, I am against this
patch. It unnecessarily reduces debugging tools while not really providing
any huge advantage. Changing the hard limit to an impossible target is
known to trigger the oom killer, and the oom report is a part of that.
If the oom report is too noisy, then we can discuss how to make it
more compact, but making ad-hoc exceptions like this one is not a good
solution.
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/2] mm, memcg: don't try to kill a process if memcg is not populated
  2020-05-04 12:46       ` Michal Hocko
@ 2020-05-04 15:24         ` Yafang Shao
  2020-05-04 16:11           ` Michal Hocko
  0 siblings, 1 reply; 11+ messages in thread
From: Yafang Shao @ 2020-05-04 15:24 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Shakeel Butt, Johannes Weiner, Roman Gushchin,
	Greg Thelen, Linux MM

On Mon, May 4, 2020 at 8:46 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Mon 04-05-20 20:34:01, Yafang Shao wrote:
> > On Mon, May 4, 2020 at 4:18 PM Michal Hocko <mhocko@kernel.org> wrote:
> > >
> > > [It would be really great if a newer version was posted only after there
> > > was a wider consensus on the approach.]
> > >
> > > On Mon 04-05-20 00:26:21, Yafang Shao wrote:
> > > > Recently Shakeel reported a issue which also confused me several months
> > > > earlier. Bellow is his report -
> > > > Lowering memory.max can trigger an oom-kill if the reclaim does not
> > > > succeed. However if oom-killer does not find a process for killing, it
> > > > dumps a lot of warnings.
> > > > Deleting a memcg does not reclaim memory from it and the memory can
> > > > linger till there is a memory pressure. One normal way to proactively
> > > > reclaim such memory is to set memory.max to 0 just before deleting the
> > > > memcg. However if some of the memcg's memory is pinned by others, this
> > > > operation can trigger an oom-kill without any process and thus can log a
> > > > lot of un-needed warnings. So, ignore all such warnings from memory.max.
> > > >
> > > > A better way to avoid this issue is to avoid trying to kill a process if
> > > > memcg is not populated.
> > > > Note that OOM is different from OOM kill. OOM is a status that the
> > > > system or memcg is out of memory, while OOM kill is a result that a
> > > > process inside this memcg is killed when this memcg is in OOM status.
> > >
> > > Agreed.
> > >
> > > > That is the same reason why there're both MEMCG_OOM event and
> > > > MEMCG_OOM_KILL event. If we have already known that there's nothing to
> > > > kill, i.e. the memcg is not populated, then we don't need a try.
> > >
> > > OK, but you are not explaining why a silent failure is really better
> > > than no oom report under oom situation. With your patch, there is
> > > no failure reported to the user and there is also no sign that there
> > > might be a problem that memcg leaves memory behind that is not bound to
> > > any (killable) process. This could be an important information.
> > >
> >
> > That is not a silent failure. An oom event will be reported.
> > The user can get this event by memory.events or memory.events.local if
> > he really care about it.
>
> You are right. The oom situation will be reported (somehow) but the
> reason why no task has been killed might be several and there is no way
> to report no eligible tasks.
>
> > Especially when the admin set memory.max to 0 to drop all the caches,
> > many oom logs are a noise, besides that there are some side effect,
> > for example two many oom logs printed to a slow console may cause some
> > latency spike.
>
> But the oom situation and the oom report is simply something an admin
> has to expect especially when the hard limit is set to 0. With kmem
> accounting there is no guarantee that the target will be met.

I have always wondered why we don't move the kmem from this memcg to
the root_mem_cgroup in this situation.
Then this memcg could be easily reclaimed.

> >
> >
> > > Besides that I really do not see any actual problem that this would be
> > > fixing.
> >
> > Avoid printing two many oom logs.
>
> There is only a single oom report printed so I disagree this is really a
> proper justification.
>
> Unless you can come up with a better justification I am against this
> patch. It unnecessarily reduce debugging tools while it doesn't really
> provide any huge advantage. Changing the hard limit to impossible target
> is known to trigger the oom kernel and the oom report is a part of that.
> If the oom report is too noisy then we can discuss on how to make it
> more compact but making ad-hoc exceptions like this one is not a good
> solution.
> --

No better justification yet. But I think more memcg users will
complain about it.

-- 
Thanks
Yafang


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 1/2] mm, memcg: rename mem_cgroup_out_of_memory()
  2020-05-04  4:26 ` [PATCH v2 1/2] mm, memcg: rename mem_cgroup_out_of_memory() Yafang Shao
  2020-05-04  7:59   ` Michal Hocko
@ 2020-05-04 15:50   ` Chris Down
  1 sibling, 0 replies; 11+ messages in thread
From: Chris Down @ 2020-05-04 15:50 UTC (permalink / raw)
  To: Yafang Shao; +Cc: akpm, shakeelb, hannes, mhocko, guro, gthelen, linux-mm

Yafang Shao writes:
>Rename mem_cgroup_out_of_memory() to mem_cgroup_oom_kill() to indicate
>that this function is used to try to kill a process.
>With this change it will cooperate better with the oom events.
>	function		memcg event
>	mem_cgroup_oom()	oom
>	mem_cgroup_oom_kill()	oom_kill

Hmm. The reason it's called mem_cgroup_out_of_memory() is, as Michal said, to 
match out_of_memory, which may or may not OOM kill. Internally, 
mem_cgroup_out_of_memory() may also do this depending on the state of oom_lock 
and the current task, so mem_cgroup_out_of_memory() is really about *deciding* 
what to do when generically out of memory rather than necessarily OOM killing 
something (although yes, that's the general outcome).

Is matching the memcg event names the only reason you'd like to do this, or do 
you have other concerns? If that's the only reason, I think we should keep it as 
is for now. :-)


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/2] mm, memcg: don't try to kill a process if memcg is not populated
  2020-05-04 15:24         ` Yafang Shao
@ 2020-05-04 16:11           ` Michal Hocko
  2020-05-04 17:04             ` Roman Gushchin
  0 siblings, 1 reply; 11+ messages in thread
From: Michal Hocko @ 2020-05-04 16:11 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Andrew Morton, Shakeel Butt, Johannes Weiner, Roman Gushchin,
	Greg Thelen, Linux MM

On Mon 04-05-20 23:24:35, Yafang Shao wrote:
> On Mon, May 4, 2020 at 8:46 PM Michal Hocko <mhocko@kernel.org> wrote:
[...]
> > But the oom situation and the oom report is simply something an admin
> > has to expect especially when the hard limit is set to 0. With kmem
> > accounting there is no guarantee that the target will be met.
> 
> I'm always wondering that why not moving the kmem from this memcg to
> the root_mem_cgroup in this situation ?
> Then this memcg can be easily reclaimed.

Roman was playing with kmem charges reparenting. But please note that
this alone would not be sufficient. Even LRU pages are not guaranteed to be
reclaimable - think of full swap space, pinned memory, etc.

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/2] mm, memcg: don't try to kill a process if memcg is not populated
  2020-05-04 16:11           ` Michal Hocko
@ 2020-05-04 17:04             ` Roman Gushchin
  0 siblings, 0 replies; 11+ messages in thread
From: Roman Gushchin @ 2020-05-04 17:04 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Yafang Shao, Andrew Morton, Shakeel Butt, Johannes Weiner,
	Greg Thelen, Linux MM

On Mon, May 04, 2020 at 06:11:13PM +0200, Michal Hocko wrote:
> On Mon 04-05-20 23:24:35, Yafang Shao wrote:
> > On Mon, May 4, 2020 at 8:46 PM Michal Hocko <mhocko@kernel.org> wrote:
> [...]
> > > But the oom situation and the oom report is simply something an admin
> > > has to expect especially when the hard limit is set to 0. With kmem
> > > accounting there is no guarantee that the target will be met.
> > 
> > I'm always wondering that why not moving the kmem from this memcg to
> > the root_mem_cgroup in this situation ?
> > Then this memcg can be easily reclaimed.

It's not that trivial: there are many objects which are keeping a reference
to a memory cgroup. We don't even have a comprehensive list of them.
And we should somehow reassign them to a different cgroup without too much
overhead.
Also, it's better to move them to the parent instead of the root.

> 
> Roman was playing with kmem charges reparenting.

Slabs are already reparented. Other objects, which are allocated directly
by the page allocator (e.g. vmallocs), are not. But it will be relatively
easy to cover them after landing my slab controller rework patchset:
https://lore.kernel.org/lkml/20200422204708.2176080-1-guro@fb.com/ .
Basically it provides a framework for charging kernel objects in a way
that provides inexpensive reparenting.

Thanks!


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-05-04 17:05 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-04  4:26 [PATCH v2 0/2]memcg oom: don't try to kill a process if there is no process Yafang Shao
2020-05-04  4:26 ` [PATCH v2 1/2] mm, memcg: rename mem_cgroup_out_of_memory() Yafang Shao
2020-05-04  7:59   ` Michal Hocko
2020-05-04 15:50   ` Chris Down
2020-05-04  4:26 ` [PATCH v2 2/2] mm, memcg: don't try to kill a process if memcg is not populated Yafang Shao
2020-05-04  8:18   ` Michal Hocko
2020-05-04 12:34     ` Yafang Shao
2020-05-04 12:46       ` Michal Hocko
2020-05-04 15:24         ` Yafang Shao
2020-05-04 16:11           ` Michal Hocko
2020-05-04 17:04             ` Roman Gushchin
