linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm: add group_oom_kill memory event
@ 2021-12-03 16:24 Dan Schatzberg
  2021-12-03 23:37 ` Roman Gushchin
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: Dan Schatzberg @ 2021-12-03 16:24 UTC (permalink / raw)
  To: Johannes Weiner, Roman Gushchin
  Cc: Tejun Heo, Zefan Li, Jonathan Corbet, Michal Hocko,
	Vladimir Davydov, Andrew Morton, Shakeel Butt,
	Matthew Wilcox (Oracle),
	Muchun Song, Alex Shi, Wei Yang, open list:CONTROL GROUP (CGROUP),
	open list:DOCUMENTATION, open list,
	open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)

Our container agent wants to know when a container exits if it was OOM
killed or not to report to the user. We use memory.oom.group = 1 to
ensure that OOM kills within the container's cgroup kill
everything. Existing memory.events are insufficient for knowing if
this triggered:

1) Our current approach reads memory.events oom_kill and reports the
container was killed if the value is non-zero. This is erroneous in
some cases where containers create their children cgroups with
memory.oom.group=1 as such OOM kills will get counted against the
container cgroup's oom_kill counter despite not actually OOM killing
the entire container.

2) Reading memory.events.local will fail to identify OOM kills in leaf
cgroups (that don't set memory.oom.group) within the container cgroup.

This patch adds a new oom_group_kill event when memory.oom.group
triggers to allow userspace to cleanly identify when an entire cgroup
is oom killed.

Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
---
 Documentation/admin-guide/cgroup-v2.rst | 4 ++++
 include/linux/memcontrol.h              | 1 +
 mm/memcontrol.c                         | 5 +++++
 mm/oom_kill.c                           | 1 +
 4 files changed, 11 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 2aeb7ae8b393..eec830ce2068 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1268,6 +1268,10 @@ PAGE_SIZE multiple when read back.
 		The number of processes belonging to this cgroup
 		killed by any kind of OOM killer.
 
+          oom_group_kill
+                The number of times all tasks in the cgroup were killed
+                due to memory.oom.group.
+
   memory.events.local
 	Similar to memory.events but the fields in the file are local
 	to the cgroup i.e. not hierarchical. The file modified event
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 0c5c403f4be6..951f24f42147 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -42,6 +42,7 @@ enum memcg_memory_event {
 	MEMCG_MAX,
 	MEMCG_OOM,
 	MEMCG_OOM_KILL,
+	MEMCG_OOM_GROUP_KILL,
 	MEMCG_SWAP_HIGH,
 	MEMCG_SWAP_MAX,
 	MEMCG_SWAP_FAIL,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6863a834ed42..5ab3b9ce90de 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4390,6 +4390,9 @@ static int mem_cgroup_oom_control_read(struct seq_file *sf, void *v)
 	seq_printf(sf, "under_oom %d\n", (bool)memcg->under_oom);
 	seq_printf(sf, "oom_kill %lu\n",
 		   atomic_long_read(&memcg->memory_events[MEMCG_OOM_KILL]));
+	seq_printf(sf, "oom_group_kill %lu\n",
+		   atomic_long_read(
+			&memcg->memory_events[MEMCG_OOM_GROUP_KILL]));
 	return 0;
 }
 
@@ -6307,6 +6310,8 @@ static void __memory_events_show(struct seq_file *m, atomic_long_t *events)
 	seq_printf(m, "oom %lu\n", atomic_long_read(&events[MEMCG_OOM]));
 	seq_printf(m, "oom_kill %lu\n",
 		   atomic_long_read(&events[MEMCG_OOM_KILL]));
+	seq_printf(m, "oom_group_kill %lu\n",
+		   atomic_long_read(&events[MEMCG_OOM_GROUP_KILL]));
 }
 
 static int memory_events_show(struct seq_file *m, void *v)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 1ddabefcfb5a..e52ce0b1465d 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -994,6 +994,7 @@ static void oom_kill_process(struct oom_control *oc, const char *message)
 	 * If necessary, kill all tasks in the selected memory cgroup.
 	 */
 	if (oom_group) {
+		memcg_memory_event(oom_group, MEMCG_OOM_GROUP_KILL);
 		mem_cgroup_print_oom_group(oom_group);
 		mem_cgroup_scan_tasks(oom_group, oom_kill_memcg_member,
 				      (void *)message);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm: add group_oom_kill memory event
  2021-12-03 16:24 [PATCH] mm: add group_oom_kill memory event Dan Schatzberg
@ 2021-12-03 23:37 ` Roman Gushchin
  2021-12-04  0:45 ` Shakeel Butt
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Roman Gushchin @ 2021-12-03 23:37 UTC (permalink / raw)
  To: Dan Schatzberg
  Cc: Johannes Weiner, Tejun Heo, Zefan Li, Jonathan Corbet,
	Michal Hocko, Vladimir Davydov, Andrew Morton, Shakeel Butt,
	Matthew Wilcox (Oracle),
	Muchun Song, Alex Shi, Wei Yang, open list:CONTROL GROUP (CGROUP),
	open list:DOCUMENTATION, open list,
	open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)

On Fri, Dec 03, 2021 at 08:24:23AM -0800, Dan Schatzberg wrote:
> Our container agent wants to know when a container exits if it was OOM
> killed or not to report to the user. We use memory.oom.group = 1 to
> ensure that OOM kills within the container's cgroup kill
> everything. Existing memory.events are insufficient for knowing if
> this triggered:
> 
> 1) Our current approach reads memory.events oom_kill and reports the
> container was killed if the value is non-zero. This is erroneous in
> some cases where containers create their children cgroups with
> memory.oom.group=1 as such OOM kills will get counted against the
> container cgroup's oom_kill counter despite not actually OOM killing
> the entire container.
> 
> 2) Reading memory.events.local will fail to identify OOM kills in leaf
> cgroups (that don't set memory.oom.group) within the container cgroup.
> 
> This patch adds a new oom_group_kill event when memory.oom.group
> triggers to allow userspace to cleanly identify when an entire cgroup
> is oom killed.
> 
> Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>

Reviewed-by: Roman Gushchin <guro@fb.com>

Thanks!

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm: add group_oom_kill memory event
  2021-12-03 16:24 [PATCH] mm: add group_oom_kill memory event Dan Schatzberg
  2021-12-03 23:37 ` Roman Gushchin
@ 2021-12-04  0:45 ` Shakeel Butt
  2021-12-10 20:00   ` Dan Schatzberg
  2021-12-04  9:36 ` Chris Down
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 7+ messages in thread
From: Shakeel Butt @ 2021-12-04  0:45 UTC (permalink / raw)
  To: Dan Schatzberg
  Cc: Johannes Weiner, Roman Gushchin, Tejun Heo, Zefan Li,
	Jonathan Corbet, Michal Hocko, Vladimir Davydov, Andrew Morton,
	Matthew Wilcox (Oracle),
	Muchun Song, Alex Shi, Wei Yang, open list:CONTROL GROUP (CGROUP),
	open list:DOCUMENTATION, open list,
	open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)

On Fri, Dec 3, 2021 at 8:24 AM Dan Schatzberg <schatzberg.dan@gmail.com> wrote:
>
> Our container agent wants to know when a container exits if it was OOM
> killed or not to report to the user. We use memory.oom.group = 1 to
> ensure that OOM kills within the container's cgroup kill
> everything. Existing memory.events are insufficient for knowing if
> this triggered:
>
> 1) Our current approach reads memory.events oom_kill and reports the
> container was killed if the value is non-zero. This is erroneous in
> some cases where containers create their children cgroups with
> memory.oom.group=1 as such OOM kills will get counted against the
> container cgroup's oom_kill counter despite not actually OOM killing
> the entire container.
>
> 2) Reading memory.events.local will fail to identify OOM kills in leaf
> cgroups (that don't set memory.oom.group) within the container cgroup.
>
> This patch adds a new oom_group_kill event when memory.oom.group
> triggers to allow userspace to cleanly identify when an entire cgroup
> is oom killed.
>
> Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>

So, with this patch, will you be watching oom_group_kill from
memory.events or memory.events.local file for your use-case?

Reviewed-by: Shakeel Butt <shakeelb@google.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm: add group_oom_kill memory event
  2021-12-03 16:24 [PATCH] mm: add group_oom_kill memory event Dan Schatzberg
  2021-12-03 23:37 ` Roman Gushchin
  2021-12-04  0:45 ` Shakeel Butt
@ 2021-12-04  9:36 ` Chris Down
  2021-12-06 15:19 ` Johannes Weiner
  2021-12-13 11:19 ` Michal Hocko
  4 siblings, 0 replies; 7+ messages in thread
From: Chris Down @ 2021-12-04  9:36 UTC (permalink / raw)
  To: Dan Schatzberg
  Cc: Johannes Weiner, Roman Gushchin, Tejun Heo, Zefan Li,
	Jonathan Corbet, Michal Hocko, Vladimir Davydov, Andrew Morton,
	Shakeel Butt, Matthew Wilcox (Oracle),
	Muchun Song, Alex Shi, Wei Yang, open list:CONTROL GROUP (CGROUP),
	open list:DOCUMENTATION, open list,
	open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)

Dan Schatzberg writes:
>Our container agent wants to know when a container exits if it was OOM
>killed or not to report to the user. We use memory.oom.group = 1 to
>ensure that OOM kills within the container's cgroup kill
>everything. Existing memory.events are insufficient for knowing if
>this triggered:
>
>1) Our current approach reads memory.events oom_kill and reports the
>container was killed if the value is non-zero. This is erroneous in
>some cases where containers create their children cgroups with
>memory.oom.group=1 as such OOM kills will get counted against the
>container cgroup's oom_kill counter despite not actually OOM killing
>the entire container.
>
>2) Reading memory.events.local will fail to identify OOM kills in leaf
>cgroups (that don't set memory.oom.group) within the container cgroup.
>
>This patch adds a new oom_group_kill event when memory.oom.group
>triggers to allow userspace to cleanly identify when an entire cgroup
>is oom killed.
>
>Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>

Thanks! Acking with one minor point on the documentation front.

Acked-by: Chris Down <chris@chrisdown.name>

>---
> Documentation/admin-guide/cgroup-v2.rst | 4 ++++
> include/linux/memcontrol.h              | 1 +
> mm/memcontrol.c                         | 5 +++++
> mm/oom_kill.c                           | 1 +
> 4 files changed, 11 insertions(+)
>
>diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
>index 2aeb7ae8b393..eec830ce2068 100644
>--- a/Documentation/admin-guide/cgroup-v2.rst
>+++ b/Documentation/admin-guide/cgroup-v2.rst
>@@ -1268,6 +1268,10 @@ PAGE_SIZE multiple when read back.
> 		The number of processes belonging to this cgroup
> 		killed by any kind of OOM killer.
>
>+          oom_group_kill
>+                The number of times all tasks in the cgroup were killed
>+                due to memory.oom.group.

Maybe pedantic, but this reads as unclear to me whether in cgroup with 3 tasks 
we get the value "3" or "1" when a group kill occurs.

Maybe rephrase to not make be about tasks and just say "number of times a group 
OOM occurred"?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm: add group_oom_kill memory event
  2021-12-03 16:24 [PATCH] mm: add group_oom_kill memory event Dan Schatzberg
                   ` (2 preceding siblings ...)
  2021-12-04  9:36 ` Chris Down
@ 2021-12-06 15:19 ` Johannes Weiner
  2021-12-13 11:19 ` Michal Hocko
  4 siblings, 0 replies; 7+ messages in thread
From: Johannes Weiner @ 2021-12-06 15:19 UTC (permalink / raw)
  To: Dan Schatzberg
  Cc: Roman Gushchin, Tejun Heo, Zefan Li, Jonathan Corbet,
	Michal Hocko, Vladimir Davydov, Andrew Morton, Shakeel Butt,
	Matthew Wilcox (Oracle),
	Muchun Song, Alex Shi, Wei Yang, open list:CONTROL GROUP (CGROUP),
	open list:DOCUMENTATION, open list,
	open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)

This makes perfect sense, just one minor point:

On Fri, Dec 03, 2021 at 08:24:23AM -0800, Dan Schatzberg wrote:
> @@ -4390,6 +4390,9 @@ static int mem_cgroup_oom_control_read(struct seq_file *sf, void *v)
>  	seq_printf(sf, "under_oom %d\n", (bool)memcg->under_oom);
>  	seq_printf(sf, "oom_kill %lu\n",
>  		   atomic_long_read(&memcg->memory_events[MEMCG_OOM_KILL]));
> +	seq_printf(sf, "oom_group_kill %lu\n",
> +		   atomic_long_read(
> +			&memcg->memory_events[MEMCG_OOM_GROUP_KILL]));
>  	return 0;
>  }

oom_control is a cgroup1 file, but group-oom is a cgroup2-only
feature. Best to drop this hunk.

With that, please add:
Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm: add group_oom_kill memory event
  2021-12-04  0:45 ` Shakeel Butt
@ 2021-12-10 20:00   ` Dan Schatzberg
  0 siblings, 0 replies; 7+ messages in thread
From: Dan Schatzberg @ 2021-12-10 20:00 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Johannes Weiner, Roman Gushchin, Tejun Heo, Zefan Li,
	Jonathan Corbet, Michal Hocko, Vladimir Davydov, Andrew Morton,
	Matthew Wilcox (Oracle),
	Muchun Song, Alex Shi, Wei Yang, open list:CONTROL GROUP (CGROUP),
	open list:DOCUMENTATION, open list,
	open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)

On Fri, Dec 03, 2021 at 04:45:54PM -0800, Shakeel Butt wrote:
> On Fri, Dec 3, 2021 at 8:24 AM Dan Schatzberg <schatzberg.dan@gmail.com> wrote:
> >
> > Our container agent wants to know when a container exits if it was OOM
> > killed or not to report to the user. We use memory.oom.group = 1 to
> > ensure that OOM kills within the container's cgroup kill
> > everything. Existing memory.events are insufficient for knowing if
> > this triggered:
> >
> > 1) Our current approach reads memory.events oom_kill and reports the
> > container was killed if the value is non-zero. This is erroneous in
> > some cases where containers create their children cgroups with
> > memory.oom.group=1 as such OOM kills will get counted against the
> > container cgroup's oom_kill counter despite not actually OOM killing
> > the entire container.
> >
> > 2) Reading memory.events.local will fail to identify OOM kills in leaf
> > cgroups (that don't set memory.oom.group) within the container cgroup.
> >
> > This patch adds a new oom_group_kill event when memory.oom.group
> > triggers to allow userspace to cleanly identify when an entire cgroup
> > is oom killed.
> >
> > Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
> 
> So, with this patch, will you be watching oom_group_kill from
> memory.events or memory.events.local file for your use-case?
> 
> Reviewed-by: Shakeel Butt <shakeelb@google.com>

We will watch from memory.events.local. If containers want to
construct their own child cgroups and allow for group oom to occur
inside, that's fine - a future container exit should not result in us
claiming the container was OOM killed. If the container exits and
memory.event.local shows oom_group_kill > 0 then we know the container
was OOM killed.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] mm: add group_oom_kill memory event
  2021-12-03 16:24 [PATCH] mm: add group_oom_kill memory event Dan Schatzberg
                   ` (3 preceding siblings ...)
  2021-12-06 15:19 ` Johannes Weiner
@ 2021-12-13 11:19 ` Michal Hocko
  4 siblings, 0 replies; 7+ messages in thread
From: Michal Hocko @ 2021-12-13 11:19 UTC (permalink / raw)
  To: Dan Schatzberg
  Cc: Johannes Weiner, Roman Gushchin, Tejun Heo, Zefan Li,
	Jonathan Corbet, Vladimir Davydov, Andrew Morton, Shakeel Butt,
	Matthew Wilcox (Oracle),
	Muchun Song, Alex Shi, Wei Yang, open list:CONTROL GROUP (CGROUP),
	open list:DOCUMENTATION, open list,
	open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)

On Fri 03-12-21 08:24:23, Dan Schatzberg wrote:
> Our container agent wants to know when a container exits if it was OOM
> killed or not to report to the user. We use memory.oom.group = 1 to
> ensure that OOM kills within the container's cgroup kill
> everything. Existing memory.events are insufficient for knowing if
> this triggered:

Yes our events reporting is not really friendly for this kind of usage.
OOM_KILL is accounted to the memcg of the task so it will not be updated
for inter nodes other than recursively (so never in local events).
OOM event, even though it is reported to the memcg under oom, cannot be
really used either because in some cases the oom killer is simply not
invoked. So there indeed is no clear way to tell what is happening under
the memcg hierarchy and what is happening for the whole hierarchy.
 
> 1) Our current approach reads memory.events oom_kill and reports the
> container was killed if the value is non-zero. This is erroneous in
> some cases where containers create their children cgroups with
> memory.oom.group=1 as such OOM kills will get counted against the
> container cgroup's oom_kill counter despite not actually OOM killing
> the entire container.
> 
> 2) Reading memory.events.local will fail to identify OOM kills in leaf
> cgroups (that don't set memory.oom.group) within the container cgroup.

I am a bit confused by 2). local events by definition cannot tell you
anything about children cgroups.

> This patch adds a new oom_group_kill event when memory.oom.group
> triggers to allow userspace to cleanly identify when an entire cgroup
> is oom killed.

New counter makes sense to me because it allows to tell oom events even
on the middle nodes.

> Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>

once the cgroup v1 interface part is dropped (as suggested by Johannes),
feel free to add
Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  Documentation/admin-guide/cgroup-v2.rst | 4 ++++
>  include/linux/memcontrol.h              | 1 +
>  mm/memcontrol.c                         | 5 +++++
>  mm/oom_kill.c                           | 1 +
>  4 files changed, 11 insertions(+)
> 
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 2aeb7ae8b393..eec830ce2068 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1268,6 +1268,10 @@ PAGE_SIZE multiple when read back.
>  		The number of processes belonging to this cgroup
>  		killed by any kind of OOM killer.
>  
> +          oom_group_kill
> +                The number of times all tasks in the cgroup were killed
> +                due to memory.oom.group.

This can be rather confusing for hierarchicaly reported values but the
same applies for other counters as well. So be it.
[...]
> @@ -4390,6 +4390,9 @@ static int mem_cgroup_oom_control_read(struct seq_file *sf, void *v)
>  	seq_printf(sf, "under_oom %d\n", (bool)memcg->under_oom);
>  	seq_printf(sf, "oom_kill %lu\n",
>  		   atomic_long_read(&memcg->memory_events[MEMCG_OOM_KILL]));
> +	seq_printf(sf, "oom_group_kill %lu\n",
> +		   atomic_long_read(
> +			&memcg->memory_events[MEMCG_OOM_GROUP_KILL]));
>  	return 0;
>  }

This should be dropped.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-12-13 11:19 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-03 16:24 [PATCH] mm: add group_oom_kill memory event Dan Schatzberg
2021-12-03 23:37 ` Roman Gushchin
2021-12-04  0:45 ` Shakeel Butt
2021-12-10 20:00   ` Dan Schatzberg
2021-12-04  9:36 ` Chris Down
2021-12-06 15:19 ` Johannes Weiner
2021-12-13 11:19 ` Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).