linux-mm.kvack.org archive mirror
* [PATCH v2] mm/oom_kill: count global and memory cgroup oom kills
@ 2017-05-25 10:28 Konstantin Khlebnikov
  2017-05-30  4:29 ` David Rientjes
  2017-06-05  8:50 ` Michal Hocko
  0 siblings, 2 replies; 9+ messages in thread
From: Konstantin Khlebnikov @ 2017-05-25 10:28 UTC (permalink / raw)
  To: linux-mm, linux-kernel, Michal Hocko
  Cc: Tetsuo Handa, Andrew Morton, Roman Guschin, David Rientjes

Show the count of OOM killer invocations in /proc/vmstat and the count of
processes killed per memory cgroup in the knob "memory.events"
(in memory.oom_control for cgroup v1).

Also describe the difference between "oom" and "oom_kill" in the memory
cgroup documentation. Currently, OOM in a memory cgroup kills tasks
iff the shortage happened inside a page fault.

These counters help in monitoring OOM kills; so far the only way
has been grepping for magic words in the kernel log.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

---

v1: https://lkml.kernel.org/r/149520375057.74196.2843113275800730971.stgit@buzz

v2:
* count all oom kills in /proc/vmstat
* update the counter of the cgroup that the killed task belongs to
---
 Documentation/cgroup-v2.txt   |   20 ++++++++++++++++----
 include/linux/memcontrol.h    |    5 ++++-
 include/linux/vm_event_item.h |    1 +
 mm/memcontrol.c               |    2 ++
 mm/oom_kill.c                 |    5 +++++
 mm/vmstat.c                   |    1 +
 6 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index dc5e2dcdbef4..738b1c7023ad 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -826,13 +826,25 @@ PAGE_SIZE multiple when read back.
 
 		The number of times the cgroup's memory usage was
 		about to go over the max boundary.  If direct reclaim
-		fails to bring it down, the OOM killer is invoked.
+		fails to bring it down, the cgroup enters the OOM state.
 
 	  oom
 
-		The number of times the OOM killer has been invoked in
-		the cgroup.  This may not exactly match the number of
-		processes killed but should generally be close.
+		The number of times the cgroup's memory usage reached
+		the limit and an allocation was about to fail.
+
+		Depending on context, the result could be invoking the
+		OOM killer and retrying the allocation, or failing it.
+
+		A failed allocation could in turn be returned to
+		userspace as -ENOMEM or silently ignored in cases like
+		disk readahead.  For now, OOM in a memory cgroup kills
+		tasks iff the shortage happened inside a page fault.
+
+	  oom_kill
+
+		The number of processes belonging to this cgroup
+		killed by any kind of OOM killer.
 
   memory.stat
 
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 899949bbb2f9..42296f7001da 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -556,8 +556,11 @@ static inline void mem_cgroup_count_vm_event(struct mm_struct *mm,
 
 	rcu_read_lock();
 	memcg = mem_cgroup_from_task(rcu_dereference(mm->owner));
-	if (likely(memcg))
+	if (likely(memcg)) {
 		this_cpu_inc(memcg->stat->events[idx]);
+		if (idx == OOM_KILL)
+			cgroup_file_notify(&memcg->events_file);
+	}
 	rcu_read_unlock();
 }
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index d84ae90ccd5c..1707e0a7d943 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -41,6 +41,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 		KSWAPD_LOW_WMARK_HIT_QUICKLY, KSWAPD_HIGH_WMARK_HIT_QUICKLY,
 		PAGEOUTRUN, PGROTATED,
 		DROP_PAGECACHE, DROP_SLAB,
+		OOM_KILL,
 #ifdef CONFIG_NUMA_BALANCING
 		NUMA_PTE_UPDATES,
 		NUMA_HUGE_PTE_UPDATES,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 94172089f52f..7011ebf2b90e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3574,6 +3574,7 @@ static int mem_cgroup_oom_control_read(struct seq_file *sf, void *v)
 
 	seq_printf(sf, "oom_kill_disable %d\n", memcg->oom_kill_disable);
 	seq_printf(sf, "under_oom %d\n", (bool)memcg->under_oom);
+	seq_printf(sf, "oom_kill %lu\n", memcg_sum_events(memcg, OOM_KILL));
 	return 0;
 }
 
@@ -5165,6 +5166,7 @@ static int memory_events_show(struct seq_file *m, void *v)
 	seq_printf(m, "high %lu\n", memcg_sum_events(memcg, MEMCG_HIGH));
 	seq_printf(m, "max %lu\n", memcg_sum_events(memcg, MEMCG_MAX));
 	seq_printf(m, "oom %lu\n", memcg_sum_events(memcg, MEMCG_OOM));
+	seq_printf(m, "oom_kill %lu\n", memcg_sum_events(memcg, OOM_KILL));
 
 	return 0;
 }
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 04c9143a8625..dd30a045ef5b 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -876,6 +876,11 @@ static void oom_kill_process(struct oom_control *oc, const char *message)
 	/* Get a reference to safely compare mm after task_unlock(victim) */
 	mm = victim->mm;
 	mmgrab(mm);
+
+	/* Raise event before sending signal: reaper must see this */
+	count_vm_event(OOM_KILL);
+	mem_cgroup_count_vm_event(mm, OOM_KILL);
+
 	/*
 	 * We should send SIGKILL before setting TIF_MEMDIE in order to prevent
 	 * the OOM victim from depleting the memory reserves from the user
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 76f73670200a..fe80b81a86e0 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1018,6 +1018,7 @@ const char * const vmstat_text[] = {
 
 	"drop_pagecache",
 	"drop_slab",
+	"oom_kill",
 
 #ifdef CONFIG_NUMA_BALANCING
 	"numa_pte_updates",

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] mm/oom_kill: count global and memory cgroup oom kills
  2017-05-25 10:28 [PATCH v2] mm/oom_kill: count global and memory cgroup oom kills Konstantin Khlebnikov
@ 2017-05-30  4:29 ` David Rientjes
  2017-05-30  4:51   ` Konstantin Khlebnikov
  2017-06-05  8:50 ` Michal Hocko
  1 sibling, 1 reply; 9+ messages in thread
From: David Rientjes @ 2017-05-30  4:29 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: linux-mm, linux-kernel, Michal Hocko, Tetsuo Handa,
	Andrew Morton, Roman Guschin

On Thu, 25 May 2017, Konstantin Khlebnikov wrote:

> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 04c9143a8625..dd30a045ef5b 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -876,6 +876,11 @@ static void oom_kill_process(struct oom_control *oc, const char *message)
>  	/* Get a reference to safely compare mm after task_unlock(victim) */
>  	mm = victim->mm;
>  	mmgrab(mm);
> +
> +	/* Raise event before sending signal: reaper must see this */

How is the oom reaper involved here?

> +	count_vm_event(OOM_KILL);
> +	mem_cgroup_count_vm_event(mm, OOM_KILL);
> +
>  	/*
>  	 * We should send SIGKILL before setting TIF_MEMDIE in order to prevent
>  	 * the OOM victim from depleting the memory reserves from the user


* Re: [PATCH v2] mm/oom_kill: count global and memory cgroup oom kills
  2017-05-30  4:29 ` David Rientjes
@ 2017-05-30  4:51   ` Konstantin Khlebnikov
  0 siblings, 0 replies; 9+ messages in thread
From: Konstantin Khlebnikov @ 2017-05-30  4:51 UTC (permalink / raw)
  To: David Rientjes
  Cc: Konstantin Khlebnikov, linux-mm, Linux Kernel Mailing List,
	Michal Hocko, Tetsuo Handa, Andrew Morton, Roman Guschin

On Tue, May 30, 2017 at 7:29 AM, David Rientjes <rientjes@google.com> wrote:
> On Thu, 25 May 2017, Konstantin Khlebnikov wrote:
>
>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>> index 04c9143a8625..dd30a045ef5b 100644
>> --- a/mm/oom_kill.c
>> +++ b/mm/oom_kill.c
>> @@ -876,6 +876,11 @@ static void oom_kill_process(struct oom_control *oc, const char *message)
>>       /* Get a reference to safely compare mm after task_unlock(victim) */
>>       mm = victim->mm;
>>       mmgrab(mm);
>> +
>> +     /* Raise event before sending signal: reaper must see this */
>
> How is the oom reaper involved here?

Task reaper: the OOM event should happen before SIGCHLD is delivered.

>
>> +     count_vm_event(OOM_KILL);
>> +     mem_cgroup_count_vm_event(mm, OOM_KILL);
>> +
>>       /*
>>        * We should send SIGKILL before setting TIF_MEMDIE in order to prevent
>>        * the OOM victim from depleting the memory reserves from the user
>


* Re: [PATCH v2] mm/oom_kill: count global and memory cgroup oom kills
  2017-05-25 10:28 [PATCH v2] mm/oom_kill: count global and memory cgroup oom kills Konstantin Khlebnikov
  2017-05-30  4:29 ` David Rientjes
@ 2017-06-05  8:50 ` Michal Hocko
  2017-06-05 14:27   ` Konstantin Khlebnikov
  1 sibling, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2017-06-05  8:50 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: linux-mm, linux-kernel, Tetsuo Handa, Andrew Morton,
	Roman Guschin, David Rientjes

On Thu 25-05-17 13:28:30, Konstantin Khlebnikov wrote:
> Show count of oom killer invocations in /proc/vmstat and count of
> processes killed in memory cgroup in knob "memory.events"
> (in memory.oom_control for v1 cgroup).
> 
> Also describe difference between "oom" and "oom_kill" in memory
> cgroup documentation. Currently oom in memory cgroup kills tasks
> iff shortage has happened inside page fault.
> 
> These counters helps in monitoring oom kills - for now
> the only way is grepping for magic words in kernel log.

Yes, this is less than optimal, and the counter sounds like a good step
forward. I have two comments on the patch, though.

[...]

> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 899949bbb2f9..42296f7001da 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -556,8 +556,11 @@ static inline void mem_cgroup_count_vm_event(struct mm_struct *mm,
>  
>  	rcu_read_lock();
>  	memcg = mem_cgroup_from_task(rcu_dereference(mm->owner));
> -	if (likely(memcg))
> +	if (likely(memcg)) {
>  		this_cpu_inc(memcg->stat->events[idx]);
> +		if (idx == OOM_KILL)
> +			cgroup_file_notify(&memcg->events_file);
> +	}
>  	rcu_read_unlock();

Well, this is ugly. I see how you want to share the global counter and
the memcg event which needs the notification. But I cannot say this
would be really easy to follow. Can we have at least a comment in
memcg_event_item enum definition?

> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 04c9143a8625..dd30a045ef5b 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -876,6 +876,11 @@ static void oom_kill_process(struct oom_control *oc, const char *message)
>  	/* Get a reference to safely compare mm after task_unlock(victim) */
>  	mm = victim->mm;
>  	mmgrab(mm);
> +
> +	/* Raise event before sending signal: reaper must see this */
> +	count_vm_event(OOM_KILL);
> +	mem_cgroup_count_vm_event(mm, OOM_KILL);
> +
>  	/*
>  	 * We should send SIGKILL before setting TIF_MEMDIE in order to prevent
>  	 * the OOM victim from depleting the memory reserves from the user

Why don't you count tasks which share mm with the oom victim? 
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 0e2c925e7826..9a95947a60ba 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -924,6 +924,8 @@ static void oom_kill_process(struct oom_control *oc, const char *message)
 		 */
 		if (unlikely(p->flags & PF_KTHREAD))
 			continue;
+		count_vm_event(OOM_KILL);
+		count_memcg_event_mm(mm, OOM_KILL);
 		do_send_sig_info(SIGKILL, SEND_SIG_FORCED, p, true);
 	}
 	rcu_read_unlock();

Other than that looks good to me.
Acked-by: Michal Hocko <mhocko@suse.com>
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v2] mm/oom_kill: count global and memory cgroup oom kills
  2017-06-05  8:50 ` Michal Hocko
@ 2017-06-05 14:27   ` Konstantin Khlebnikov
  2017-06-08  9:44     ` Michal Hocko
  2017-09-13  4:51     ` PINTU KUMAR
  0 siblings, 2 replies; 9+ messages in thread
From: Konstantin Khlebnikov @ 2017-06-05 14:27 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, Tetsuo Handa, Andrew Morton,
	Roman Guschin, David Rientjes



On 05.06.2017 11:50, Michal Hocko wrote:
> On Thu 25-05-17 13:28:30, Konstantin Khlebnikov wrote:
>> Show count of oom killer invocations in /proc/vmstat and count of
>> processes killed in memory cgroup in knob "memory.events"
>> (in memory.oom_control for v1 cgroup).
>>
>> Also describe difference between "oom" and "oom_kill" in memory
>> cgroup documentation. Currently oom in memory cgroup kills tasks
>> iff shortage has happened inside page fault.
>>
>> These counters helps in monitoring oom kills - for now
>> the only way is grepping for magic words in kernel log.
> 
> Yes this is less than optimal and the counter sounds like a good step
> forward. I have 2 comments to the patch though.
> 
> [...]
> 
>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>> index 899949bbb2f9..42296f7001da 100644
>> --- a/include/linux/memcontrol.h
>> +++ b/include/linux/memcontrol.h
>> @@ -556,8 +556,11 @@ static inline void mem_cgroup_count_vm_event(struct mm_struct *mm,
>>   
>>   	rcu_read_lock();
>>   	memcg = mem_cgroup_from_task(rcu_dereference(mm->owner));
>> -	if (likely(memcg))
>> +	if (likely(memcg)) {
>>   		this_cpu_inc(memcg->stat->events[idx]);
>> +		if (idx == OOM_KILL)
>> +			cgroup_file_notify(&memcg->events_file);
>> +	}
>>   	rcu_read_unlock();
> 
> Well, this is ugly. I see how you want to share the global counter and
> the memcg event which needs the notification. But I cannot say this
> would be really easy to follow. Can we have at least a comment in
> memcg_event_item enum definition?

Yep, this is a little bit ugly.
But this function is static inline and idx is always constant, so the resulting code is fine.

> 
>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>> index 04c9143a8625..dd30a045ef5b 100644
>> --- a/mm/oom_kill.c
>> +++ b/mm/oom_kill.c
>> @@ -876,6 +876,11 @@ static void oom_kill_process(struct oom_control *oc, const char *message)
>>   	/* Get a reference to safely compare mm after task_unlock(victim) */
>>   	mm = victim->mm;
>>   	mmgrab(mm);
>> +
>> +	/* Raise event before sending signal: reaper must see this */
>> +	count_vm_event(OOM_KILL);
>> +	mem_cgroup_count_vm_event(mm, OOM_KILL);
>> +
>>   	/*
>>   	 * We should send SIGKILL before setting TIF_MEMDIE in order to prevent
>>   	 * the OOM victim from depleting the memory reserves from the user
> 
> Why don't you count tasks which share mm with the oom victim?

Yes, this makes sense. But these kills are not logged, so the counter would differ from logged events.
Also, these tasks might live in different cgroups, so counting them against the mm owner isn't correct.

> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 0e2c925e7826..9a95947a60ba 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -924,6 +924,8 @@ static void oom_kill_process(struct oom_control *oc, const char *message)
>   		 */
>   		if (unlikely(p->flags & PF_KTHREAD))
>   			continue;
> +		count_vm_event(OOM_KILL);
> +		count_memcg_event_mm(mm, OOM_KILL);
>   		do_send_sig_info(SIGKILL, SEND_SIG_FORCED, p, true);
>   	}
>   	rcu_read_unlock();
> 
> Other than that looks good to me.
> Acked-by: Michal Hocko <mhocko@suse.com>
> 


* Re: [PATCH v2] mm/oom_kill: count global and memory cgroup oom kills
  2017-06-05 14:27   ` Konstantin Khlebnikov
@ 2017-06-08  9:44     ` Michal Hocko
  2017-09-13  4:51     ` PINTU KUMAR
  1 sibling, 0 replies; 9+ messages in thread
From: Michal Hocko @ 2017-06-08  9:44 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: linux-mm, linux-kernel, Tetsuo Handa, Andrew Morton,
	Roman Guschin, David Rientjes

On Mon 05-06-17 17:27:50, Konstantin Khlebnikov wrote:
> 
> 
> On 05.06.2017 11:50, Michal Hocko wrote:
> >On Thu 25-05-17 13:28:30, Konstantin Khlebnikov wrote:
[...]
> >>index 04c9143a8625..dd30a045ef5b 100644
> >>--- a/mm/oom_kill.c
> >>+++ b/mm/oom_kill.c
> >>@@ -876,6 +876,11 @@ static void oom_kill_process(struct oom_control *oc, const char *message)
> >>  	/* Get a reference to safely compare mm after task_unlock(victim) */
> >>  	mm = victim->mm;
> >>  	mmgrab(mm);
> >>+
> >>+	/* Raise event before sending signal: reaper must see this */
> >>+	count_vm_event(OOM_KILL);
> >>+	mem_cgroup_count_vm_event(mm, OOM_KILL);
> >>+
> >>  	/*
> >>  	 * We should send SIGKILL before setting TIF_MEMDIE in order to prevent
> >>  	 * the OOM victim from depleting the memory reserves from the user
> >
> >Why don't you count tasks which share mm with the oom victim?
> 
> Yes, this makes sense. But these kills are not logged thus counter
> will differs from logged events.

Yes, they are not, but does that matter? Do we want _all_ OOM kills
counted, or only some?

> Also these tasks might live in different cgroups, so counting to mm
> owner isn't correct.

Well, the situation with an mm shared between different memcgs is always
hairy. We try to charge mm->owner, but I suspect we are not consistent in
that. I would have to double-check, because it has been a long time since
I investigated it. My point is that once you count OOM kills, you
should count all the tasks, IMHO.

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v2] mm/oom_kill: count global and memory cgroup oom kills
  2017-06-05 14:27   ` Konstantin Khlebnikov
  2017-06-08  9:44     ` Michal Hocko
@ 2017-09-13  4:51     ` PINTU KUMAR
  2017-09-13  7:35       ` Konstantin Khlebnikov
  1 sibling, 1 reply; 9+ messages in thread
From: PINTU KUMAR @ 2017-09-13  4:51 UTC (permalink / raw)
  To: Michal Hocko, Konstantin Khlebnikov
  Cc: linux-mm, linux-kernel, Tetsuo Handa, Andrew Morton,
	Roman Guschin, David Rientjes, Greg Kroah-Hartman

[-- Attachment #1: Type: text/plain, Size: 4198 bytes --]



Hi,

I submitted a similar patch 2 years ago (Oct 2015),
but at that time the patch was rejected.
Here is the history:
https://lkml.org/lkml/2015/10/1/372

Now I see a similar patch got accepted. At least the initial idea and the objective were the same.
I was not even included here.
On one side I feel happy that my initial idea got accepted now. But on the other side it really hurts :(

Thanks,
Pintu

On Monday 5 June 2017, 7:57:57 PM IST, Konstantin Khlebnikov <khlebnikov@yandex-team.ru> wrote:


On 05.06.2017 11:50, Michal Hocko wrote:
> On Thu 25-05-17 13:28:30, Konstantin Khlebnikov wrote:
>> Show count of oom killer invocations in /proc/vmstat and count of
>> processes killed in memory cgroup in knob "memory.events"
>> (in memory.oom_control for v1 cgroup).
>>
>> Also describe difference between "oom" and "oom_kill" in memory
>> cgroup documentation. Currently oom in memory cgroup kills tasks
>> iff shortage has happened inside page fault.
>>
>> These counters helps in monitoring oom kills - for now
>> the only way is grepping for magic words in kernel log.
> 
> Yes this is less than optimal and the counter sounds like a good step
> forward. I have 2 comments to the patch though.
> 
> [...]
> 
>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>> index 899949bbb2f9..42296f7001da 100644
>> --- a/include/linux/memcontrol.h
>> +++ b/include/linux/memcontrol.h
>> @@ -556,8 +556,11 @@ static inline void mem_cgroup_count_vm_event(struct mm_struct *mm,
>>  
>>      rcu_read_lock();
>>      memcg = mem_cgroup_from_task(rcu_dereference(mm->owner));
>> -    if (likely(memcg))
>> +    if (likely(memcg)) {
>>          this_cpu_inc(memcg->stat->events[idx]);
>> +        if (idx == OOM_KILL)
>> +            cgroup_file_notify(&memcg->events_file);
>> +    }
>>      rcu_read_unlock();
> 
> Well, this is ugly. I see how you want to share the global counter and
> the memcg event which needs the notification. But I cannot say this
> would be really easy to follow. Can we have at least a comment in
> memcg_event_item enum definition?

Yep, this is a little bit ugly.
But this function is static inline and idx is always constant, so the resulting code is fine.

> 
>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>> index 04c9143a8625..dd30a045ef5b 100644
>> --- a/mm/oom_kill.c
>> +++ b/mm/oom_kill.c
>> @@ -876,6 +876,11 @@ static void oom_kill_process(struct oom_control *oc, const char *message)
>>      /* Get a reference to safely compare mm after task_unlock(victim) */
>>      mm = victim->mm;
>>      mmgrab(mm);
>> +
>> +    /* Raise event before sending signal: reaper must see this */
>> +    count_vm_event(OOM_KILL);
>> +    mem_cgroup_count_vm_event(mm, OOM_KILL);
>> +
>>      /*
>>      * We should send SIGKILL before setting TIF_MEMDIE in order to prevent
>>      * the OOM victim from depleting the memory reserves from the user
> 
> Why don't you count tasks which share mm with the oom victim?

Yes, this makes sense. But these kills are not logged, so the counter would differ from logged events.
Also, these tasks might live in different cgroups, so counting them against the mm owner isn't correct.


> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 0e2c925e7826..9a95947a60ba 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -924,6 +924,8 @@ static void oom_kill_process(struct oom_control *oc, const char *message)
>          */
>          if (unlikely(p->flags & PF_KTHREAD))
>              continue;
> +        count_vm_event(OOM_KILL);
> +        count_memcg_event_mm(mm, OOM_KILL);
>          do_send_sig_info(SIGKILL, SEND_SIG_FORCED, p, true);
>      }
>      rcu_read_unlock();
> 
> Other than that looks good to me.
> Acked-by: Michal Hocko <mhocko@suse.com>
> 



[-- Attachment #2: Type: text/html, Size: 5813 bytes --]


* Re: [PATCH v2] mm/oom_kill: count global and memory cgroup oom kills
  2017-09-13  4:51     ` PINTU KUMAR
@ 2017-09-13  7:35       ` Konstantin Khlebnikov
  2017-09-13 14:33         ` Pintu Kumar
  0 siblings, 1 reply; 9+ messages in thread
From: Konstantin Khlebnikov @ 2017-09-13  7:35 UTC (permalink / raw)
  To: PINTU KUMAR, Michal Hocko
  Cc: linux-mm, linux-kernel, Tetsuo Handa, Andrew Morton,
	Roman Guschin, David Rientjes, Greg Kroah-Hartman

On 13.09.2017 07:51, PINTU KUMAR wrote:
> 
> 
> Hi,
> 
> I have submitted a similar patch 2 years ago (Oct/2015).
> But at that time the patch was rejected.
> Here is the history:
> https://lkml.org/lkml/2015/10/1/372
> 
> Now I see the similar patch got accepted. At least the initial idea and the objective were same.
> Even I were not included here.
> On one side I feel happy that my initial idea got accepted now.
> But on the other side it really hurts :(
> 

If this makes you feel better: my version also fixes an uncertainty in the memory cgroup statistics.

> 
> Thanks,
> Pintu
> 
> 
> On Monday 5 June 2017, 7:57:57 PM IST, Konstantin Khlebnikov <khlebnikov@yandex-team.ru> wrote:
> 
> 
> On 05.06.2017 11:50, Michal Hocko wrote:
>  > On Thu 25-05-17 13:28:30, Konstantin Khlebnikov wrote:
>  >> Show count of oom killer invocations in /proc/vmstat and count of
>  >> processes killed in memory cgroup in knob "memory.events"
>  >> (in memory.oom_control for v1 cgroup).
>  >>
>  >> Also describe difference between "oom" and "oom_kill" in memory
>  >> cgroup documentation. Currently oom in memory cgroup kills tasks
>  >> iff shortage has happened inside page fault.
>  >>
>  >> These counters helps in monitoring oom kills - for now
>  >> the only way is grepping for magic words in kernel log.
>  >
>  > Yes this is less than optimal and the counter sounds like a good step
>  > forward. I have 2 comments to the patch though.
>  >
>  > [...]
>  >
>  >> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>  >> index 899949bbb2f9..42296f7001da 100644
>  >> --- a/include/linux/memcontrol.h
>  >> +++ b/include/linux/memcontrol.h
>  >> @@ -556,8 +556,11 @@ static inline void mem_cgroup_count_vm_event(struct mm_struct *mm,
>  >>
>  >>      rcu_read_lock();
>  >>      memcg = mem_cgroup_from_task(rcu_dereference(mm->owner));
>  >> -    if (likely(memcg))
>  >> +    if (likely(memcg)) {
>  >>          this_cpu_inc(memcg->stat->events[idx]);
>  >> +        if (idx == OOM_KILL)
>  >> +            cgroup_file_notify(&memcg->events_file);
>  >> +    }
>  >>      rcu_read_unlock();
>  >
>  > Well, this is ugly. I see how you want to share the global counter and
>  > the memcg event which needs the notification. But I cannot say this
>  > would be really easy to follow. Can we have at least a comment in
>  > memcg_event_item enum definition?
> 
> Yep, this is a little bit ugly.
> But this funciton is static-inline and idx always constant so resulting code is fine.
> 
>  >
>  >> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>  >> index 04c9143a8625..dd30a045ef5b 100644
>  >> --- a/mm/oom_kill.c
>  >> +++ b/mm/oom_kill.c
>  >> @@ -876,6 +876,11 @@ static void oom_kill_process(struct oom_control *oc, const char *message)
>  >>      /* Get a reference to safely compare mm after task_unlock(victim) */
>  >>      mm = victim->mm;
>  >>      mmgrab(mm);
>  >> +
>  >> +    /* Raise event before sending signal: reaper must see this */
>  >> +    count_vm_event(OOM_KILL);
>  >> +    mem_cgroup_count_vm_event(mm, OOM_KILL);
>  >> +
>  >>      /*
>  >>      * We should send SIGKILL before setting TIF_MEMDIE in order to prevent
>  >>      * the OOM victim from depleting the memory reserves from the user
>  >
>  > Why don't you count tasks which share mm with the oom victim?
> 
> Yes, this makes sense. But these kills are not logged thus counter will differs from logged events.
> Also these tasks might live in different cgroups, so counting to mm owner isn't correct.
> 
> 
>  > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>  > index 0e2c925e7826..9a95947a60ba 100644
>  > --- a/mm/oom_kill.c
>  > +++ b/mm/oom_kill.c
>  > @@ -924,6 +924,8 @@ static void oom_kill_process(struct oom_control *oc, const char *message)
>  >          */
>  >          if (unlikely(p->flags & PF_KTHREAD))
>  >              continue;
>  > +        count_vm_event(OOM_KILL);
>  > +        count_memcg_event_mm(mm, OOM_KILL);
>  >          do_send_sig_info(SIGKILL, SEND_SIG_FORCED, p, true);
>  >      }
>  >      rcu_read_unlock();
>  >
>  > Other than that looks good to me.
>  > Acked-by: Michal Hocko <mhocko@suse.com>
>  >
> 
> 


* Re: [PATCH v2] mm/oom_kill: count global and memory cgroup oom kills
  2017-09-13  7:35       ` Konstantin Khlebnikov
@ 2017-09-13 14:33         ` Pintu Kumar
  0 siblings, 0 replies; 9+ messages in thread
From: Pintu Kumar @ 2017-09-13 14:33 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: PINTU KUMAR, Michal Hocko, linux-mm, linux-kernel, Tetsuo Handa,
	Andrew Morton, Roman Guschin, David Rientjes, Greg Kroah-Hartman,
	Pintu Kumar

On Wed, Sep 13, 2017 at 1:05 PM, Konstantin Khlebnikov
<khlebnikov@yandex-team.ru> wrote:
> On 13.09.2017 07:51, PINTU KUMAR wrote:
>>
>>
>>
>> Hi,
>>
>> I have submitted a similar patch 2 years ago (Oct/2015).
>> But at that time the patch was rejected.
>> Here is the history:
>> https://lkml.org/lkml/2015/10/1/372
>>
>> Now I see the similar patch got accepted. At least the initial idea and
>> the objective were same.
>> Even I were not included here.
>> On one side I feel happy that my initial idea got accepted now.
>> But on the other side it really hurts :(
>>
>
> If this makes you feel better: mine version also fixes uncertainty in memory
> cgroup statistics.
>

Yes, my initial version was also just about the global OOM counter, and I was
planning to add more later. But the initial version itself was rejected.
Sometimes it's really painful to see how the same ideas are treated differently :(

Anyway, thanks for this version. In my experience it is really helpful,
especially in production systems where logs are disabled and there is no
root access, but the /proc/vmstat fields can still be read.
That was my point.


>>
>> Thanks,
>> Pintu
>>
>>
>> On Monday 5 June 2017, 7:57:57 PM IST, Konstantin Khlebnikov
>> <khlebnikov@yandex-team.ru> wrote:
>>
>>
>> On 05.06.2017 11:50, Michal Hocko wrote:
>>  > On Thu 25-05-17 13:28:30, Konstantin Khlebnikov wrote:
>>  >> Show count of oom killer invocations in /proc/vmstat and count of
>>  >> processes killed in memory cgroup in knob "memory.events"
>>  >> (in memory.oom_control for v1 cgroup).
>>  >>
>>  >> Also describe difference between "oom" and "oom_kill" in memory
>>  >> cgroup documentation. Currently oom in memory cgroup kills tasks
>>  >> iff shortage has happened inside page fault.
>>  >>
>>  >> These counters helps in monitoring oom kills - for now
>>  >> the only way is grepping for magic words in kernel log.
>>  >
>>  > Yes this is less than optimal and the counter sounds like a good step
>>  > forward. I have 2 comments to the patch though.
>>  >
>>  > [...]
>>  >
>>  >> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>>  >> index 899949bbb2f9..42296f7001da 100644
>>  >> --- a/include/linux/memcontrol.h
>>  >> +++ b/include/linux/memcontrol.h
>>  >> @@ -556,8 +556,11 @@ static inline void
>> mem_cgroup_count_vm_event(struct mm_struct *mm,
>>  >>
>>  >>      rcu_read_lock();
>>  >>      memcg = mem_cgroup_from_task(rcu_dereference(mm->owner));
>>  >> -    if (likely(memcg))
>>  >> +    if (likely(memcg)) {
>>  >>          this_cpu_inc(memcg->stat->events[idx]);
>>  >> +        if (idx == OOM_KILL)
>>  >> +            cgroup_file_notify(&memcg->events_file);
>>  >> +    }
>>  >>      rcu_read_unlock();
>>  >
>>  > Well, this is ugly. I see how you want to share the global counter and
>>  > the memcg event which needs the notification. But I cannot say this
>>  > would be really easy to follow. Can we have at least a comment in
>>  > memcg_event_item enum definition?
>>
>> Yep, this is a little bit ugly.
>> But this funciton is static-inline and idx always constant so resulting
>> code is fine.
>>
>>  >
>>  >> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>>  >> index 04c9143a8625..dd30a045ef5b 100644
>>  >> --- a/mm/oom_kill.c
>>  >> +++ b/mm/oom_kill.c
>>  >> @@ -876,6 +876,11 @@ static void oom_kill_process(struct oom_control
>> *oc, const char *message)
>>  >>      /* Get a reference to safely compare mm after task_unlock(victim)
>> */
>>  >>      mm = victim->mm;
>>  >>      mmgrab(mm);
>>  >> +
>>  >> +    /* Raise event before sending signal: reaper must see this */
>>  >> +    count_vm_event(OOM_KILL);
>>  >> +    mem_cgroup_count_vm_event(mm, OOM_KILL);
>>  >> +
>>  >>      /*
>>  >>      * We should send SIGKILL before setting TIF_MEMDIE in order to
>> prevent
>>  >>      * the OOM victim from depleting the memory reserves from the user
>>  >
>>  > Why don't you count tasks which share mm with the oom victim?
>>
>> Yes, this makes sense. But these kills are not logged thus counter will
>> differs from logged events.
>> Also these tasks might live in different cgroups, so counting to mm owner
>> isn't correct.
>>
>>
>>  > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>>  > index 0e2c925e7826..9a95947a60ba 100644
>>  > --- a/mm/oom_kill.c
>>  > +++ b/mm/oom_kill.c
>>  > @@ -924,6 +924,8 @@ static void oom_kill_process(struct oom_control
>> *oc, const char *message)
>>  >          */
>>  >          if (unlikely(p->flags & PF_KTHREAD))
>>  >              continue;
>>  > +        count_vm_event(OOM_KILL);
>>  > +        count_memcg_event_mm(mm, OOM_KILL);
>>  >          do_send_sig_info(SIGKILL, SEND_SIG_FORCED, p, true);
>>  >      }
>>  >      rcu_read_unlock();
>>  >
>>  > Other than that looks good to me.
>>  > Acked-by: Michal Hocko <mhocko@suse.com>
>>  >
>>
>>
>


end of thread, other threads:[~2017-09-13 14:33 UTC | newest]

Thread overview: 9+ messages
2017-05-25 10:28 [PATCH v2] mm/oom_kill: count global and memory cgroup oom kills Konstantin Khlebnikov
2017-05-30  4:29 ` David Rientjes
2017-05-30  4:51   ` Konstantin Khlebnikov
2017-06-05  8:50 ` Michal Hocko
2017-06-05 14:27   ` Konstantin Khlebnikov
2017-06-08  9:44     ` Michal Hocko
2017-09-13  4:51     ` PINTU KUMAR
2017-09-13  7:35       ` Konstantin Khlebnikov
2017-09-13 14:33         ` Pintu Kumar
