All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] mm,oom_reaper: Show trace of unable to reap victim thread.
@ 2018-03-20 11:57 Tetsuo Handa
  2018-03-20 11:57 ` [PATCH 2/2] mm,oom_reaper: Correct MAX_OOM_REAP_RETRIES'th attempt Tetsuo Handa
  2018-03-20 12:28 ` [PATCH 1/2] mm,oom_reaper: Show trace of unable to reap victim thread Michal Hocko
  0 siblings, 2 replies; 20+ messages in thread
From: Tetsuo Handa @ 2018-03-20 11:57 UTC (permalink / raw)
  To: linux-mm; +Cc: Tetsuo Handa, David Rientjes, Michal Hocko

I found that it is not difficult to hit "oom_reaper: unable to reap pid:"
messages if the victim thread is doing copy_process(). Since I noticed
that it is likely helpful to show trace of unable to reap victim thread
for finding locations which should use killable wait, this patch does so.

[  226.608508] oom_reaper: unable to reap pid:9261 (a.out)
[  226.611971] a.out           D13056  9261   6927 0x00100084
[  226.615879] Call Trace:
[  226.617926]  ? __schedule+0x25f/0x780
[  226.620559]  schedule+0x2d/0x80
[  226.623356]  rwsem_down_write_failed+0x2bb/0x440
[  226.626426]  ? rwsem_down_write_failed+0x55/0x440
[  226.629458]  ? anon_vma_fork+0x124/0x150
[  226.632679]  call_rwsem_down_write_failed+0x13/0x20
[  226.635884]  down_write+0x49/0x60
[  226.638867]  ? copy_process.part.41+0x12f2/0x1fe0
[  226.642042]  copy_process.part.41+0x12f2/0x1fe0 /* i_mmap_lock_write() in dup_mmap() */
[  226.645087]  ? _do_fork+0xe6/0x560
[  226.647991]  _do_fork+0xe6/0x560
[  226.650495]  ? syscall_trace_enter+0x1a9/0x240
[  226.653443]  ? retint_user+0x18/0x18
[  226.656601]  ? page_fault+0x2f/0x50
[  226.659159]  ? trace_hardirqs_on_caller+0x11f/0x1b0
[  226.662399]  do_syscall_64+0x74/0x230
[  226.664989]  entry_SYSCALL_64_after_hwframe+0x42/0xb7

[  380.525910] oom_reaper: unable to reap pid:6927 (a.out)
[  380.529274] a.out           R  running task    13544  6927      1 0x00100084
[  380.533264] Call Trace:
[  380.535433]  ? __schedule+0x25f/0x780
[  380.538052]  ? mark_held_locks+0x60/0x80
[  380.540775]  schedule+0x2d/0x80
[  380.543258]  schedule_timeout+0x1a4/0x350
[  380.546053]  ? __next_timer_interrupt+0xd0/0xd0
[  380.549033]  msleep+0x25/0x30
[  380.551458]  shrink_inactive_list+0x5b7/0x690
[  380.554408]  ? __lock_acquire+0x1f1/0xfd0
[  380.557228]  ? find_held_lock+0x2d/0x90
[  380.559959]  shrink_node_memcg+0x340/0x770
[  380.562776]  ? __lock_acquire+0x246/0xfd0
[  380.565532]  ? mem_cgroup_iter+0x121/0x4f0
[  380.568337]  ? mem_cgroup_iter+0x121/0x4f0
[  380.571203]  shrink_node+0xd8/0x370
[  380.573745]  do_try_to_free_pages+0xe3/0x390
[  380.576759]  try_to_free_pages+0xc8/0x110
[  380.579476]  __alloc_pages_slowpath+0x28a/0x8d9
[  380.582639]  __alloc_pages_nodemask+0x21d/0x260
[  380.585662]  new_slab+0x558/0x760
[  380.588188]  ___slab_alloc+0x353/0x6f0
[  380.590962]  ? copy_process.part.41+0x121f/0x1fe0
[  380.594008]  ? find_held_lock+0x2d/0x90
[  380.596750]  ? copy_process.part.41+0x121f/0x1fe0
[  380.599852]  __slab_alloc+0x41/0x7a
[  380.602469]  ? copy_process.part.41+0x121f/0x1fe0
[  380.605607]  kmem_cache_alloc+0x1a6/0x1f0
[  380.608528]  copy_process.part.41+0x121f/0x1fe0 /* kmem_cache_alloc(GFP_KERNEL) in dup_mmap() */
[  380.611613]  ? _do_fork+0xe6/0x560
[  380.614299]  _do_fork+0xe6/0x560
[  380.616752]  ? syscall_trace_enter+0x1a9/0x240
[  380.619813]  ? retint_user+0x18/0x18
[  380.622588]  ? page_fault+0x2f/0x50
[  380.625184]  ? trace_hardirqs_on_caller+0x11f/0x1b0
[  380.628841]  do_syscall_64+0x74/0x230
[  380.631660]  entry_SYSCALL_64_after_hwframe+0x42/0xb7

[  600.285293] oom_reaper: unable to reap pid:7550 (a.out)
[  600.289112] a.out           D13104  7550   6931 0x00100084
[  600.292918] Call Trace:
[  600.295506]  ? __schedule+0x25f/0x780
[  600.298614]  ? __lock_acquire+0x246/0xfd0
[  600.301879]  schedule+0x2d/0x80
[  600.304623]  schedule_timeout+0x1fd/0x350
[  600.307633]  ? find_held_lock+0x2d/0x90
[  600.310909]  ? mark_held_locks+0x60/0x80
[  600.314001]  ? _raw_spin_unlock_irq+0x24/0x30
[  600.317382]  wait_for_completion+0xab/0x130
[  600.320579]  ? wake_up_q+0x70/0x70
[  600.323419]  flush_work+0x1bd/0x260
[  600.326434]  ? flush_work+0x174/0x260
[  600.329332]  ? destroy_worker+0x90/0x90
[  600.332301]  drain_all_pages+0x16d/0x1e0
[  600.335464]  __alloc_pages_slowpath+0x443/0x8d9
[  600.338743]  __alloc_pages_nodemask+0x21d/0x260
[  600.342178]  new_slab+0x558/0x760
[  600.344861]  ___slab_alloc+0x353/0x6f0
[  600.347744]  ? copy_process.part.41+0x121f/0x1fe0
[  600.351213]  ? find_held_lock+0x2d/0x90
[  600.354119]  ? copy_process.part.41+0x121f/0x1fe0
[  600.357491]  __slab_alloc+0x41/0x7a
[  600.360206]  ? copy_process.part.41+0x121f/0x1fe0
[  600.363480]  kmem_cache_alloc+0x1a6/0x1f0
[  600.366579]  copy_process.part.41+0x121f/0x1fe0 /* kmem_cache_alloc(GFP_KERNEL) in dup_mmap() */
[  600.369761]  ? _do_fork+0xe6/0x560
[  600.372382]  _do_fork+0xe6/0x560
[  600.375142]  ? syscall_trace_enter+0x1a9/0x240
[  600.378258]  ? retint_user+0x18/0x18
[  600.380978]  ? page_fault+0x2f/0x50
[  600.383814]  ? trace_hardirqs_on_caller+0x11f/0x1b0
[  600.387096]  do_syscall_64+0x74/0x230
[  600.389928]  entry_SYSCALL_64_after_hwframe+0x42/0xb7

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
---
 mm/oom_kill.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 5336985..900300c 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -41,6 +41,7 @@
 #include <linux/kthread.h>
 #include <linux/init.h>
 #include <linux/mmu_notifier.h>
+#include <linux/sched/debug.h>
 
 #include <asm/tlb.h>
 #include "internal.h"
@@ -596,6 +597,7 @@ static void oom_reap_task(struct task_struct *tsk)
 
 	pr_info("oom_reaper: unable to reap pid:%d (%s)\n",
 		task_pid_nr(tsk), tsk->comm);
+	sched_show_task(tsk);
 	debug_show_all_locks();
 
 done:
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 2/2] mm,oom_reaper: Correct MAX_OOM_REAP_RETRIES'th attempt.
  2018-03-20 11:57 [PATCH 1/2] mm,oom_reaper: Show trace of unable to reap victim thread Tetsuo Handa
@ 2018-03-20 11:57 ` Tetsuo Handa
  2018-03-20 12:12   ` Michal Hocko
  2018-03-20 12:28 ` [PATCH 1/2] mm,oom_reaper: Show trace of unable to reap victim thread Michal Hocko
  1 sibling, 1 reply; 20+ messages in thread
From: Tetsuo Handa @ 2018-03-20 11:57 UTC (permalink / raw)
  To: linux-mm; +Cc: Tetsuo Handa, David Rientjes, Michal Hocko

I got "oom_reaper: unable to reap pid:" messages when the victim thread
was blocked inside free_pgtables() (which occurred after returning from
unmap_vmas() and setting MMF_OOM_SKIP). We don't need to complain when
__oom_reap_task_mm() returned true (by e.g. finding MMF_OOM_SKIP already
set) when oom_reap_task() was trying MAX_OOM_REAP_RETRIES'th attempt.

[  663.593821] Killed process 7558 (a.out) total-vm:4176kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB
[  664.684801] oom_reaper: unable to reap pid:7558 (a.out)
[  664.892292] a.out           D13272  7558   6931 0x00100084
[  664.895765] Call Trace:
[  664.897574]  ? __schedule+0x25f/0x780
[  664.900099]  schedule+0x2d/0x80
[  664.902260]  rwsem_down_write_failed+0x2bb/0x440
[  664.905249]  ? rwsem_down_write_failed+0x55/0x440
[  664.908335]  ? free_pgd_range+0x569/0x5e0
[  664.911145]  call_rwsem_down_write_failed+0x13/0x20
[  664.914121]  down_write+0x49/0x60
[  664.916519]  ? unlink_file_vma+0x28/0x50
[  664.919255]  unlink_file_vma+0x28/0x50
[  664.922234]  free_pgtables+0x36/0x100
[  664.924797]  exit_mmap+0xbb/0x180
[  664.927220]  mmput+0x50/0x110
[  664.929504]  copy_process.part.41+0xb61/0x1fe0
[  664.932448]  ? _do_fork+0xe6/0x560
[  664.934902]  ? _do_fork+0xe6/0x560
[  664.937361]  _do_fork+0xe6/0x560
[  664.939742]  ? syscall_trace_enter+0x1a9/0x240
[  664.942693]  ? retint_user+0x18/0x18
[  664.945309]  ? page_fault+0x2f/0x50
[  664.947896]  ? trace_hardirqs_on_caller+0x11f/0x1b0
[  664.951075]  do_syscall_64+0x74/0x230
[  664.953747]  entry_SYSCALL_64_after_hwframe+0x42/0xb7

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
---
 mm/oom_kill.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 900300c..1cb2b98 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -588,11 +588,11 @@ static void oom_reap_task(struct task_struct *tsk)
 	struct mm_struct *mm = tsk->signal->oom_mm;
 
 	/* Retry the down_read_trylock(mmap_sem) a few times */
-	while (attempts++ < MAX_OOM_REAP_RETRIES && !__oom_reap_task_mm(tsk, mm))
+	while (attempts++ < MAX_OOM_REAP_RETRIES) {
+		if (__oom_reap_task_mm(tsk, mm))
+			goto done;
 		schedule_timeout_idle(HZ/10);
-
-	if (attempts <= MAX_OOM_REAP_RETRIES)
-		goto done;
+	}
 
 
 	pr_info("oom_reaper: unable to reap pid:%d (%s)\n",
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] mm,oom_reaper: Correct MAX_OOM_REAP_RETRIES'th attempt.
  2018-03-20 11:57 ` [PATCH 2/2] mm,oom_reaper: Correct MAX_OOM_REAP_RETRIES'th attempt Tetsuo Handa
@ 2018-03-20 12:12   ` Michal Hocko
  2018-03-20 12:37     ` Tetsuo Handa
  0 siblings, 1 reply; 20+ messages in thread
From: Michal Hocko @ 2018-03-20 12:12 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: linux-mm, David Rientjes

On Tue 20-03-18 20:57:56, Tetsuo Handa wrote:
> I got "oom_reaper: unable to reap pid:" messages when the victim thread
> was blocked inside free_pgtables() (which occurred after returning from
> unmap_vmas() and setting MMF_OOM_SKIP). We don't need to complain when
> __oom_reap_task_mm() returned true (by e.g. finding MMF_OOM_SKIP already
> set) when oom_reap_task() was trying MAX_OOM_REAP_RETRIES'th attempt.
> 
> [  663.593821] Killed process 7558 (a.out) total-vm:4176kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB
> [  664.684801] oom_reaper: unable to reap pid:7558 (a.out)

I do not see "oom_reaper: reaped process..." so has the task been
reaped?

> [  664.892292] a.out           D13272  7558   6931 0x00100084
> [  664.895765] Call Trace:
> [  664.897574]  ? __schedule+0x25f/0x780
> [  664.900099]  schedule+0x2d/0x80
> [  664.902260]  rwsem_down_write_failed+0x2bb/0x440
> [  664.905249]  ? rwsem_down_write_failed+0x55/0x440
> [  664.908335]  ? free_pgd_range+0x569/0x5e0
> [  664.911145]  call_rwsem_down_write_failed+0x13/0x20
> [  664.914121]  down_write+0x49/0x60
> [  664.916519]  ? unlink_file_vma+0x28/0x50
> [  664.919255]  unlink_file_vma+0x28/0x50
> [  664.922234]  free_pgtables+0x36/0x100
> [  664.924797]  exit_mmap+0xbb/0x180
> [  664.927220]  mmput+0x50/0x110
> [  664.929504]  copy_process.part.41+0xb61/0x1fe0
> [  664.932448]  ? _do_fork+0xe6/0x560
> [  664.934902]  ? _do_fork+0xe6/0x560
> [  664.937361]  _do_fork+0xe6/0x560
> [  664.939742]  ? syscall_trace_enter+0x1a9/0x240
> [  664.942693]  ? retint_user+0x18/0x18
> [  664.945309]  ? page_fault+0x2f/0x50
> [  664.947896]  ? trace_hardirqs_on_caller+0x11f/0x1b0
> [  664.951075]  do_syscall_64+0x74/0x230
> [  664.953747]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: David Rientjes <rientjes@google.com>
> ---
>  mm/oom_kill.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 900300c..1cb2b98 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -588,11 +588,11 @@ static void oom_reap_task(struct task_struct *tsk)
>  	struct mm_struct *mm = tsk->signal->oom_mm;
>  
>  	/* Retry the down_read_trylock(mmap_sem) a few times */
> -	while (attempts++ < MAX_OOM_REAP_RETRIES && !__oom_reap_task_mm(tsk, mm))
> +	while (attempts++ < MAX_OOM_REAP_RETRIES) {
> +		if (__oom_reap_task_mm(tsk, mm))
> +			goto done;
>  		schedule_timeout_idle(HZ/10);
> -
> -	if (attempts <= MAX_OOM_REAP_RETRIES)
> -		goto done;
> +	}

I do not see how this improves anything. Even if __oom_reap_task_mm
suceeded during the last attempt then attempts == MAX_OOM_REAP_RETRIES
and the if below would bail out. Or what do I miss?

>  	pr_info("oom_reaper: unable to reap pid:%d (%s)\n",

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] mm,oom_reaper: Show trace of unable to reap victim thread.
  2018-03-20 11:57 [PATCH 1/2] mm,oom_reaper: Show trace of unable to reap victim thread Tetsuo Handa
  2018-03-20 11:57 ` [PATCH 2/2] mm,oom_reaper: Correct MAX_OOM_REAP_RETRIES'th attempt Tetsuo Handa
@ 2018-03-20 12:28 ` Michal Hocko
  2018-03-20 12:52   ` Tetsuo Handa
  1 sibling, 1 reply; 20+ messages in thread
From: Michal Hocko @ 2018-03-20 12:28 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: linux-mm, David Rientjes

On Tue 20-03-18 20:57:55, Tetsuo Handa wrote:
> I found that it is not difficult to hit "oom_reaper: unable to reap pid:"
> messages if the victim thread is doing copy_process(). Since I noticed
> that it is likely helpful to show trace of unable to reap victim thread
> for finding locations which should use killable wait, this patch does so.
> 
> [  226.608508] oom_reaper: unable to reap pid:9261 (a.out)
> [  226.611971] a.out           D13056  9261   6927 0x00100084
> [  226.615879] Call Trace:
> [  226.617926]  ? __schedule+0x25f/0x780
> [  226.620559]  schedule+0x2d/0x80
> [  226.623356]  rwsem_down_write_failed+0x2bb/0x440
> [  226.626426]  ? rwsem_down_write_failed+0x55/0x440
> [  226.629458]  ? anon_vma_fork+0x124/0x150
> [  226.632679]  call_rwsem_down_write_failed+0x13/0x20
> [  226.635884]  down_write+0x49/0x60
> [  226.638867]  ? copy_process.part.41+0x12f2/0x1fe0
> [  226.642042]  copy_process.part.41+0x12f2/0x1fe0 /* i_mmap_lock_write() in dup_mmap() */
> [  226.645087]  ? _do_fork+0xe6/0x560
> [  226.647991]  _do_fork+0xe6/0x560
> [  226.650495]  ? syscall_trace_enter+0x1a9/0x240
> [  226.653443]  ? retint_user+0x18/0x18
> [  226.656601]  ? page_fault+0x2f/0x50
> [  226.659159]  ? trace_hardirqs_on_caller+0x11f/0x1b0
> [  226.662399]  do_syscall_64+0x74/0x230
> [  226.664989]  entry_SYSCALL_64_after_hwframe+0x42/0xb7

A single stack trace in the changelog would be sufficient IMHO.
Appart from that. What do you expect users will do about this trace?
Sure they will see a path which holds mmap_sem, we will see a bug report
but we can hardly do anything about that. We simply cannot drop the lock
from that path in 99% of situations. So _why_ do we want to add more
information to the log?

[...]

> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: David Rientjes <rientjes@google.com>
> ---
>  mm/oom_kill.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 5336985..900300c 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -41,6 +41,7 @@
>  #include <linux/kthread.h>
>  #include <linux/init.h>
>  #include <linux/mmu_notifier.h>
> +#include <linux/sched/debug.h>
>  
>  #include <asm/tlb.h>
>  #include "internal.h"
> @@ -596,6 +597,7 @@ static void oom_reap_task(struct task_struct *tsk)
>  
>  	pr_info("oom_reaper: unable to reap pid:%d (%s)\n",
>  		task_pid_nr(tsk), tsk->comm);
> +	sched_show_task(tsk);
>  	debug_show_all_locks();
>  
>  done:
> -- 
> 1.8.3.1
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] mm,oom_reaper: Correct MAX_OOM_REAP_RETRIES'th attempt.
  2018-03-20 12:12   ` Michal Hocko
@ 2018-03-20 12:37     ` Tetsuo Handa
  2018-03-20 12:47       ` [PATCH 2/2] mm,oom_reaper: Check for MMF_OOM_SKIP before complain Tetsuo Handa
  2018-03-20 13:24       ` [PATCH 2/2] mm,oom_reaper: Correct MAX_OOM_REAP_RETRIES'th attempt Michal Hocko
  0 siblings, 2 replies; 20+ messages in thread
From: Tetsuo Handa @ 2018-03-20 12:37 UTC (permalink / raw)
  To: mhocko; +Cc: linux-mm, rientjes

Michal Hocko wrote:
> On Tue 20-03-18 20:57:56, Tetsuo Handa wrote:
> > I got "oom_reaper: unable to reap pid:" messages when the victim thread
> > was blocked inside free_pgtables() (which occurred after returning from
> > unmap_vmas() and setting MMF_OOM_SKIP). We don't need to complain when
> > __oom_reap_task_mm() returned true (by e.g. finding MMF_OOM_SKIP already
> > set) when oom_reap_task() was trying MAX_OOM_REAP_RETRIES'th attempt.
> > 
> > [  663.593821] Killed process 7558 (a.out) total-vm:4176kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB
> > [  664.684801] oom_reaper: unable to reap pid:7558 (a.out)
> 
> I do not see "oom_reaper: reaped process..." so has the task been
> reaped?

The log is http://I-love.SAKURA.ne.jp/tmp/serial-20180320.txt.xz .
I think the task succeeded to call unmap_vmas() in exit_mmap(), for
no "oom_reaper: reaped process..." message means that __oom_reap_task_mm()
returned without setting MMF_UNSTABLE.

> 
> > [  664.892292] a.out           D13272  7558   6931 0x00100084
> > [  664.895765] Call Trace:
> > [  664.897574]  ? __schedule+0x25f/0x780
> > [  664.900099]  schedule+0x2d/0x80
> > [  664.902260]  rwsem_down_write_failed+0x2bb/0x440
> > [  664.905249]  ? rwsem_down_write_failed+0x55/0x440
> > [  664.908335]  ? free_pgd_range+0x569/0x5e0
> > [  664.911145]  call_rwsem_down_write_failed+0x13/0x20
> > [  664.914121]  down_write+0x49/0x60
> > [  664.916519]  ? unlink_file_vma+0x28/0x50
> > [  664.919255]  unlink_file_vma+0x28/0x50
> > [  664.922234]  free_pgtables+0x36/0x100
> > [  664.924797]  exit_mmap+0xbb/0x180
> > [  664.927220]  mmput+0x50/0x110
> > [  664.929504]  copy_process.part.41+0xb61/0x1fe0
> > [  664.932448]  ? _do_fork+0xe6/0x560
> > [  664.934902]  ? _do_fork+0xe6/0x560
> > [  664.937361]  _do_fork+0xe6/0x560
> > [  664.939742]  ? syscall_trace_enter+0x1a9/0x240
> > [  664.942693]  ? retint_user+0x18/0x18
> > [  664.945309]  ? page_fault+0x2f/0x50
> > [  664.947896]  ? trace_hardirqs_on_caller+0x11f/0x1b0
> > [  664.951075]  do_syscall_64+0x74/0x230
> > [  664.953747]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
> > 
> > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: David Rientjes <rientjes@google.com>
> > ---
> >  mm/oom_kill.c | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index 900300c..1cb2b98 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -588,11 +588,11 @@ static void oom_reap_task(struct task_struct *tsk)
> >  	struct mm_struct *mm = tsk->signal->oom_mm;
> >  
> >  	/* Retry the down_read_trylock(mmap_sem) a few times */
> > -	while (attempts++ < MAX_OOM_REAP_RETRIES && !__oom_reap_task_mm(tsk, mm))
> > +	while (attempts++ < MAX_OOM_REAP_RETRIES) {
> > +		if (__oom_reap_task_mm(tsk, mm))
> > +			goto done;
> >  		schedule_timeout_idle(HZ/10);
> > -
> > -	if (attempts <= MAX_OOM_REAP_RETRIES)
> > -		goto done;
> > +	}
> 
> I do not see how this improves anything. Even if __oom_reap_task_mm
> suceeded during the last attempt then attempts == MAX_OOM_REAP_RETRIES
> and the if below would bail out. Or what do I miss?

Oops, I optimized incorrectly. The original patch was

-	if (attempts <= MAX_OOM_REAP_RETRIES)
+	if (attempts <= MAX_OOM_REAP_RETRIES ||
+	    test_bit(MMF_OOM_SKIP, &mm->flags))
 		goto done;

with title "Check for MMF_OOM_SKIP before complain".
Even if the last attempt failed, there is no need to complain if
MMF_OOM_SKIP was already set.

> 
> >  	pr_info("oom_reaper: unable to reap pid:%d (%s)\n",
> 
> -- 
> Michal Hocko
> SUSE Labs
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 2/2] mm,oom_reaper: Check for MMF_OOM_SKIP before complain.
  2018-03-20 12:37     ` Tetsuo Handa
@ 2018-03-20 12:47       ` Tetsuo Handa
  2018-03-20 20:49         ` David Rientjes
  2018-03-20 13:24       ` [PATCH 2/2] mm,oom_reaper: Correct MAX_OOM_REAP_RETRIES'th attempt Michal Hocko
  1 sibling, 1 reply; 20+ messages in thread
From: Tetsuo Handa @ 2018-03-20 12:47 UTC (permalink / raw)
  To: mhocko; +Cc: linux-mm, rientjes

>From c8954b88c8e78d10829c646c95e076516a54e84f Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Tue, 20 Mar 2018 21:42:12 +0900
Subject: [PATCH] mm,oom_reaper: Check for MMF_OOM_SKIP before complain.

I got "oom_reaper: unable to reap pid:" messages when the victim thread
was blocked inside free_pgtables() (which occurred after returning from
unmap_vmas() and setting MMF_OOM_SKIP). We don't need to complain when
exit_mmap() already set MMF_OOM_SKIP.

[  663.593821] Killed process 7558 (a.out) total-vm:4176kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB
[  664.684801] oom_reaper: unable to reap pid:7558 (a.out)
[  664.892292] a.out           D13272  7558   6931 0x00100084
[  664.895765] Call Trace:
[  664.897574]  ? __schedule+0x25f/0x780
[  664.900099]  schedule+0x2d/0x80
[  664.902260]  rwsem_down_write_failed+0x2bb/0x440
[  664.905249]  ? rwsem_down_write_failed+0x55/0x440
[  664.908335]  ? free_pgd_range+0x569/0x5e0
[  664.911145]  call_rwsem_down_write_failed+0x13/0x20
[  664.914121]  down_write+0x49/0x60
[  664.916519]  ? unlink_file_vma+0x28/0x50
[  664.919255]  unlink_file_vma+0x28/0x50
[  664.922234]  free_pgtables+0x36/0x100
[  664.924797]  exit_mmap+0xbb/0x180
[  664.927220]  mmput+0x50/0x110
[  664.929504]  copy_process.part.41+0xb61/0x1fe0
[  664.932448]  ? _do_fork+0xe6/0x560
[  664.934902]  ? _do_fork+0xe6/0x560
[  664.937361]  _do_fork+0xe6/0x560
[  664.939742]  ? syscall_trace_enter+0x1a9/0x240
[  664.942693]  ? retint_user+0x18/0x18
[  664.945309]  ? page_fault+0x2f/0x50
[  664.947896]  ? trace_hardirqs_on_caller+0x11f/0x1b0
[  664.951075]  do_syscall_64+0x74/0x230
[  664.953747]  entry_SYSCALL_64_after_hwframe+0x42/0xb7

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
---
 mm/oom_kill.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 5336985..dfd3705 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -590,7 +590,8 @@ static void oom_reap_task(struct task_struct *tsk)
 	while (attempts++ < MAX_OOM_REAP_RETRIES && !__oom_reap_task_mm(tsk, mm))
 		schedule_timeout_idle(HZ/10);
 
-	if (attempts <= MAX_OOM_REAP_RETRIES)
+	if (attempts <= MAX_OOM_REAP_RETRIES ||
+	    test_bit(MMF_OOM_SKIP, &mm->flags))
 		goto done;
 
 
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] mm,oom_reaper: Show trace of unable to reap victim thread.
  2018-03-20 12:28 ` [PATCH 1/2] mm,oom_reaper: Show trace of unable to reap victim thread Michal Hocko
@ 2018-03-20 12:52   ` Tetsuo Handa
  2018-03-20 13:19     ` Michal Hocko
  0 siblings, 1 reply; 20+ messages in thread
From: Tetsuo Handa @ 2018-03-20 12:52 UTC (permalink / raw)
  To: mhocko; +Cc: linux-mm, rientjes

Michal Hocko wrote:
> On Tue 20-03-18 20:57:55, Tetsuo Handa wrote:
> > I found that it is not difficult to hit "oom_reaper: unable to reap pid:"
> > messages if the victim thread is doing copy_process(). Since I noticed
> > that it is likely helpful to show trace of unable to reap victim thread
> > for finding locations which should use killable wait, this patch does so.
> > 
> > [  226.608508] oom_reaper: unable to reap pid:9261 (a.out)
> > [  226.611971] a.out           D13056  9261   6927 0x00100084
> > [  226.615879] Call Trace:
> > [  226.617926]  ? __schedule+0x25f/0x780
> > [  226.620559]  schedule+0x2d/0x80
> > [  226.623356]  rwsem_down_write_failed+0x2bb/0x440
> > [  226.626426]  ? rwsem_down_write_failed+0x55/0x440
> > [  226.629458]  ? anon_vma_fork+0x124/0x150
> > [  226.632679]  call_rwsem_down_write_failed+0x13/0x20
> > [  226.635884]  down_write+0x49/0x60
> > [  226.638867]  ? copy_process.part.41+0x12f2/0x1fe0
> > [  226.642042]  copy_process.part.41+0x12f2/0x1fe0 /* i_mmap_lock_write() in dup_mmap() */
> > [  226.645087]  ? _do_fork+0xe6/0x560
> > [  226.647991]  _do_fork+0xe6/0x560
> > [  226.650495]  ? syscall_trace_enter+0x1a9/0x240
> > [  226.653443]  ? retint_user+0x18/0x18
> > [  226.656601]  ? page_fault+0x2f/0x50
> > [  226.659159]  ? trace_hardirqs_on_caller+0x11f/0x1b0
> > [  226.662399]  do_syscall_64+0x74/0x230
> > [  226.664989]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
> 
> A single stack trace in the changelog would be sufficient IMHO.
> Appart from that. What do you expect users will do about this trace?
> Sure they will see a path which holds mmap_sem, we will see a bug report
> but we can hardly do anything about that. We simply cannot drop the lock
> from that path in 99% of situations. So _why_ do we want to add more
> information to the log?

This case is blocked at i_mmap_lock_write(). If we can add error handling path
there, we can replace i_mmap_lock_write() with i_mmap_lock_write_killable() and
bail out soon. This patch helps finding such locations.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] mm,oom_reaper: Show trace of unable to reap victim thread.
  2018-03-20 12:52   ` Tetsuo Handa
@ 2018-03-20 13:19     ` Michal Hocko
  2018-03-20 13:30       ` Tetsuo Handa
  0 siblings, 1 reply; 20+ messages in thread
From: Michal Hocko @ 2018-03-20 13:19 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: linux-mm, rientjes

On Tue 20-03-18 21:52:33, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Tue 20-03-18 20:57:55, Tetsuo Handa wrote:
> > > I found that it is not difficult to hit "oom_reaper: unable to reap pid:"
> > > messages if the victim thread is doing copy_process(). Since I noticed
> > > that it is likely helpful to show trace of unable to reap victim thread
> > > for finding locations which should use killable wait, this patch does so.
> > > 
> > > [  226.608508] oom_reaper: unable to reap pid:9261 (a.out)
> > > [  226.611971] a.out           D13056  9261   6927 0x00100084
> > > [  226.615879] Call Trace:
> > > [  226.617926]  ? __schedule+0x25f/0x780
> > > [  226.620559]  schedule+0x2d/0x80
> > > [  226.623356]  rwsem_down_write_failed+0x2bb/0x440
> > > [  226.626426]  ? rwsem_down_write_failed+0x55/0x440
> > > [  226.629458]  ? anon_vma_fork+0x124/0x150
> > > [  226.632679]  call_rwsem_down_write_failed+0x13/0x20
> > > [  226.635884]  down_write+0x49/0x60
> > > [  226.638867]  ? copy_process.part.41+0x12f2/0x1fe0
> > > [  226.642042]  copy_process.part.41+0x12f2/0x1fe0 /* i_mmap_lock_write() in dup_mmap() */
> > > [  226.645087]  ? _do_fork+0xe6/0x560
> > > [  226.647991]  _do_fork+0xe6/0x560
> > > [  226.650495]  ? syscall_trace_enter+0x1a9/0x240
> > > [  226.653443]  ? retint_user+0x18/0x18
> > > [  226.656601]  ? page_fault+0x2f/0x50
> > > [  226.659159]  ? trace_hardirqs_on_caller+0x11f/0x1b0
> > > [  226.662399]  do_syscall_64+0x74/0x230
> > > [  226.664989]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
> > 
> > A single stack trace in the changelog would be sufficient IMHO.
> > Appart from that. What do you expect users will do about this trace?
> > Sure they will see a path which holds mmap_sem, we will see a bug report
> > but we can hardly do anything about that. We simply cannot drop the lock
> > from that path in 99% of situations. So _why_ do we want to add more
> > information to the log?
> 
> This case is blocked at i_mmap_lock_write().

But why does i_mmap_lock_write matter for oom_reaping. We are not
touching hugetlb mappings. dup_mmap holds mmap_sem for write which is
the most probable source of the backoff.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] mm,oom_reaper: Correct MAX_OOM_REAP_RETRIES'th attempt.
  2018-03-20 12:37     ` Tetsuo Handa
  2018-03-20 12:47       ` [PATCH 2/2] mm,oom_reaper: Check for MMF_OOM_SKIP before complain Tetsuo Handa
@ 2018-03-20 13:24       ` Michal Hocko
  2018-03-20 13:37         ` Tetsuo Handa
  1 sibling, 1 reply; 20+ messages in thread
From: Michal Hocko @ 2018-03-20 13:24 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: linux-mm, rientjes

On Tue 20-03-18 21:37:42, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Tue 20-03-18 20:57:56, Tetsuo Handa wrote:
> > > I got "oom_reaper: unable to reap pid:" messages when the victim thread
> > > was blocked inside free_pgtables() (which occurred after returning from
> > > unmap_vmas() and setting MMF_OOM_SKIP). We don't need to complain when
> > > __oom_reap_task_mm() returned true (by e.g. finding MMF_OOM_SKIP already
> > > set) when oom_reap_task() was trying MAX_OOM_REAP_RETRIES'th attempt.
> > > 
> > > [  663.593821] Killed process 7558 (a.out) total-vm:4176kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB
> > > [  664.684801] oom_reaper: unable to reap pid:7558 (a.out)
> > 
> > I do not see "oom_reaper: reaped process..." so has the task been
> > reaped?
> 
> The log is http://I-love.SAKURA.ne.jp/tmp/serial-20180320.txt.xz .

xzgrep oom_reaper: serial-20180320.txt.xz
[   97.144805] oom_reaper: reaped process 2142 (cleanupd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[  103.447171] oom_reaper: reaped process 659 (NetworkManager), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[  208.906879] oom_reaper: reaped process 1415 (postgres), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[  214.179760] oom_reaper: reaped process 588 (irqbalance), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[  215.013260] oom_reaper: reaped process 591 (systemd-logind), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[  221.483724] oom_reaper: reaped process 7388 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[  223.140437] oom_reaper: reaped process 9252 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[  224.735597] oom_reaper: reaped process 9257 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[  226.608508] oom_reaper: unable to reap pid:9261 (a.out)
[  354.342331] oom_reaper: unable to reap pid:16611 (a.out)
[  380.525910] oom_reaper: unable to reap pid:6927 (a.out)
[  408.397539] oom_reaper: unable to reap pid:6927 (a.out)
[  435.702005] oom_reaper: unable to reap pid:7554 (a.out)
[  466.269660] oom_reaper: unable to reap pid:7560 (a.out)
[  495.621196] oom_reaper: unable to reap pid:7563 (a.out)
[  534.279602] oom_reaper: reaped process 22038 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[  536.271631] oom_reaper: unable to reap pid:6928 (a.out)
[  600.285293] oom_reaper: unable to reap pid:7550 (a.out)
[  664.684801] oom_reaper: unable to reap pid:7558 (a.out)
[  743.188067] oom_reaper: reaped process 7562 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[  744.980868] oom_reaper: unable to reap pid:6928 (a.out)

None of the reaped taks is reported as "unable to reap..." So what
actually gets fixed by this patch?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] mm,oom_reaper: Show trace of unable to reap victim thread.
  2018-03-20 13:19     ` Michal Hocko
@ 2018-03-20 13:30       ` Tetsuo Handa
  2018-03-20 13:34         ` Michal Hocko
  0 siblings, 1 reply; 20+ messages in thread
From: Tetsuo Handa @ 2018-03-20 13:30 UTC (permalink / raw)
  To: mhocko; +Cc: linux-mm, rientjes

Michal Hocko wrote:
> On Tue 20-03-18 21:52:33, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > A single stack trace in the changelog would be sufficient IMHO.
> > > Appart from that. What do you expect users will do about this trace?
> > > Sure they will see a path which holds mmap_sem, we will see a bug report
> > > but we can hardly do anything about that. We simply cannot drop the lock
> > > from that path in 99% of situations. So _why_ do we want to add more
> > > information to the log?
> > 
> > This case is blocked at i_mmap_lock_write().
> 
> But why does i_mmap_lock_write matter for oom_reaping. We are not
> touching hugetlb mappings. dup_mmap holds mmap_sem for write which is
> the most probable source of the backoff.

If i_mmap_lock_write can bail out upon SIGKILL, the OOM victim will be able to
release mmap_sem held for write, which helps the OOM reaper not to back off.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] mm,oom_reaper: Show trace of unable to reap victim thread.
  2018-03-20 13:30       ` Tetsuo Handa
@ 2018-03-20 13:34         ` Michal Hocko
  2018-03-20 13:50           ` Tetsuo Handa
  0 siblings, 1 reply; 20+ messages in thread
From: Michal Hocko @ 2018-03-20 13:34 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: linux-mm, rientjes

On Tue 20-03-18 22:30:16, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Tue 20-03-18 21:52:33, Tetsuo Handa wrote:
> > > Michal Hocko wrote:
> > > > A single stack trace in the changelog would be sufficient IMHO.
> > > > Appart from that. What do you expect users will do about this trace?
> > > > Sure they will see a path which holds mmap_sem, we will see a bug report
> > > > but we can hardly do anything about that. We simply cannot drop the lock
> > > > from that path in 99% of situations. So _why_ do we want to add more
> > > > information to the log?
> > > 
> > > This case is blocked at i_mmap_lock_write().
> > 
> > But why does i_mmap_lock_write matter for oom_reaping. We are not
> > touching hugetlb mappings. dup_mmap holds mmap_sem for write which is
> > the most probable source of the backoff.
> 
> If i_mmap_lock_write can bail out upon SIGKILL, the OOM victim will be able to
> release mmap_sem held for write, which helps the OOM reaper not to back off.

There are so many other blocking calls (including allocations) in
dup_mmap that I do not really think i_mmap_lock_write is the biggest
problem. That will be likely the case for other mmap_sem write lockers.

Really I am not sure dumping more information is beneficial here.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] mm,oom_reaper: Correct MAX_OOM_REAP_RETRIES'th attempt.
  2018-03-20 13:24       ` [PATCH 2/2] mm,oom_reaper: Correct MAX_OOM_REAP_RETRIES'th attempt Michal Hocko
@ 2018-03-20 13:37         ` Tetsuo Handa
  2018-03-20 13:44           ` Michal Hocko
  0 siblings, 1 reply; 20+ messages in thread
From: Tetsuo Handa @ 2018-03-20 13:37 UTC (permalink / raw)
  To: mhocko; +Cc: linux-mm, rientjes

Michal Hocko wrote:
> On Tue 20-03-18 21:37:42, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > On Tue 20-03-18 20:57:56, Tetsuo Handa wrote:
> > > > I got "oom_reaper: unable to reap pid:" messages when the victim thread
> > > > was blocked inside free_pgtables() (which occurred after returning from
> > > > unmap_vmas() and setting MMF_OOM_SKIP). We don't need to complain when
> > > > __oom_reap_task_mm() returned true (by e.g. finding MMF_OOM_SKIP already
> > > > set) when oom_reap_task() was trying MAX_OOM_REAP_RETRIES'th attempt.
> > > > 
> > > > [  663.593821] Killed process 7558 (a.out) total-vm:4176kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB
> > > > [  664.684801] oom_reaper: unable to reap pid:7558 (a.out)
> > > 
> > > I do not see "oom_reaper: reaped process..." so has the task been
> > > reaped?
> > 
> > The log is http://I-love.SAKURA.ne.jp/tmp/serial-20180320.txt.xz .
> 
> xzgrep oom_reaper: serial-20180320.txt.xz
> [   97.144805] oom_reaper: reaped process 2142 (cleanupd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> [  103.447171] oom_reaper: reaped process 659 (NetworkManager), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> [  208.906879] oom_reaper: reaped process 1415 (postgres), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> [  214.179760] oom_reaper: reaped process 588 (irqbalance), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> [  215.013260] oom_reaper: reaped process 591 (systemd-logind), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> [  221.483724] oom_reaper: reaped process 7388 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> [  223.140437] oom_reaper: reaped process 9252 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> [  224.735597] oom_reaper: reaped process 9257 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> [  226.608508] oom_reaper: unable to reap pid:9261 (a.out)
> [  354.342331] oom_reaper: unable to reap pid:16611 (a.out)
> [  380.525910] oom_reaper: unable to reap pid:6927 (a.out)
> [  408.397539] oom_reaper: unable to reap pid:6927 (a.out)
> [  435.702005] oom_reaper: unable to reap pid:7554 (a.out)
> [  466.269660] oom_reaper: unable to reap pid:7560 (a.out)
> [  495.621196] oom_reaper: unable to reap pid:7563 (a.out)
> [  534.279602] oom_reaper: reaped process 22038 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> [  536.271631] oom_reaper: unable to reap pid:6928 (a.out)
> [  600.285293] oom_reaper: unable to reap pid:7550 (a.out)
> [  664.684801] oom_reaper: unable to reap pid:7558 (a.out)
> [  743.188067] oom_reaper: reaped process 7562 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> [  744.980868] oom_reaper: unable to reap pid:6928 (a.out)
> 
> None of the reaped taks is reported as "unable to reap..." So what
> actually gets fixed by this patch?

This patch (which I sent as "[PATCH 2/2] mm,oom_reaper: Check for MMF_OOM_SKIP before complain.")
makes the OOM reaper not to complain if MMF_OOM_SKIP was already set by exit_mmap().

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] mm,oom_reaper: Correct MAX_OOM_REAP_RETRIES'th attempt.
  2018-03-20 13:37         ` Tetsuo Handa
@ 2018-03-20 13:44           ` Michal Hocko
  0 siblings, 0 replies; 20+ messages in thread
From: Michal Hocko @ 2018-03-20 13:44 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: linux-mm, rientjes

On Tue 20-03-18 22:37:31, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Tue 20-03-18 21:37:42, Tetsuo Handa wrote:
> > > Michal Hocko wrote:
> > > > On Tue 20-03-18 20:57:56, Tetsuo Handa wrote:
> > > > > I got "oom_reaper: unable to reap pid:" messages when the victim thread
> > > > > was blocked inside free_pgtables() (which occurred after returning from
> > > > > unmap_vmas() and setting MMF_OOM_SKIP). We don't need to complain when
> > > > > __oom_reap_task_mm() returned true (by e.g. finding MMF_OOM_SKIP already
> > > > > set) when oom_reap_task() was trying MAX_OOM_REAP_RETRIES'th attempt.
> > > > > 
> > > > > [  663.593821] Killed process 7558 (a.out) total-vm:4176kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB
> > > > > [  664.684801] oom_reaper: unable to reap pid:7558 (a.out)
> > > > 
> > > > I do not see "oom_reaper: reaped process..." so has the task been
> > > > reaped?
> > > 
> > > The log is http://I-love.SAKURA.ne.jp/tmp/serial-20180320.txt.xz .
> > 
> > xzgrep oom_reaper: serial-20180320.txt.xz
> > [   97.144805] oom_reaper: reaped process 2142 (cleanupd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> > [  103.447171] oom_reaper: reaped process 659 (NetworkManager), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> > [  208.906879] oom_reaper: reaped process 1415 (postgres), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> > [  214.179760] oom_reaper: reaped process 588 (irqbalance), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> > [  215.013260] oom_reaper: reaped process 591 (systemd-logind), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> > [  221.483724] oom_reaper: reaped process 7388 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> > [  223.140437] oom_reaper: reaped process 9252 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> > [  224.735597] oom_reaper: reaped process 9257 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> > [  226.608508] oom_reaper: unable to reap pid:9261 (a.out)
> > [  354.342331] oom_reaper: unable to reap pid:16611 (a.out)
> > [  380.525910] oom_reaper: unable to reap pid:6927 (a.out)
> > [  408.397539] oom_reaper: unable to reap pid:6927 (a.out)
> > [  435.702005] oom_reaper: unable to reap pid:7554 (a.out)
> > [  466.269660] oom_reaper: unable to reap pid:7560 (a.out)
> > [  495.621196] oom_reaper: unable to reap pid:7563 (a.out)
> > [  534.279602] oom_reaper: reaped process 22038 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> > [  536.271631] oom_reaper: unable to reap pid:6928 (a.out)
> > [  600.285293] oom_reaper: unable to reap pid:7550 (a.out)
> > [  664.684801] oom_reaper: unable to reap pid:7558 (a.out)
> > [  743.188067] oom_reaper: reaped process 7562 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> > [  744.980868] oom_reaper: unable to reap pid:6928 (a.out)
> > 
> > None of the reaped taks is reported as "unable to reap..." So what
> > actually gets fixed by this patch?
> 
> This patch (which I sent as "[PATCH 2/2] mm,oom_reaper: Check for MMF_OOM_SKIP before complain.")
> makes the OOM reaper not to complain if MMF_OOM_SKIP was already set by exit_mmap().

Sigh. So can you send a patch with a meaningful changelog showing
the problem and explaining how it is fixed?

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] mm,oom_reaper: Show trace of unable to reap victim thread.
  2018-03-20 13:34         ` Michal Hocko
@ 2018-03-20 13:50           ` Tetsuo Handa
  2018-03-20 14:10             ` Michal Hocko
  0 siblings, 1 reply; 20+ messages in thread
From: Tetsuo Handa @ 2018-03-20 13:50 UTC (permalink / raw)
  To: mhocko; +Cc: linux-mm, rientjes

Michal Hocko wrote:
> On Tue 20-03-18 22:30:16, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > On Tue 20-03-18 21:52:33, Tetsuo Handa wrote:
> > > > Michal Hocko wrote:
> > > > > A single stack trace in the changelog would be sufficient IMHO.
> > > > > Appart from that. What do you expect users will do about this trace?
> > > > > Sure they will see a path which holds mmap_sem, we will see a bug report
> > > > > but we can hardly do anything about that. We simply cannot drop the lock
> > > > > from that path in 99% of situations. So _why_ do we want to add more
> > > > > information to the log?
> > > > 
> > > > This case is blocked at i_mmap_lock_write().
> > > 
> > > But why does i_mmap_lock_write matter for oom_reaping. We are not
> > > touching hugetlb mappings. dup_mmap holds mmap_sem for write which is
> > > the most probable source of the backoff.
> > 
> > If i_mmap_lock_write can bail out upon SIGKILL, the OOM victim will be able to
> > release mmap_sem held for write, which helps the OOM reaper not to back off.
> 
> There are so many other blocking calls (including allocations) in
> dup_mmap 

Yes. But

>          that I do not really think i_mmap_lock_write is the biggest
> problem. That will be likely the case for other mmap_sem write lockers.

i_mmap_lock_write() is one of the problems which we could afford fixing.
8 out of 11 "oom_reaper: unable to reap" messages are blocked at i_mmap_lock_write().

[  226.608508] oom_reaper: unable to reap pid:9261 (a.out)
[  226.611971] a.out           D13056  9261   6927 0x00100084
[  226.615879] Call Trace:
[  226.617926]  ? __schedule+0x25f/0x780
[  226.620559]  schedule+0x2d/0x80
[  226.623356]  rwsem_down_write_failed+0x2bb/0x440
[  226.626426]  ? rwsem_down_write_failed+0x55/0x440
[  226.629458]  ? anon_vma_fork+0x124/0x150
[  226.632679]  call_rwsem_down_write_failed+0x13/0x20
[  226.635884]  down_write+0x49/0x60
[  226.638867]  ? copy_process.part.41+0x12f2/0x1fe0
[  226.642042]  copy_process.part.41+0x12f2/0x1fe0
[  226.645087]  ? _do_fork+0xe6/0x560
[  226.647991]  _do_fork+0xe6/0x560
[  226.650495]  ? syscall_trace_enter+0x1a9/0x240
[  226.653443]  ? retint_user+0x18/0x18
[  226.656601]  ? page_fault+0x2f/0x50
[  226.659159]  ? trace_hardirqs_on_caller+0x11f/0x1b0
[  226.662399]  do_syscall_64+0x74/0x230
[  226.664989]  entry_SYSCALL_64_after_hwframe+0x42/0xb7

[  354.342331] oom_reaper: unable to reap pid:16611 (a.out)
[  354.346707] a.out           D13312 16611   6927 0x00100084
[  354.351173] Call Trace:
[  354.354164]  ? __schedule+0x25f/0x780
[  354.357633]  schedule+0x2d/0x80
[  354.360976]  rwsem_down_write_failed+0x2bb/0x440
[  354.364774]  ? rwsem_down_write_failed+0x55/0x440
[  354.368749]  call_rwsem_down_write_failed+0x13/0x20
[  354.372703]  down_write+0x49/0x60
[  354.376106]  ? copy_process.part.41+0x12f2/0x1fe0
[  354.379956]  copy_process.part.41+0x12f2/0x1fe0
[  354.383791]  ? _do_fork+0xe6/0x560
[  354.386967]  _do_fork+0xe6/0x560
[  354.390016]  ? syscall_trace_enter+0x1a9/0x240
[  354.393756]  ? retint_user+0x18/0x18
[  354.397296]  ? page_fault+0x2f/0x50
[  354.401081]  ? trace_hardirqs_on_caller+0x11f/0x1b0
[  354.404772]  do_syscall_64+0x74/0x230
[  354.407749]  entry_SYSCALL_64_after_hwframe+0x42/0xb7

[  380.525910] oom_reaper: unable to reap pid:6927 (a.out)
[  380.529274] a.out           R  running task    13544  6927      1 0x00100084
[  380.533264] Call Trace:
[  380.535433]  ? __schedule+0x25f/0x780
[  380.538052]  ? mark_held_locks+0x60/0x80
[  380.540775]  schedule+0x2d/0x80
[  380.543258]  schedule_timeout+0x1a4/0x350
[  380.546053]  ? __next_timer_interrupt+0xd0/0xd0
[  380.549033]  msleep+0x25/0x30
[  380.551458]  shrink_inactive_list+0x5b7/0x690
[  380.554408]  ? __lock_acquire+0x1f1/0xfd0
[  380.557228]  ? find_held_lock+0x2d/0x90
[  380.559959]  shrink_node_memcg+0x340/0x770
[  380.562776]  ? __lock_acquire+0x246/0xfd0
[  380.565532]  ? mem_cgroup_iter+0x121/0x4f0
[  380.568337]  ? mem_cgroup_iter+0x121/0x4f0
[  380.571203]  shrink_node+0xd8/0x370
[  380.573745]  do_try_to_free_pages+0xe3/0x390
[  380.576759]  try_to_free_pages+0xc8/0x110
[  380.579476]  __alloc_pages_slowpath+0x28a/0x8d9
[  380.582639]  __alloc_pages_nodemask+0x21d/0x260
[  380.585662]  new_slab+0x558/0x760
[  380.588188]  ___slab_alloc+0x353/0x6f0
[  380.590962]  ? copy_process.part.41+0x121f/0x1fe0
[  380.594008]  ? find_held_lock+0x2d/0x90
[  380.596750]  ? copy_process.part.41+0x121f/0x1fe0
[  380.599852]  __slab_alloc+0x41/0x7a
[  380.602469]  ? copy_process.part.41+0x121f/0x1fe0
[  380.605607]  kmem_cache_alloc+0x1a6/0x1f0
[  380.608528]  copy_process.part.41+0x121f/0x1fe0
[  380.611613]  ? _do_fork+0xe6/0x560
[  380.614299]  _do_fork+0xe6/0x560
[  380.616752]  ? syscall_trace_enter+0x1a9/0x240
[  380.619813]  ? retint_user+0x18/0x18
[  380.622588]  ? page_fault+0x2f/0x50
[  380.625184]  ? trace_hardirqs_on_caller+0x11f/0x1b0
[  380.628841]  do_syscall_64+0x74/0x230
[  380.631660]  entry_SYSCALL_64_after_hwframe+0x42/0xb7

[  408.397539] oom_reaper: unable to reap pid:6927 (a.out)
[  408.400862] a.out           D13544  6927      1 0x00100084
[  408.404262] Call Trace:
[  408.406449]  ? __schedule+0x25f/0x780
[  408.409063]  schedule+0x2d/0x80
[  408.411440]  rwsem_down_write_failed+0x2bb/0x440
[  408.414544]  ? rwsem_down_write_failed+0x55/0x440
[  408.417684]  call_rwsem_down_write_failed+0x13/0x20
[  408.420866]  down_write+0x49/0x60
[  408.423442]  ? copy_process.part.41+0x12f2/0x1fe0
[  408.426521]  copy_process.part.41+0x12f2/0x1fe0
[  408.429587]  ? _do_fork+0xe6/0x560
[  408.432134]  _do_fork+0xe6/0x560
[  408.434601]  ? syscall_trace_enter+0x1a9/0x240
[  408.437729]  ? retint_user+0x18/0x18
[  408.440391]  ? page_fault+0x2f/0x50
[  408.442955]  ? trace_hardirqs_on_caller+0x11f/0x1b0
[  408.446850]  do_syscall_64+0x74/0x230
[  408.449523]  entry_SYSCALL_64_after_hwframe+0x42/0xb7

[  435.702005] oom_reaper: unable to reap pid:7554 (a.out)
[  435.706759] a.out           D13328  7554   6928 0x00100084
[  435.711417] Call Trace:
[  435.714814]  ? __schedule+0x25f/0x780
[  435.718813]  schedule+0x2d/0x80
[  435.722308]  rwsem_down_write_failed+0x2bb/0x440
[  435.726661]  ? rwsem_down_write_failed+0x55/0x440
[  435.730881]  ? anon_vma_fork+0x124/0x150
[  435.734935]  call_rwsem_down_write_failed+0x13/0x20
[  435.739278]  down_write+0x49/0x60
[  435.743009]  ? copy_process.part.41+0x12f2/0x1fe0
[  435.747173]  copy_process.part.41+0x12f2/0x1fe0
[  435.751304]  ? _do_fork+0xe6/0x560
[  435.754924]  _do_fork+0xe6/0x560
[  435.758538]  ? syscall_trace_enter+0x1a9/0x240
[  435.762533]  ? retint_user+0x18/0x18
[  435.766115]  ? page_fault+0x2f/0x50
[  435.769646]  ? trace_hardirqs_on_caller+0x11f/0x1b0
[  435.773850]  do_syscall_64+0x74/0x230
[  435.777364]  entry_SYSCALL_64_after_hwframe+0x42/0xb7

[  466.269660] oom_reaper: unable to reap pid:7560 (a.out)
[  466.273267] a.out           D13120  7560   6928 0x00100084
[  466.276878] Call Trace:
[  466.279146]  ? __schedule+0x25f/0x780
[  466.281945]  schedule+0x2d/0x80
[  466.284421]  rwsem_down_write_failed+0x2bb/0x440
[  466.287748]  ? rwsem_down_write_failed+0x55/0x440
[  466.291012]  ? anon_vma_fork+0x124/0x150
[  466.293990]  call_rwsem_down_write_failed+0x13/0x20
[  466.297344]  down_write+0x49/0x60
[  466.299925]  ? copy_process.part.41+0x12f2/0x1fe0
[  466.303310]  copy_process.part.41+0x12f2/0x1fe0
[  466.306504]  ? _do_fork+0xe6/0x560
[  466.309257]  _do_fork+0xe6/0x560
[  466.311911]  ? syscall_trace_enter+0x1a9/0x240
[  466.315112]  ? retint_user+0x18/0x18
[  466.317937]  ? page_fault+0x2f/0x50
[  466.320667]  ? trace_hardirqs_on_caller+0x11f/0x1b0
[  466.324043]  do_syscall_64+0x74/0x230
[  466.326999]  entry_SYSCALL_64_after_hwframe+0x42/0xb7

[  495.621196] oom_reaper: unable to reap pid:7563 (a.out)
[  495.624785] a.out           D12992  7563   6928 0x00100084
[  495.628398] Call Trace:
[  495.630634]  ? __schedule+0x25f/0x780
[  495.633464]  schedule+0x2d/0x80
[  495.635952]  rwsem_down_write_failed+0x2bb/0x440
[  495.639169]  ? rwsem_down_write_failed+0x55/0x440
[  495.642432]  ? anon_vma_fork+0x124/0x150
[  495.645216]  call_rwsem_down_write_failed+0x13/0x20
[  495.648651]  down_write+0x49/0x60
[  495.651446]  ? copy_process.part.41+0x12f2/0x1fe0
[  495.654843]  copy_process.part.41+0x12f2/0x1fe0
[  495.658031]  ? _do_fork+0xe6/0x560
[  495.660718]  _do_fork+0xe6/0x560
[  495.663415]  ? syscall_trace_enter+0x1a9/0x240
[  495.666662]  ? retint_user+0x18/0x18
[  495.669369]  ? page_fault+0x2f/0x50
[  495.672162]  ? trace_hardirqs_on_caller+0x11f/0x1b0
[  495.675529]  do_syscall_64+0x74/0x230
[  495.678418]  entry_SYSCALL_64_after_hwframe+0x42/0xb7

[  536.271631] oom_reaper: unable to reap pid:6928 (a.out)
[  536.275197] a.out           D12992  6928      1 0x00100084
[  536.278972] Call Trace:
[  536.281110]  ? __schedule+0x25f/0x780
[  536.283883]  schedule+0x2d/0x80
[  536.286593]  rwsem_down_write_failed+0x2bb/0x440
[  536.289821]  ? rwsem_down_write_failed+0x55/0x440
[  536.293113]  ? anon_vma_fork+0x124/0x150
[  536.296262]  call_rwsem_down_write_failed+0x13/0x20
[  536.299554]  down_write+0x49/0x60
[  536.302514]  ? copy_process.part.41+0x12f2/0x1fe0
[  536.305816]  copy_process.part.41+0x12f2/0x1fe0
[  536.309032]  ? _do_fork+0xe6/0x560
[  536.312206]  _do_fork+0xe6/0x560
[  536.314810]  ? syscall_trace_enter+0x1a9/0x240
[  536.318335]  ? retint_user+0x18/0x18
[  536.321064]  ? page_fault+0x2f/0x50
[  536.323757]  ? trace_hardirqs_on_caller+0x11f/0x1b0
[  536.327396]  do_syscall_64+0x74/0x230
[  536.330175]  entry_SYSCALL_64_after_hwframe+0x42/0xb7

[  600.285293] oom_reaper: unable to reap pid:7550 (a.out)
[  600.289112] a.out           D13104  7550   6931 0x00100084
[  600.292918] Call Trace:
[  600.295506]  ? __schedule+0x25f/0x780
[  600.298614]  ? __lock_acquire+0x246/0xfd0
[  600.301879]  schedule+0x2d/0x80
[  600.304623]  schedule_timeout+0x1fd/0x350
[  600.307633]  ? find_held_lock+0x2d/0x90
[  600.310909]  ? mark_held_locks+0x60/0x80
[  600.314001]  ? _raw_spin_unlock_irq+0x24/0x30
[  600.317382]  wait_for_completion+0xab/0x130
[  600.320579]  ? wake_up_q+0x70/0x70
[  600.323419]  flush_work+0x1bd/0x260
[  600.326434]  ? flush_work+0x174/0x260
[  600.329332]  ? destroy_worker+0x90/0x90
[  600.332301]  drain_all_pages+0x16d/0x1e0
[  600.335464]  __alloc_pages_slowpath+0x443/0x8d9
[  600.338743]  __alloc_pages_nodemask+0x21d/0x260
[  600.342178]  new_slab+0x558/0x760
[  600.344861]  ___slab_alloc+0x353/0x6f0
[  600.347744]  ? copy_process.part.41+0x121f/0x1fe0
[  600.351213]  ? find_held_lock+0x2d/0x90
[  600.354119]  ? copy_process.part.41+0x121f/0x1fe0
[  600.357491]  __slab_alloc+0x41/0x7a
[  600.360206]  ? copy_process.part.41+0x121f/0x1fe0
[  600.363480]  kmem_cache_alloc+0x1a6/0x1f0
[  600.366579]  copy_process.part.41+0x121f/0x1fe0
[  600.369761]  ? _do_fork+0xe6/0x560
[  600.372382]  _do_fork+0xe6/0x560
[  600.375142]  ? syscall_trace_enter+0x1a9/0x240
[  600.378258]  ? retint_user+0x18/0x18
[  600.380978]  ? page_fault+0x2f/0x50
[  600.383814]  ? trace_hardirqs_on_caller+0x11f/0x1b0
[  600.387096]  do_syscall_64+0x74/0x230
[  600.389928]  entry_SYSCALL_64_after_hwframe+0x42/0xb7

[  664.684801] oom_reaper: unable to reap pid:7558 (a.out)
[  664.892292] a.out           D13272  7558   6931 0x00100084
[  664.895765] Call Trace:
[  664.897574]  ? __schedule+0x25f/0x780
[  664.900099]  schedule+0x2d/0x80
[  664.902260]  rwsem_down_write_failed+0x2bb/0x440
[  664.905249]  ? rwsem_down_write_failed+0x55/0x440
[  664.908335]  ? free_pgd_range+0x569/0x5e0
[  664.911145]  call_rwsem_down_write_failed+0x13/0x20
[  664.914121]  down_write+0x49/0x60
[  664.916519]  ? unlink_file_vma+0x28/0x50
[  664.919255]  unlink_file_vma+0x28/0x50
[  664.922234]  free_pgtables+0x36/0x100
[  664.924797]  exit_mmap+0xbb/0x180
[  664.927220]  mmput+0x50/0x110
[  664.929504]  copy_process.part.41+0xb61/0x1fe0
[  664.932448]  ? _do_fork+0xe6/0x560
[  664.934902]  ? _do_fork+0xe6/0x560
[  664.937361]  _do_fork+0xe6/0x560
[  664.939742]  ? syscall_trace_enter+0x1a9/0x240
[  664.942693]  ? retint_user+0x18/0x18
[  664.945309]  ? page_fault+0x2f/0x50
[  664.947896]  ? trace_hardirqs_on_caller+0x11f/0x1b0
[  664.951075]  do_syscall_64+0x74/0x230
[  664.953747]  entry_SYSCALL_64_after_hwframe+0x42/0xb7

[  744.980868] oom_reaper: unable to reap pid:6928 (a.out)
[  744.984653] a.out           D12992  6928      1 0x00100084
[  744.988278] Call Trace:
[  744.990524]  ? __schedule+0x25f/0x780
[  744.993435]  schedule+0x2d/0x80
[  744.995877]  rwsem_down_write_failed+0x2bb/0x440
[  744.999272]  ? rwsem_down_write_failed+0x55/0x440
[  745.002759]  ? anon_vma_fork+0x124/0x150
[  745.005789]  call_rwsem_down_write_failed+0x13/0x20
[  745.009324]  down_write+0x49/0x60
[  745.012134]  ? copy_process.part.41+0x12f2/0x1fe0
[  745.015574]  copy_process.part.41+0x12f2/0x1fe0
[  745.018845]  ? _do_fork+0xe6/0x560
[  745.021698]  _do_fork+0xe6/0x560
[  745.024275]  ? syscall_trace_enter+0x1a9/0x240
[  745.027443]  ? retint_user+0x18/0x18
[  745.030281]  ? page_fault+0x2f/0x50
[  745.033138]  ? trace_hardirqs_on_caller+0x11f/0x1b0
[  745.036703]  do_syscall_64+0x74/0x230
[  745.039476]  entry_SYSCALL_64_after_hwframe+0x42/0xb7

> 
> Really I am not sure dumping more information is beneficial here.

Converting to use killable where we can afford is beneficial.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] mm,oom_reaper: Show trace of unable to reap victim thread.
  2018-03-20 13:50           ` Tetsuo Handa
@ 2018-03-20 14:10             ` Michal Hocko
  2018-03-20 14:20               ` Tetsuo Handa
  0 siblings, 1 reply; 20+ messages in thread
From: Michal Hocko @ 2018-03-20 14:10 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: linux-mm, rientjes

On Tue 20-03-18 22:50:21, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Tue 20-03-18 22:30:16, Tetsuo Handa wrote:
> > > Michal Hocko wrote:
> > > > On Tue 20-03-18 21:52:33, Tetsuo Handa wrote:
> > > > > Michal Hocko wrote:
> > > > > > A single stack trace in the changelog would be sufficient IMHO.
> > > > > > Appart from that. What do you expect users will do about this trace?
> > > > > > Sure they will see a path which holds mmap_sem, we will see a bug report
> > > > > > but we can hardly do anything about that. We simply cannot drop the lock
> > > > > > from that path in 99% of situations. So _why_ do we want to add more
> > > > > > information to the log?
> > > > > 
> > > > > This case is blocked at i_mmap_lock_write().
> > > > 
> > > > But why does i_mmap_lock_write matter for oom_reaping. We are not
> > > > touching hugetlb mappings. dup_mmap holds mmap_sem for write which is
> > > > the most probable source of the backoff.
> > > 
> > > If i_mmap_lock_write can bail out upon SIGKILL, the OOM victim will be able to
> > > release mmap_sem held for write, which helps the OOM reaper not to back off.
> > 
> > There are so many other blocking calls (including allocations) in
> > dup_mmap 
> 
> Yes. But
> 
> >          that I do not really think i_mmap_lock_write is the biggest
> > problem. That will be likely the case for other mmap_sem write lockers.
> 
> i_mmap_lock_write() is one of the problems which we could afford fixing.
> 8 out of 11 "oom_reaper: unable to reap" messages are blocked at i_mmap_lock_write().
> 
[...]
> > Really I am not sure dumping more information is beneficial here.
> 
> Converting to use killable where we can afford is beneficial.

I am no questioning that. I am questioning the additional information
because we won't be able to do anything about mmap_sem holder most of
the time. Because they tend block on allocations...

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] mm,oom_reaper: Show trace of unable to reap victim thread.
  2018-03-20 14:10             ` Michal Hocko
@ 2018-03-20 14:20               ` Tetsuo Handa
  2018-03-20 20:46                 ` David Rientjes
  0 siblings, 1 reply; 20+ messages in thread
From: Tetsuo Handa @ 2018-03-20 14:20 UTC (permalink / raw)
  To: mhocko; +Cc: linux-mm, rientjes

Michal Hocko wrote:
> On Tue 20-03-18 22:50:21, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > On Tue 20-03-18 22:30:16, Tetsuo Handa wrote:
> > > > Michal Hocko wrote:
> > > > > On Tue 20-03-18 21:52:33, Tetsuo Handa wrote:
> > > > > > Michal Hocko wrote:
> > > > > > > A single stack trace in the changelog would be sufficient IMHO.
> > > > > > > Appart from that. What do you expect users will do about this trace?
> > > > > > > Sure they will see a path which holds mmap_sem, we will see a bug report
> > > > > > > but we can hardly do anything about that. We simply cannot drop the lock
> > > > > > > from that path in 99% of situations. So _why_ do we want to add more
> > > > > > > information to the log?
> > > > > > 
> > > > > > This case is blocked at i_mmap_lock_write().
> > > > > 
> > > > > But why does i_mmap_lock_write matter for oom_reaping. We are not
> > > > > touching hugetlb mappings. dup_mmap holds mmap_sem for write which is
> > > > > the most probable source of the backoff.
> > > > 
> > > > If i_mmap_lock_write can bail out upon SIGKILL, the OOM victim will be able to
> > > > release mmap_sem held for write, which helps the OOM reaper not to back off.
> > > 
> > > There are so many other blocking calls (including allocations) in
> > > dup_mmap 
> > 
> > Yes. But
> > 
> > >          that I do not really think i_mmap_lock_write is the biggest
> > > problem. That will be likely the case for other mmap_sem write lockers.
> > 
> > i_mmap_lock_write() is one of the problems which we could afford fixing.
> > 8 out of 11 "oom_reaper: unable to reap" messages are blocked at i_mmap_lock_write().
> > 
> [...]
> > > Really I am not sure dumping more information is beneficial here.
> > 
> > Converting to use killable where we can afford is beneficial.
> 
> I am no questioning that. I am questioning the additional information
> because we won't be able to do anything about mmap_sem holder most of
> the time. Because they tend block on allocations...

serial-20180320.txt.xz is saying that they tend to block on i_mmap_lock_write() rather
than memory allocations. Making memory allocations killable will need a lot of work.
But I think that making frequently hitting down_write() killable won't need so much work.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] mm,oom_reaper: Show trace of unable to reap victim thread.
  2018-03-20 14:20               ` Tetsuo Handa
@ 2018-03-20 20:46                 ` David Rientjes
  0 siblings, 0 replies; 20+ messages in thread
From: David Rientjes @ 2018-03-20 20:46 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: mhocko, linux-mm

On Tue, 20 Mar 2018, Tetsuo Handa wrote:

> > I am no questioning that. I am questioning the additional information
> > because we won't be able to do anything about mmap_sem holder most of
> > the time. Because they tend block on allocations...
> 
> serial-20180320.txt.xz is saying that they tend to block on i_mmap_lock_write() rather
> than memory allocations. Making memory allocations killable will need a lot of work.
> But I think that making frequently hitting down_write() killable won't need so much work.
> 

I have available to me more than 28,222,058 occurrences of successful oom 
reaping and 13,018 occurrences of failing to grab mm->mmap_sem.

The number of failures is low on production workloads, so I don't see an 
issue with emitting a stack trace in these instances if it can help 
improve things.  But for my 0.04% failure rate, I can't say that I would 
look into them much unless it results in the system livelocking.  Unless 
there's a compelling reason why this is shouldn't be done (too much email 
sent to linux-mm as a result? :), I say just go ahead and add the darn 
stack trace.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] mm,oom_reaper: Check for MMF_OOM_SKIP before complain.
  2018-03-20 12:47       ` [PATCH 2/2] mm,oom_reaper: Check for MMF_OOM_SKIP before complain Tetsuo Handa
@ 2018-03-20 20:49         ` David Rientjes
  2018-03-22 10:46           ` [PATCH] " Tetsuo Handa
  0 siblings, 1 reply; 20+ messages in thread
From: David Rientjes @ 2018-03-20 20:49 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: mhocko, linux-mm

On Tue, 20 Mar 2018, Tetsuo Handa wrote:

> I got "oom_reaper: unable to reap pid:" messages when the victim thread
> was blocked inside free_pgtables() (which occurred after returning from
> unmap_vmas() and setting MMF_OOM_SKIP). We don't need to complain when
> exit_mmap() already set MMF_OOM_SKIP.
> 
> [  663.593821] Killed process 7558 (a.out) total-vm:4176kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB
> [  664.684801] oom_reaper: unable to reap pid:7558 (a.out)
> [  664.892292] a.out           D13272  7558   6931 0x00100084
> [  664.895765] Call Trace:
> [  664.897574]  ? __schedule+0x25f/0x780
> [  664.900099]  schedule+0x2d/0x80
> [  664.902260]  rwsem_down_write_failed+0x2bb/0x440
> [  664.905249]  ? rwsem_down_write_failed+0x55/0x440
> [  664.908335]  ? free_pgd_range+0x569/0x5e0
> [  664.911145]  call_rwsem_down_write_failed+0x13/0x20
> [  664.914121]  down_write+0x49/0x60
> [  664.916519]  ? unlink_file_vma+0x28/0x50
> [  664.919255]  unlink_file_vma+0x28/0x50
> [  664.922234]  free_pgtables+0x36/0x100
> [  664.924797]  exit_mmap+0xbb/0x180
> [  664.927220]  mmput+0x50/0x110
> [  664.929504]  copy_process.part.41+0xb61/0x1fe0
> [  664.932448]  ? _do_fork+0xe6/0x560
> [  664.934902]  ? _do_fork+0xe6/0x560
> [  664.937361]  _do_fork+0xe6/0x560
> [  664.939742]  ? syscall_trace_enter+0x1a9/0x240
> [  664.942693]  ? retint_user+0x18/0x18
> [  664.945309]  ? page_fault+0x2f/0x50
> [  664.947896]  ? trace_hardirqs_on_caller+0x11f/0x1b0
> [  664.951075]  do_syscall_64+0x74/0x230
> [  664.953747]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: David Rientjes <rientjes@google.com>

Acked-by: David Rientjes <rientjes@google.com>

But you'll need to send it to akpm.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH] mm,oom_reaper: Check for MMF_OOM_SKIP before complain.
  2018-03-20 20:49         ` David Rientjes
@ 2018-03-22 10:46           ` Tetsuo Handa
  2018-03-22 11:59             ` Michal Hocko
  0 siblings, 1 reply; 20+ messages in thread
From: Tetsuo Handa @ 2018-03-22 10:46 UTC (permalink / raw)
  To: akpm; +Cc: rientjes, mhocko, linux-mm

>From b141cdbe0db852549c94d5b1e6a9967ca69d59fd Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Thu, 22 Mar 2018 19:44:12 +0900
Subject: [PATCH] mm,oom_reaper: Check for MMF_OOM_SKIP before complain.

I got "oom_reaper: unable to reap pid:" messages when the victim thread
was blocked inside free_pgtables() (which occurred after returning from
unmap_vmas() and setting MMF_OOM_SKIP). We don't need to complain when
exit_mmap() already set MMF_OOM_SKIP.

[  663.593821] Killed process 7558 (a.out) total-vm:4176kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB
[  664.684801] oom_reaper: unable to reap pid:7558 (a.out)
[  664.892292] a.out           D13272  7558   6931 0x00100084
[  664.895765] Call Trace:
[  664.897574]  ? __schedule+0x25f/0x780
[  664.900099]  schedule+0x2d/0x80
[  664.902260]  rwsem_down_write_failed+0x2bb/0x440
[  664.905249]  ? rwsem_down_write_failed+0x55/0x440
[  664.908335]  ? free_pgd_range+0x569/0x5e0
[  664.911145]  call_rwsem_down_write_failed+0x13/0x20
[  664.914121]  down_write+0x49/0x60
[  664.916519]  ? unlink_file_vma+0x28/0x50
[  664.919255]  unlink_file_vma+0x28/0x50
[  664.922234]  free_pgtables+0x36/0x100
[  664.924797]  exit_mmap+0xbb/0x180
[  664.927220]  mmput+0x50/0x110
[  664.929504]  copy_process.part.41+0xb61/0x1fe0
[  664.932448]  ? _do_fork+0xe6/0x560
[  664.934902]  ? _do_fork+0xe6/0x560
[  664.937361]  _do_fork+0xe6/0x560
[  664.939742]  ? syscall_trace_enter+0x1a9/0x240
[  664.942693]  ? retint_user+0x18/0x18
[  664.945309]  ? page_fault+0x2f/0x50
[  664.947896]  ? trace_hardirqs_on_caller+0x11f/0x1b0
[  664.951075]  do_syscall_64+0x74/0x230
[  664.953747]  entry_SYSCALL_64_after_hwframe+0x42/0xb7

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@suse.com>
---
 mm/oom_kill.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 5336985..dfd3705 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -590,7 +590,8 @@ static void oom_reap_task(struct task_struct *tsk)
 	while (attempts++ < MAX_OOM_REAP_RETRIES && !__oom_reap_task_mm(tsk, mm))
 		schedule_timeout_idle(HZ/10);
 
-	if (attempts <= MAX_OOM_REAP_RETRIES)
+	if (attempts <= MAX_OOM_REAP_RETRIES ||
+	    test_bit(MMF_OOM_SKIP, &mm->flags))
 		goto done;
 
 
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] mm,oom_reaper: Check for MMF_OOM_SKIP before complain.
  2018-03-22 10:46           ` [PATCH] " Tetsuo Handa
@ 2018-03-22 11:59             ` Michal Hocko
  0 siblings, 0 replies; 20+ messages in thread
From: Michal Hocko @ 2018-03-22 11:59 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: akpm, rientjes, linux-mm

On Thu 22-03-18 19:46:36, Tetsuo Handa wrote:
> >From b141cdbe0db852549c94d5b1e6a9967ca69d59fd Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Date: Thu, 22 Mar 2018 19:44:12 +0900
> Subject: [PATCH] mm,oom_reaper: Check for MMF_OOM_SKIP before complain.
> 
> I got "oom_reaper: unable to reap pid:" messages when the victim thread
> was blocked inside free_pgtables() (which occurred after returning from
> unmap_vmas() and setting MMF_OOM_SKIP). We don't need to complain when
> exit_mmap() already set MMF_OOM_SKIP.

... because we do not report a failure when MMF_OOM_SKIP is set already
(see __oom_reap_task_mm).

> [  663.593821] Killed process 7558 (a.out) total-vm:4176kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB
> [  664.684801] oom_reaper: unable to reap pid:7558 (a.out)
> [  664.892292] a.out           D13272  7558   6931 0x00100084
> [  664.895765] Call Trace:
> [  664.897574]  ? __schedule+0x25f/0x780
> [  664.900099]  schedule+0x2d/0x80
> [  664.902260]  rwsem_down_write_failed+0x2bb/0x440
> [  664.905249]  ? rwsem_down_write_failed+0x55/0x440
> [  664.908335]  ? free_pgd_range+0x569/0x5e0
> [  664.911145]  call_rwsem_down_write_failed+0x13/0x20
> [  664.914121]  down_write+0x49/0x60
> [  664.916519]  ? unlink_file_vma+0x28/0x50
> [  664.919255]  unlink_file_vma+0x28/0x50
> [  664.922234]  free_pgtables+0x36/0x100
> [  664.924797]  exit_mmap+0xbb/0x180
> [  664.927220]  mmput+0x50/0x110
> [  664.929504]  copy_process.part.41+0xb61/0x1fe0
> [  664.932448]  ? _do_fork+0xe6/0x560
> [  664.934902]  ? _do_fork+0xe6/0x560
> [  664.937361]  _do_fork+0xe6/0x560
> [  664.939742]  ? syscall_trace_enter+0x1a9/0x240
> [  664.942693]  ? retint_user+0x18/0x18
> [  664.945309]  ? page_fault+0x2f/0x50
> [  664.947896]  ? trace_hardirqs_on_caller+0x11f/0x1b0
> [  664.951075]  do_syscall_64+0x74/0x230
> [  664.953747]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Acked-by: David Rientjes <rientjes@google.com>
> Cc: Michal Hocko <mhocko@suse.com>

OK, fair enough. I think it is on borderline of usefulness but it
doesn't seem harmful. I can already see somebody asking why the
MMF_OOM_SKIP really requires mmap_sem in __oom_reap_task_mm while it
doesn't in oom_reap_task but what ever. 

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/oom_kill.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 5336985..dfd3705 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -590,7 +590,8 @@ static void oom_reap_task(struct task_struct *tsk)
>  	while (attempts++ < MAX_OOM_REAP_RETRIES && !__oom_reap_task_mm(tsk, mm))
>  		schedule_timeout_idle(HZ/10);
>  
> -	if (attempts <= MAX_OOM_REAP_RETRIES)
> +	if (attempts <= MAX_OOM_REAP_RETRIES ||
> +	    test_bit(MMF_OOM_SKIP, &mm->flags))
>  		goto done;
>  
>  
> -- 
> 1.8.3.1

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2018-03-22 11:59 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-20 11:57 [PATCH 1/2] mm,oom_reaper: Show trace of unable to reap victim thread Tetsuo Handa
2018-03-20 11:57 ` [PATCH 2/2] mm,oom_reaper: Correct MAX_OOM_REAP_RETRIES'th attempt Tetsuo Handa
2018-03-20 12:12   ` Michal Hocko
2018-03-20 12:37     ` Tetsuo Handa
2018-03-20 12:47       ` [PATCH 2/2] mm,oom_reaper: Check for MMF_OOM_SKIP before complain Tetsuo Handa
2018-03-20 20:49         ` David Rientjes
2018-03-22 10:46           ` [PATCH] " Tetsuo Handa
2018-03-22 11:59             ` Michal Hocko
2018-03-20 13:24       ` [PATCH 2/2] mm,oom_reaper: Correct MAX_OOM_REAP_RETRIES'th attempt Michal Hocko
2018-03-20 13:37         ` Tetsuo Handa
2018-03-20 13:44           ` Michal Hocko
2018-03-20 12:28 ` [PATCH 1/2] mm,oom_reaper: Show trace of unable to reap victim thread Michal Hocko
2018-03-20 12:52   ` Tetsuo Handa
2018-03-20 13:19     ` Michal Hocko
2018-03-20 13:30       ` Tetsuo Handa
2018-03-20 13:34         ` Michal Hocko
2018-03-20 13:50           ` Tetsuo Handa
2018-03-20 14:10             ` Michal Hocko
2018-03-20 14:20               ` Tetsuo Handa
2018-03-20 20:46                 ` David Rientjes

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.