* [PATCH 1/2] mm,oom_reaper: Show trace of unable to reap victim thread.
@ 2018-03-20 11:57 Tetsuo Handa
2018-03-20 11:57 ` [PATCH 2/2] mm,oom_reaper: Correct MAX_OOM_REAP_RETRIES'th attempt Tetsuo Handa
2018-03-20 12:28 ` [PATCH 1/2] mm,oom_reaper: Show trace of unable to reap victim thread Michal Hocko
0 siblings, 2 replies; 20+ messages in thread
From: Tetsuo Handa @ 2018-03-20 11:57 UTC (permalink / raw)
To: linux-mm; +Cc: Tetsuo Handa, David Rientjes, Michal Hocko
I found that it is not difficult to hit "oom_reaper: unable to reap pid:"
messages if the victim thread is doing copy_process(). Since showing the
trace of a victim thread which could not be reaped is likely helpful for
finding locations which should use a killable wait, this patch does so.
[ 226.608508] oom_reaper: unable to reap pid:9261 (a.out)
[ 226.611971] a.out D13056 9261 6927 0x00100084
[ 226.615879] Call Trace:
[ 226.617926] ? __schedule+0x25f/0x780
[ 226.620559] schedule+0x2d/0x80
[ 226.623356] rwsem_down_write_failed+0x2bb/0x440
[ 226.626426] ? rwsem_down_write_failed+0x55/0x440
[ 226.629458] ? anon_vma_fork+0x124/0x150
[ 226.632679] call_rwsem_down_write_failed+0x13/0x20
[ 226.635884] down_write+0x49/0x60
[ 226.638867] ? copy_process.part.41+0x12f2/0x1fe0
[ 226.642042] copy_process.part.41+0x12f2/0x1fe0 /* i_mmap_lock_write() in dup_mmap() */
[ 226.645087] ? _do_fork+0xe6/0x560
[ 226.647991] _do_fork+0xe6/0x560
[ 226.650495] ? syscall_trace_enter+0x1a9/0x240
[ 226.653443] ? retint_user+0x18/0x18
[ 226.656601] ? page_fault+0x2f/0x50
[ 226.659159] ? trace_hardirqs_on_caller+0x11f/0x1b0
[ 226.662399] do_syscall_64+0x74/0x230
[ 226.664989] entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 380.525910] oom_reaper: unable to reap pid:6927 (a.out)
[ 380.529274] a.out R running task 13544 6927 1 0x00100084
[ 380.533264] Call Trace:
[ 380.535433] ? __schedule+0x25f/0x780
[ 380.538052] ? mark_held_locks+0x60/0x80
[ 380.540775] schedule+0x2d/0x80
[ 380.543258] schedule_timeout+0x1a4/0x350
[ 380.546053] ? __next_timer_interrupt+0xd0/0xd0
[ 380.549033] msleep+0x25/0x30
[ 380.551458] shrink_inactive_list+0x5b7/0x690
[ 380.554408] ? __lock_acquire+0x1f1/0xfd0
[ 380.557228] ? find_held_lock+0x2d/0x90
[ 380.559959] shrink_node_memcg+0x340/0x770
[ 380.562776] ? __lock_acquire+0x246/0xfd0
[ 380.565532] ? mem_cgroup_iter+0x121/0x4f0
[ 380.568337] ? mem_cgroup_iter+0x121/0x4f0
[ 380.571203] shrink_node+0xd8/0x370
[ 380.573745] do_try_to_free_pages+0xe3/0x390
[ 380.576759] try_to_free_pages+0xc8/0x110
[ 380.579476] __alloc_pages_slowpath+0x28a/0x8d9
[ 380.582639] __alloc_pages_nodemask+0x21d/0x260
[ 380.585662] new_slab+0x558/0x760
[ 380.588188] ___slab_alloc+0x353/0x6f0
[ 380.590962] ? copy_process.part.41+0x121f/0x1fe0
[ 380.594008] ? find_held_lock+0x2d/0x90
[ 380.596750] ? copy_process.part.41+0x121f/0x1fe0
[ 380.599852] __slab_alloc+0x41/0x7a
[ 380.602469] ? copy_process.part.41+0x121f/0x1fe0
[ 380.605607] kmem_cache_alloc+0x1a6/0x1f0
[ 380.608528] copy_process.part.41+0x121f/0x1fe0 /* kmem_cache_alloc(GFP_KERNEL) in dup_mmap() */
[ 380.611613] ? _do_fork+0xe6/0x560
[ 380.614299] _do_fork+0xe6/0x560
[ 380.616752] ? syscall_trace_enter+0x1a9/0x240
[ 380.619813] ? retint_user+0x18/0x18
[ 380.622588] ? page_fault+0x2f/0x50
[ 380.625184] ? trace_hardirqs_on_caller+0x11f/0x1b0
[ 380.628841] do_syscall_64+0x74/0x230
[ 380.631660] entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 600.285293] oom_reaper: unable to reap pid:7550 (a.out)
[ 600.289112] a.out D13104 7550 6931 0x00100084
[ 600.292918] Call Trace:
[ 600.295506] ? __schedule+0x25f/0x780
[ 600.298614] ? __lock_acquire+0x246/0xfd0
[ 600.301879] schedule+0x2d/0x80
[ 600.304623] schedule_timeout+0x1fd/0x350
[ 600.307633] ? find_held_lock+0x2d/0x90
[ 600.310909] ? mark_held_locks+0x60/0x80
[ 600.314001] ? _raw_spin_unlock_irq+0x24/0x30
[ 600.317382] wait_for_completion+0xab/0x130
[ 600.320579] ? wake_up_q+0x70/0x70
[ 600.323419] flush_work+0x1bd/0x260
[ 600.326434] ? flush_work+0x174/0x260
[ 600.329332] ? destroy_worker+0x90/0x90
[ 600.332301] drain_all_pages+0x16d/0x1e0
[ 600.335464] __alloc_pages_slowpath+0x443/0x8d9
[ 600.338743] __alloc_pages_nodemask+0x21d/0x260
[ 600.342178] new_slab+0x558/0x760
[ 600.344861] ___slab_alloc+0x353/0x6f0
[ 600.347744] ? copy_process.part.41+0x121f/0x1fe0
[ 600.351213] ? find_held_lock+0x2d/0x90
[ 600.354119] ? copy_process.part.41+0x121f/0x1fe0
[ 600.357491] __slab_alloc+0x41/0x7a
[ 600.360206] ? copy_process.part.41+0x121f/0x1fe0
[ 600.363480] kmem_cache_alloc+0x1a6/0x1f0
[ 600.366579] copy_process.part.41+0x121f/0x1fe0 /* kmem_cache_alloc(GFP_KERNEL) in dup_mmap() */
[ 600.369761] ? _do_fork+0xe6/0x560
[ 600.372382] _do_fork+0xe6/0x560
[ 600.375142] ? syscall_trace_enter+0x1a9/0x240
[ 600.378258] ? retint_user+0x18/0x18
[ 600.380978] ? page_fault+0x2f/0x50
[ 600.383814] ? trace_hardirqs_on_caller+0x11f/0x1b0
[ 600.387096] do_syscall_64+0x74/0x230
[ 600.389928] entry_SYSCALL_64_after_hwframe+0x42/0xb7
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
---
mm/oom_kill.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 5336985..900300c 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -41,6 +41,7 @@
#include <linux/kthread.h>
#include <linux/init.h>
#include <linux/mmu_notifier.h>
+#include <linux/sched/debug.h>
#include <asm/tlb.h>
#include "internal.h"
@@ -596,6 +597,7 @@ static void oom_reap_task(struct task_struct *tsk)
pr_info("oom_reaper: unable to reap pid:%d (%s)\n",
task_pid_nr(tsk), tsk->comm);
+ sched_show_task(tsk);
debug_show_all_locks();
done:
--
1.8.3.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 2/2] mm,oom_reaper: Correct MAX_OOM_REAP_RETRIES'th attempt.
2018-03-20 11:57 [PATCH 1/2] mm,oom_reaper: Show trace of unable to reap victim thread Tetsuo Handa
@ 2018-03-20 11:57 ` Tetsuo Handa
2018-03-20 12:12 ` Michal Hocko
2018-03-20 12:28 ` [PATCH 1/2] mm,oom_reaper: Show trace of unable to reap victim thread Michal Hocko
1 sibling, 1 reply; 20+ messages in thread
From: Tetsuo Handa @ 2018-03-20 11:57 UTC (permalink / raw)
To: linux-mm; +Cc: Tetsuo Handa, David Rientjes, Michal Hocko
I got "oom_reaper: unable to reap pid:" messages when the victim thread
was blocked inside free_pgtables() (which occurred after returning from
unmap_vmas() and setting MMF_OOM_SKIP). We don't need to complain when
__oom_reap_task_mm() returned true (e.g. because it found MMF_OOM_SKIP
already set) on oom_reap_task()'s MAX_OOM_REAP_RETRIES'th attempt.
[ 663.593821] Killed process 7558 (a.out) total-vm:4176kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB
[ 664.684801] oom_reaper: unable to reap pid:7558 (a.out)
[ 664.892292] a.out D13272 7558 6931 0x00100084
[ 664.895765] Call Trace:
[ 664.897574] ? __schedule+0x25f/0x780
[ 664.900099] schedule+0x2d/0x80
[ 664.902260] rwsem_down_write_failed+0x2bb/0x440
[ 664.905249] ? rwsem_down_write_failed+0x55/0x440
[ 664.908335] ? free_pgd_range+0x569/0x5e0
[ 664.911145] call_rwsem_down_write_failed+0x13/0x20
[ 664.914121] down_write+0x49/0x60
[ 664.916519] ? unlink_file_vma+0x28/0x50
[ 664.919255] unlink_file_vma+0x28/0x50
[ 664.922234] free_pgtables+0x36/0x100
[ 664.924797] exit_mmap+0xbb/0x180
[ 664.927220] mmput+0x50/0x110
[ 664.929504] copy_process.part.41+0xb61/0x1fe0
[ 664.932448] ? _do_fork+0xe6/0x560
[ 664.934902] ? _do_fork+0xe6/0x560
[ 664.937361] _do_fork+0xe6/0x560
[ 664.939742] ? syscall_trace_enter+0x1a9/0x240
[ 664.942693] ? retint_user+0x18/0x18
[ 664.945309] ? page_fault+0x2f/0x50
[ 664.947896] ? trace_hardirqs_on_caller+0x11f/0x1b0
[ 664.951075] do_syscall_64+0x74/0x230
[ 664.953747] entry_SYSCALL_64_after_hwframe+0x42/0xb7
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
---
mm/oom_kill.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 900300c..1cb2b98 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -588,11 +588,11 @@ static void oom_reap_task(struct task_struct *tsk)
struct mm_struct *mm = tsk->signal->oom_mm;
/* Retry the down_read_trylock(mmap_sem) a few times */
- while (attempts++ < MAX_OOM_REAP_RETRIES && !__oom_reap_task_mm(tsk, mm))
+ while (attempts++ < MAX_OOM_REAP_RETRIES) {
+ if (__oom_reap_task_mm(tsk, mm))
+ goto done;
schedule_timeout_idle(HZ/10);
-
- if (attempts <= MAX_OOM_REAP_RETRIES)
- goto done;
+ }
pr_info("oom_reaper: unable to reap pid:%d (%s)\n",
--
1.8.3.1
* Re: [PATCH 2/2] mm,oom_reaper: Correct MAX_OOM_REAP_RETRIES'th attempt.
2018-03-20 11:57 ` [PATCH 2/2] mm,oom_reaper: Correct MAX_OOM_REAP_RETRIES'th attempt Tetsuo Handa
@ 2018-03-20 12:12 ` Michal Hocko
2018-03-20 12:37 ` Tetsuo Handa
0 siblings, 1 reply; 20+ messages in thread
From: Michal Hocko @ 2018-03-20 12:12 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: linux-mm, David Rientjes
On Tue 20-03-18 20:57:56, Tetsuo Handa wrote:
> I got "oom_reaper: unable to reap pid:" messages when the victim thread
> was blocked inside free_pgtables() (which occurred after returning from
> unmap_vmas() and setting MMF_OOM_SKIP). We don't need to complain when
> __oom_reap_task_mm() returned true (e.g. because it found MMF_OOM_SKIP
> already set) on oom_reap_task()'s MAX_OOM_REAP_RETRIES'th attempt.
>
> [ 663.593821] Killed process 7558 (a.out) total-vm:4176kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB
> [ 664.684801] oom_reaper: unable to reap pid:7558 (a.out)
I do not see "oom_reaper: reaped process..." so has the task been
reaped?
> [ 664.892292] a.out D13272 7558 6931 0x00100084
> [ 664.895765] Call Trace:
> [ 664.897574] ? __schedule+0x25f/0x780
> [ 664.900099] schedule+0x2d/0x80
> [ 664.902260] rwsem_down_write_failed+0x2bb/0x440
> [ 664.905249] ? rwsem_down_write_failed+0x55/0x440
> [ 664.908335] ? free_pgd_range+0x569/0x5e0
> [ 664.911145] call_rwsem_down_write_failed+0x13/0x20
> [ 664.914121] down_write+0x49/0x60
> [ 664.916519] ? unlink_file_vma+0x28/0x50
> [ 664.919255] unlink_file_vma+0x28/0x50
> [ 664.922234] free_pgtables+0x36/0x100
> [ 664.924797] exit_mmap+0xbb/0x180
> [ 664.927220] mmput+0x50/0x110
> [ 664.929504] copy_process.part.41+0xb61/0x1fe0
> [ 664.932448] ? _do_fork+0xe6/0x560
> [ 664.934902] ? _do_fork+0xe6/0x560
> [ 664.937361] _do_fork+0xe6/0x560
> [ 664.939742] ? syscall_trace_enter+0x1a9/0x240
> [ 664.942693] ? retint_user+0x18/0x18
> [ 664.945309] ? page_fault+0x2f/0x50
> [ 664.947896] ? trace_hardirqs_on_caller+0x11f/0x1b0
> [ 664.951075] do_syscall_64+0x74/0x230
> [ 664.953747] entry_SYSCALL_64_after_hwframe+0x42/0xb7
>
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: David Rientjes <rientjes@google.com>
> ---
> mm/oom_kill.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 900300c..1cb2b98 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -588,11 +588,11 @@ static void oom_reap_task(struct task_struct *tsk)
> struct mm_struct *mm = tsk->signal->oom_mm;
>
> /* Retry the down_read_trylock(mmap_sem) a few times */
> - while (attempts++ < MAX_OOM_REAP_RETRIES && !__oom_reap_task_mm(tsk, mm))
> + while (attempts++ < MAX_OOM_REAP_RETRIES) {
> + if (__oom_reap_task_mm(tsk, mm))
> + goto done;
> schedule_timeout_idle(HZ/10);
> -
> - if (attempts <= MAX_OOM_REAP_RETRIES)
> - goto done;
> + }
I do not see how this improves anything. Even if __oom_reap_task_mm()
succeeded during the last attempt, attempts == MAX_OOM_REAP_RETRIES
and the if below would bail out. Or what do I miss?
> pr_info("oom_reaper: unable to reap pid:%d (%s)\n",
--
Michal Hocko
SUSE Labs
* Re: [PATCH 1/2] mm,oom_reaper: Show trace of unable to reap victim thread.
2018-03-20 11:57 [PATCH 1/2] mm,oom_reaper: Show trace of unable to reap victim thread Tetsuo Handa
2018-03-20 11:57 ` [PATCH 2/2] mm,oom_reaper: Correct MAX_OOM_REAP_RETRIES'th attempt Tetsuo Handa
@ 2018-03-20 12:28 ` Michal Hocko
2018-03-20 12:52 ` Tetsuo Handa
1 sibling, 1 reply; 20+ messages in thread
From: Michal Hocko @ 2018-03-20 12:28 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: linux-mm, David Rientjes
On Tue 20-03-18 20:57:55, Tetsuo Handa wrote:
> I found that it is not difficult to hit "oom_reaper: unable to reap pid:"
> messages if the victim thread is doing copy_process(). Since showing the
> trace of a victim thread which could not be reaped is likely helpful for
> finding locations which should use a killable wait, this patch does so.
>
> [ 226.608508] oom_reaper: unable to reap pid:9261 (a.out)
> [ 226.611971] a.out D13056 9261 6927 0x00100084
> [ 226.615879] Call Trace:
> [ 226.617926] ? __schedule+0x25f/0x780
> [ 226.620559] schedule+0x2d/0x80
> [ 226.623356] rwsem_down_write_failed+0x2bb/0x440
> [ 226.626426] ? rwsem_down_write_failed+0x55/0x440
> [ 226.629458] ? anon_vma_fork+0x124/0x150
> [ 226.632679] call_rwsem_down_write_failed+0x13/0x20
> [ 226.635884] down_write+0x49/0x60
> [ 226.638867] ? copy_process.part.41+0x12f2/0x1fe0
> [ 226.642042] copy_process.part.41+0x12f2/0x1fe0 /* i_mmap_lock_write() in dup_mmap() */
> [ 226.645087] ? _do_fork+0xe6/0x560
> [ 226.647991] _do_fork+0xe6/0x560
> [ 226.650495] ? syscall_trace_enter+0x1a9/0x240
> [ 226.653443] ? retint_user+0x18/0x18
> [ 226.656601] ? page_fault+0x2f/0x50
> [ 226.659159] ? trace_hardirqs_on_caller+0x11f/0x1b0
> [ 226.662399] do_syscall_64+0x74/0x230
> [ 226.664989] entry_SYSCALL_64_after_hwframe+0x42/0xb7
A single stack trace in the changelog would be sufficient IMHO.
Apart from that, what do you expect users to do about this trace?
Sure, they will see a path which holds mmap_sem, and we will see a bug
report, but we can hardly do anything about that. We simply cannot drop
the lock from that path in 99% of situations. So _why_ do we want to add
more information to the log?
[...]
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: David Rientjes <rientjes@google.com>
> ---
> mm/oom_kill.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 5336985..900300c 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -41,6 +41,7 @@
> #include <linux/kthread.h>
> #include <linux/init.h>
> #include <linux/mmu_notifier.h>
> +#include <linux/sched/debug.h>
>
> #include <asm/tlb.h>
> #include "internal.h"
> @@ -596,6 +597,7 @@ static void oom_reap_task(struct task_struct *tsk)
>
> pr_info("oom_reaper: unable to reap pid:%d (%s)\n",
> task_pid_nr(tsk), tsk->comm);
> + sched_show_task(tsk);
> debug_show_all_locks();
>
> done:
> --
> 1.8.3.1
>
--
Michal Hocko
SUSE Labs
* Re: [PATCH 2/2] mm,oom_reaper: Correct MAX_OOM_REAP_RETRIES'th attempt.
2018-03-20 12:12 ` Michal Hocko
@ 2018-03-20 12:37 ` Tetsuo Handa
2018-03-20 12:47 ` [PATCH 2/2] mm,oom_reaper: Check for MMF_OOM_SKIP before complain Tetsuo Handa
2018-03-20 13:24 ` [PATCH 2/2] mm,oom_reaper: Correct MAX_OOM_REAP_RETRIES'th attempt Michal Hocko
0 siblings, 2 replies; 20+ messages in thread
From: Tetsuo Handa @ 2018-03-20 12:37 UTC (permalink / raw)
To: mhocko; +Cc: linux-mm, rientjes
Michal Hocko wrote:
> On Tue 20-03-18 20:57:56, Tetsuo Handa wrote:
> > I got "oom_reaper: unable to reap pid:" messages when the victim thread
> > was blocked inside free_pgtables() (which occurred after returning from
> > unmap_vmas() and setting MMF_OOM_SKIP). We don't need to complain when
> > __oom_reap_task_mm() returned true (e.g. because it found MMF_OOM_SKIP
> > already set) on oom_reap_task()'s MAX_OOM_REAP_RETRIES'th attempt.
> >
> > [ 663.593821] Killed process 7558 (a.out) total-vm:4176kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB
> > [ 664.684801] oom_reaper: unable to reap pid:7558 (a.out)
>
> I do not see "oom_reaper: reaped process..." so has the task been
> reaped?
The log is http://I-love.SAKURA.ne.jp/tmp/serial-20180320.txt.xz .
I think the task succeeded in calling unmap_vmas() in exit_mmap(), because
the absence of an "oom_reaper: reaped process..." message means that
__oom_reap_task_mm() returned without setting MMF_UNSTABLE.
>
> > [ 664.892292] a.out D13272 7558 6931 0x00100084
> > [ 664.895765] Call Trace:
> > [ 664.897574] ? __schedule+0x25f/0x780
> > [ 664.900099] schedule+0x2d/0x80
> > [ 664.902260] rwsem_down_write_failed+0x2bb/0x440
> > [ 664.905249] ? rwsem_down_write_failed+0x55/0x440
> > [ 664.908335] ? free_pgd_range+0x569/0x5e0
> > [ 664.911145] call_rwsem_down_write_failed+0x13/0x20
> > [ 664.914121] down_write+0x49/0x60
> > [ 664.916519] ? unlink_file_vma+0x28/0x50
> > [ 664.919255] unlink_file_vma+0x28/0x50
> > [ 664.922234] free_pgtables+0x36/0x100
> > [ 664.924797] exit_mmap+0xbb/0x180
> > [ 664.927220] mmput+0x50/0x110
> > [ 664.929504] copy_process.part.41+0xb61/0x1fe0
> > [ 664.932448] ? _do_fork+0xe6/0x560
> > [ 664.934902] ? _do_fork+0xe6/0x560
> > [ 664.937361] _do_fork+0xe6/0x560
> > [ 664.939742] ? syscall_trace_enter+0x1a9/0x240
> > [ 664.942693] ? retint_user+0x18/0x18
> > [ 664.945309] ? page_fault+0x2f/0x50
> > [ 664.947896] ? trace_hardirqs_on_caller+0x11f/0x1b0
> > [ 664.951075] do_syscall_64+0x74/0x230
> > [ 664.953747] entry_SYSCALL_64_after_hwframe+0x42/0xb7
> >
> > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: David Rientjes <rientjes@google.com>
> > ---
> > mm/oom_kill.c | 8 ++++----
> > 1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index 900300c..1cb2b98 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -588,11 +588,11 @@ static void oom_reap_task(struct task_struct *tsk)
> > struct mm_struct *mm = tsk->signal->oom_mm;
> >
> > /* Retry the down_read_trylock(mmap_sem) a few times */
> > - while (attempts++ < MAX_OOM_REAP_RETRIES && !__oom_reap_task_mm(tsk, mm))
> > + while (attempts++ < MAX_OOM_REAP_RETRIES) {
> > + if (__oom_reap_task_mm(tsk, mm))
> > + goto done;
> > schedule_timeout_idle(HZ/10);
> > -
> > - if (attempts <= MAX_OOM_REAP_RETRIES)
> > - goto done;
> > + }
>
> I do not see how this improves anything. Even if __oom_reap_task_mm()
> succeeded during the last attempt, attempts == MAX_OOM_REAP_RETRIES
> and the if below would bail out. Or what do I miss?
Oops, I optimized incorrectly. The original patch was
- if (attempts <= MAX_OOM_REAP_RETRIES)
+ if (attempts <= MAX_OOM_REAP_RETRIES ||
+ test_bit(MMF_OOM_SKIP, &mm->flags))
goto done;
with the title "Check for MMF_OOM_SKIP before complain".
Even if the last attempt failed, there is no need to complain if
MMF_OOM_SKIP was already set.
>
> > pr_info("oom_reaper: unable to reap pid:%d (%s)\n",
>
> --
> Michal Hocko
> SUSE Labs
>
* [PATCH 2/2] mm,oom_reaper: Check for MMF_OOM_SKIP before complain.
2018-03-20 12:37 ` Tetsuo Handa
@ 2018-03-20 12:47 ` Tetsuo Handa
2018-03-20 20:49 ` David Rientjes
2018-03-20 13:24 ` [PATCH 2/2] mm,oom_reaper: Correct MAX_OOM_REAP_RETRIES'th attempt Michal Hocko
1 sibling, 1 reply; 20+ messages in thread
From: Tetsuo Handa @ 2018-03-20 12:47 UTC (permalink / raw)
To: mhocko; +Cc: linux-mm, rientjes
From c8954b88c8e78d10829c646c95e076516a54e84f Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Tue, 20 Mar 2018 21:42:12 +0900
Subject: [PATCH] mm,oom_reaper: Check for MMF_OOM_SKIP before complain.
I got "oom_reaper: unable to reap pid:" messages when the victim thread
was blocked inside free_pgtables() (which occurred after returning from
unmap_vmas() and setting MMF_OOM_SKIP). We don't need to complain when
exit_mmap() already set MMF_OOM_SKIP.
[ 663.593821] Killed process 7558 (a.out) total-vm:4176kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB
[ 664.684801] oom_reaper: unable to reap pid:7558 (a.out)
[ 664.892292] a.out D13272 7558 6931 0x00100084
[ 664.895765] Call Trace:
[ 664.897574] ? __schedule+0x25f/0x780
[ 664.900099] schedule+0x2d/0x80
[ 664.902260] rwsem_down_write_failed+0x2bb/0x440
[ 664.905249] ? rwsem_down_write_failed+0x55/0x440
[ 664.908335] ? free_pgd_range+0x569/0x5e0
[ 664.911145] call_rwsem_down_write_failed+0x13/0x20
[ 664.914121] down_write+0x49/0x60
[ 664.916519] ? unlink_file_vma+0x28/0x50
[ 664.919255] unlink_file_vma+0x28/0x50
[ 664.922234] free_pgtables+0x36/0x100
[ 664.924797] exit_mmap+0xbb/0x180
[ 664.927220] mmput+0x50/0x110
[ 664.929504] copy_process.part.41+0xb61/0x1fe0
[ 664.932448] ? _do_fork+0xe6/0x560
[ 664.934902] ? _do_fork+0xe6/0x560
[ 664.937361] _do_fork+0xe6/0x560
[ 664.939742] ? syscall_trace_enter+0x1a9/0x240
[ 664.942693] ? retint_user+0x18/0x18
[ 664.945309] ? page_fault+0x2f/0x50
[ 664.947896] ? trace_hardirqs_on_caller+0x11f/0x1b0
[ 664.951075] do_syscall_64+0x74/0x230
[ 664.953747] entry_SYSCALL_64_after_hwframe+0x42/0xb7
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
---
mm/oom_kill.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 5336985..dfd3705 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -590,7 +590,8 @@ static void oom_reap_task(struct task_struct *tsk)
while (attempts++ < MAX_OOM_REAP_RETRIES && !__oom_reap_task_mm(tsk, mm))
schedule_timeout_idle(HZ/10);
- if (attempts <= MAX_OOM_REAP_RETRIES)
+ if (attempts <= MAX_OOM_REAP_RETRIES ||
+ test_bit(MMF_OOM_SKIP, &mm->flags))
goto done;
--
1.8.3.1
* Re: [PATCH 1/2] mm,oom_reaper: Show trace of unable to reap victim thread.
2018-03-20 12:28 ` [PATCH 1/2] mm,oom_reaper: Show trace of unable to reap victim thread Michal Hocko
@ 2018-03-20 12:52 ` Tetsuo Handa
2018-03-20 13:19 ` Michal Hocko
0 siblings, 1 reply; 20+ messages in thread
From: Tetsuo Handa @ 2018-03-20 12:52 UTC (permalink / raw)
To: mhocko; +Cc: linux-mm, rientjes
Michal Hocko wrote:
> On Tue 20-03-18 20:57:55, Tetsuo Handa wrote:
> > I found that it is not difficult to hit "oom_reaper: unable to reap pid:"
> > messages if the victim thread is doing copy_process(). Since showing the
> > trace of a victim thread which could not be reaped is likely helpful for
> > finding locations which should use a killable wait, this patch does so.
> >
> > [ 226.608508] oom_reaper: unable to reap pid:9261 (a.out)
> > [ 226.611971] a.out D13056 9261 6927 0x00100084
> > [ 226.615879] Call Trace:
> > [ 226.617926] ? __schedule+0x25f/0x780
> > [ 226.620559] schedule+0x2d/0x80
> > [ 226.623356] rwsem_down_write_failed+0x2bb/0x440
> > [ 226.626426] ? rwsem_down_write_failed+0x55/0x440
> > [ 226.629458] ? anon_vma_fork+0x124/0x150
> > [ 226.632679] call_rwsem_down_write_failed+0x13/0x20
> > [ 226.635884] down_write+0x49/0x60
> > [ 226.638867] ? copy_process.part.41+0x12f2/0x1fe0
> > [ 226.642042] copy_process.part.41+0x12f2/0x1fe0 /* i_mmap_lock_write() in dup_mmap() */
> > [ 226.645087] ? _do_fork+0xe6/0x560
> > [ 226.647991] _do_fork+0xe6/0x560
> > [ 226.650495] ? syscall_trace_enter+0x1a9/0x240
> > [ 226.653443] ? retint_user+0x18/0x18
> > [ 226.656601] ? page_fault+0x2f/0x50
> > [ 226.659159] ? trace_hardirqs_on_caller+0x11f/0x1b0
> > [ 226.662399] do_syscall_64+0x74/0x230
> > [ 226.664989] entry_SYSCALL_64_after_hwframe+0x42/0xb7
>
> A single stack trace in the changelog would be sufficient IMHO.
> Apart from that, what do you expect users to do about this trace?
> Sure, they will see a path which holds mmap_sem, and we will see a bug
> report, but we can hardly do anything about that. We simply cannot drop
> the lock from that path in 99% of situations. So _why_ do we want to add
> more information to the log?
This case is blocked at i_mmap_lock_write(). If we can add an error
handling path there, we can replace i_mmap_lock_write() with
i_mmap_lock_write_killable() and bail out quickly. This patch helps find
such locations.
* Re: [PATCH 1/2] mm,oom_reaper: Show trace of unable to reap victim thread.
2018-03-20 12:52 ` Tetsuo Handa
@ 2018-03-20 13:19 ` Michal Hocko
2018-03-20 13:30 ` Tetsuo Handa
0 siblings, 1 reply; 20+ messages in thread
From: Michal Hocko @ 2018-03-20 13:19 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: linux-mm, rientjes
On Tue 20-03-18 21:52:33, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Tue 20-03-18 20:57:55, Tetsuo Handa wrote:
> > > I found that it is not difficult to hit "oom_reaper: unable to reap pid:"
> > > messages if the victim thread is doing copy_process(). Since showing the
> > > trace of a victim thread which could not be reaped is likely helpful for
> > > finding locations which should use a killable wait, this patch does so.
> > >
> > > [ 226.608508] oom_reaper: unable to reap pid:9261 (a.out)
> > > [ 226.611971] a.out D13056 9261 6927 0x00100084
> > > [ 226.615879] Call Trace:
> > > [ 226.617926] ? __schedule+0x25f/0x780
> > > [ 226.620559] schedule+0x2d/0x80
> > > [ 226.623356] rwsem_down_write_failed+0x2bb/0x440
> > > [ 226.626426] ? rwsem_down_write_failed+0x55/0x440
> > > [ 226.629458] ? anon_vma_fork+0x124/0x150
> > > [ 226.632679] call_rwsem_down_write_failed+0x13/0x20
> > > [ 226.635884] down_write+0x49/0x60
> > > [ 226.638867] ? copy_process.part.41+0x12f2/0x1fe0
> > > [ 226.642042] copy_process.part.41+0x12f2/0x1fe0 /* i_mmap_lock_write() in dup_mmap() */
> > > [ 226.645087] ? _do_fork+0xe6/0x560
> > > [ 226.647991] _do_fork+0xe6/0x560
> > > [ 226.650495] ? syscall_trace_enter+0x1a9/0x240
> > > [ 226.653443] ? retint_user+0x18/0x18
> > > [ 226.656601] ? page_fault+0x2f/0x50
> > > [ 226.659159] ? trace_hardirqs_on_caller+0x11f/0x1b0
> > > [ 226.662399] do_syscall_64+0x74/0x230
> > > [ 226.664989] entry_SYSCALL_64_after_hwframe+0x42/0xb7
> >
> > A single stack trace in the changelog would be sufficient IMHO.
> > Apart from that, what do you expect users to do about this trace?
> > Sure, they will see a path which holds mmap_sem, and we will see a bug
> > report, but we can hardly do anything about that. We simply cannot drop
> > the lock from that path in 99% of situations. So _why_ do we want to add
> > more information to the log?
>
> This case is blocked at i_mmap_lock_write().
But why does i_mmap_lock_write() matter for oom reaping? We are not
touching hugetlb mappings. dup_mmap() holds mmap_sem for write, which is
the most probable source of the backoff.
--
Michal Hocko
SUSE Labs
* Re: [PATCH 2/2] mm,oom_reaper: Correct MAX_OOM_REAP_RETRIES'th attempt.
2018-03-20 12:37 ` Tetsuo Handa
2018-03-20 12:47 ` [PATCH 2/2] mm,oom_reaper: Check for MMF_OOM_SKIP before complain Tetsuo Handa
@ 2018-03-20 13:24 ` Michal Hocko
2018-03-20 13:37 ` Tetsuo Handa
1 sibling, 1 reply; 20+ messages in thread
From: Michal Hocko @ 2018-03-20 13:24 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: linux-mm, rientjes
On Tue 20-03-18 21:37:42, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Tue 20-03-18 20:57:56, Tetsuo Handa wrote:
> > > I got "oom_reaper: unable to reap pid:" messages when the victim thread
> > > was blocked inside free_pgtables() (which occurred after returning from
> > > unmap_vmas() and setting MMF_OOM_SKIP). We don't need to complain when
> > > __oom_reap_task_mm() returned true (by e.g. finding MMF_OOM_SKIP already
> > > set) when oom_reap_task() was trying MAX_OOM_REAP_RETRIES'th attempt.
> > >
> > > [ 663.593821] Killed process 7558 (a.out) total-vm:4176kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB
> > > [ 664.684801] oom_reaper: unable to reap pid:7558 (a.out)
> >
> > I do not see "oom_reaper: reaped process..." so has the task been
> > reaped?
>
> The log is http://I-love.SAKURA.ne.jp/tmp/serial-20180320.txt.xz .
xzgrep oom_reaper: serial-20180320.txt.xz
[ 97.144805] oom_reaper: reaped process 2142 (cleanupd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 103.447171] oom_reaper: reaped process 659 (NetworkManager), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 208.906879] oom_reaper: reaped process 1415 (postgres), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 214.179760] oom_reaper: reaped process 588 (irqbalance), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 215.013260] oom_reaper: reaped process 591 (systemd-logind), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 221.483724] oom_reaper: reaped process 7388 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 223.140437] oom_reaper: reaped process 9252 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 224.735597] oom_reaper: reaped process 9257 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 226.608508] oom_reaper: unable to reap pid:9261 (a.out)
[ 354.342331] oom_reaper: unable to reap pid:16611 (a.out)
[ 380.525910] oom_reaper: unable to reap pid:6927 (a.out)
[ 408.397539] oom_reaper: unable to reap pid:6927 (a.out)
[ 435.702005] oom_reaper: unable to reap pid:7554 (a.out)
[ 466.269660] oom_reaper: unable to reap pid:7560 (a.out)
[ 495.621196] oom_reaper: unable to reap pid:7563 (a.out)
[ 534.279602] oom_reaper: reaped process 22038 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 536.271631] oom_reaper: unable to reap pid:6928 (a.out)
[ 600.285293] oom_reaper: unable to reap pid:7550 (a.out)
[ 664.684801] oom_reaper: unable to reap pid:7558 (a.out)
[ 743.188067] oom_reaper: reaped process 7562 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 744.980868] oom_reaper: unable to reap pid:6928 (a.out)
None of the reaped tasks is reported as "unable to reap...". So what
actually gets fixed by this patch?
--
Michal Hocko
SUSE Labs
* Re: [PATCH 1/2] mm,oom_reaper: Show trace of unable to reap victim thread.
2018-03-20 13:19 ` Michal Hocko
@ 2018-03-20 13:30 ` Tetsuo Handa
2018-03-20 13:34 ` Michal Hocko
0 siblings, 1 reply; 20+ messages in thread
From: Tetsuo Handa @ 2018-03-20 13:30 UTC (permalink / raw)
To: mhocko; +Cc: linux-mm, rientjes
Michal Hocko wrote:
> On Tue 20-03-18 21:52:33, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > A single stack trace in the changelog would be sufficient IMHO.
> > > Apart from that, what do you expect users to do about this trace?
> > > Sure, they will see a path which holds mmap_sem, and we will see a bug
> > > report, but we can hardly do anything about that. We simply cannot drop
> > > the lock from that path in 99% of situations. So _why_ do we want to add
> > > more information to the log?
> >
> > This case is blocked at i_mmap_lock_write().
>
> But why does i_mmap_lock_write matter for oom_reaping. We are not
> touching hugetlb mappings. dup_mmap holds mmap_sem for write which is
> the most probable source of the backoff.
If i_mmap_lock_write() can bail out upon SIGKILL, the OOM victim will be
able to release mmap_sem held for write, which would keep the OOM reaper
from backing off.
* Re: [PATCH 1/2] mm,oom_reaper: Show trace of unable to reap victim thread.
2018-03-20 13:30 ` Tetsuo Handa
@ 2018-03-20 13:34 ` Michal Hocko
2018-03-20 13:50 ` Tetsuo Handa
0 siblings, 1 reply; 20+ messages in thread
From: Michal Hocko @ 2018-03-20 13:34 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: linux-mm, rientjes
On Tue 20-03-18 22:30:16, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Tue 20-03-18 21:52:33, Tetsuo Handa wrote:
> > > Michal Hocko wrote:
> > > > A single stack trace in the changelog would be sufficient IMHO.
> > > > Apart from that, what do you expect users to do about this trace?
> > > > Sure, they will see a path which holds mmap_sem, and we will see a bug
> > > > report, but we can hardly do anything about that. We simply cannot drop
> > > > the lock from that path in 99% of situations. So _why_ do we want to add
> > > > more information to the log?
> > >
> > > This case is blocked at i_mmap_lock_write().
> >
> > But why does i_mmap_lock_write matter for oom_reaping. We are not
> > touching hugetlb mappings. dup_mmap holds mmap_sem for write which is
> > the most probable source of the backoff.
>
> If i_mmap_lock_write can bail out upon SIGKILL, the OOM victim will be able to
> release mmap_sem held for write, which helps the OOM reaper not to back off.
There are so many other blocking calls (including allocations) in
dup_mmap that I do not really think i_mmap_lock_write is the biggest
problem. That will likely be the case for other mmap_sem write lockers.
Really I am not sure dumping more information is beneficial here.
--
Michal Hocko
SUSE Labs
* Re: [PATCH 2/2] mm,oom_reaper: Correct MAX_OOM_REAP_RETRIES'th attempt.
2018-03-20 13:24 ` [PATCH 2/2] mm,oom_reaper: Correct MAX_OOM_REAP_RETRIES'th attempt Michal Hocko
@ 2018-03-20 13:37 ` Tetsuo Handa
2018-03-20 13:44 ` Michal Hocko
0 siblings, 1 reply; 20+ messages in thread
From: Tetsuo Handa @ 2018-03-20 13:37 UTC (permalink / raw)
To: mhocko; +Cc: linux-mm, rientjes
Michal Hocko wrote:
> On Tue 20-03-18 21:37:42, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > On Tue 20-03-18 20:57:56, Tetsuo Handa wrote:
> > > > I got "oom_reaper: unable to reap pid:" messages when the victim thread
> > > > was blocked inside free_pgtables() (which occurred after returning from
> > > > unmap_vmas() and setting MMF_OOM_SKIP). We don't need to complain when
> > > > __oom_reap_task_mm() returned true (by e.g. finding MMF_OOM_SKIP already
> > > > set) when oom_reap_task() was trying MAX_OOM_REAP_RETRIES'th attempt.
> > > >
> > > > [ 663.593821] Killed process 7558 (a.out) total-vm:4176kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB
> > > > [ 664.684801] oom_reaper: unable to reap pid:7558 (a.out)
> > >
> > > I do not see "oom_reaper: reaped process..." so has the task been
> > > reaped?
> >
> > The log is http://I-love.SAKURA.ne.jp/tmp/serial-20180320.txt.xz .
>
> xzgrep oom_reaper: serial-20180320.txt.xz
> [ 97.144805] oom_reaper: reaped process 2142 (cleanupd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> [ 103.447171] oom_reaper: reaped process 659 (NetworkManager), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> [ 208.906879] oom_reaper: reaped process 1415 (postgres), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> [ 214.179760] oom_reaper: reaped process 588 (irqbalance), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> [ 215.013260] oom_reaper: reaped process 591 (systemd-logind), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> [ 221.483724] oom_reaper: reaped process 7388 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> [ 223.140437] oom_reaper: reaped process 9252 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> [ 224.735597] oom_reaper: reaped process 9257 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> [ 226.608508] oom_reaper: unable to reap pid:9261 (a.out)
> [ 354.342331] oom_reaper: unable to reap pid:16611 (a.out)
> [ 380.525910] oom_reaper: unable to reap pid:6927 (a.out)
> [ 408.397539] oom_reaper: unable to reap pid:6927 (a.out)
> [ 435.702005] oom_reaper: unable to reap pid:7554 (a.out)
> [ 466.269660] oom_reaper: unable to reap pid:7560 (a.out)
> [ 495.621196] oom_reaper: unable to reap pid:7563 (a.out)
> [ 534.279602] oom_reaper: reaped process 22038 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> [ 536.271631] oom_reaper: unable to reap pid:6928 (a.out)
> [ 600.285293] oom_reaper: unable to reap pid:7550 (a.out)
> [ 664.684801] oom_reaper: unable to reap pid:7558 (a.out)
> [ 743.188067] oom_reaper: reaped process 7562 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> [ 744.980868] oom_reaper: unable to reap pid:6928 (a.out)
>
> None of the reaped tasks is reported as "unable to reap...". So what
> actually gets fixed by this patch?
This patch (which I sent as "[PATCH 2/2] mm,oom_reaper: Check for MMF_OOM_SKIP before complain.")
makes the OOM reaper not complain if MMF_OOM_SKIP was already set by exit_mmap().
* Re: [PATCH 2/2] mm,oom_reaper: Correct MAX_OOM_REAP_RETRIES'th attempt.
2018-03-20 13:37 ` Tetsuo Handa
@ 2018-03-20 13:44 ` Michal Hocko
0 siblings, 0 replies; 20+ messages in thread
From: Michal Hocko @ 2018-03-20 13:44 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: linux-mm, rientjes
On Tue 20-03-18 22:37:31, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Tue 20-03-18 21:37:42, Tetsuo Handa wrote:
> > > Michal Hocko wrote:
> > > > On Tue 20-03-18 20:57:56, Tetsuo Handa wrote:
> > > > > I got "oom_reaper: unable to reap pid:" messages when the victim thread
> > > > > was blocked inside free_pgtables() (which occurred after returning from
> > > > > unmap_vmas() and setting MMF_OOM_SKIP). We don't need to complain when
> > > > > __oom_reap_task_mm() returned true (by e.g. finding MMF_OOM_SKIP already
> > > > > set) when oom_reap_task() was trying MAX_OOM_REAP_RETRIES'th attempt.
> > > > >
> > > > > [ 663.593821] Killed process 7558 (a.out) total-vm:4176kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB
> > > > > [ 664.684801] oom_reaper: unable to reap pid:7558 (a.out)
> > > >
> > > > I do not see "oom_reaper: reaped process..." so has the task been
> > > > reaped?
> > >
> > > The log is http://I-love.SAKURA.ne.jp/tmp/serial-20180320.txt.xz .
> >
> > xzgrep oom_reaper: serial-20180320.txt.xz
> > [ 97.144805] oom_reaper: reaped process 2142 (cleanupd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> > [ 103.447171] oom_reaper: reaped process 659 (NetworkManager), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> > [ 208.906879] oom_reaper: reaped process 1415 (postgres), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> > [ 214.179760] oom_reaper: reaped process 588 (irqbalance), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> > [ 215.013260] oom_reaper: reaped process 591 (systemd-logind), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> > [ 221.483724] oom_reaper: reaped process 7388 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> > [ 223.140437] oom_reaper: reaped process 9252 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> > [ 224.735597] oom_reaper: reaped process 9257 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> > [ 226.608508] oom_reaper: unable to reap pid:9261 (a.out)
> > [ 354.342331] oom_reaper: unable to reap pid:16611 (a.out)
> > [ 380.525910] oom_reaper: unable to reap pid:6927 (a.out)
> > [ 408.397539] oom_reaper: unable to reap pid:6927 (a.out)
> > [ 435.702005] oom_reaper: unable to reap pid:7554 (a.out)
> > [ 466.269660] oom_reaper: unable to reap pid:7560 (a.out)
> > [ 495.621196] oom_reaper: unable to reap pid:7563 (a.out)
> > [ 534.279602] oom_reaper: reaped process 22038 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> > [ 536.271631] oom_reaper: unable to reap pid:6928 (a.out)
> > [ 600.285293] oom_reaper: unable to reap pid:7550 (a.out)
> > [ 664.684801] oom_reaper: unable to reap pid:7558 (a.out)
> > [ 743.188067] oom_reaper: reaped process 7562 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> > [ 744.980868] oom_reaper: unable to reap pid:6928 (a.out)
> >
> > None of the reaped tasks is reported as "unable to reap...". So what
> > actually gets fixed by this patch?
>
> This patch (which I sent as "[PATCH 2/2] mm,oom_reaper: Check for MMF_OOM_SKIP before complain.")
> makes the OOM reaper not to complain if MMF_OOM_SKIP was already set by exit_mmap().
Sigh. So can you send a patch with a meaningful changelog showing
the problem and explaining how it is fixed?
--
Michal Hocko
SUSE Labs
* Re: [PATCH 1/2] mm,oom_reaper: Show trace of unable to reap victim thread.
2018-03-20 13:34 ` Michal Hocko
@ 2018-03-20 13:50 ` Tetsuo Handa
2018-03-20 14:10 ` Michal Hocko
0 siblings, 1 reply; 20+ messages in thread
From: Tetsuo Handa @ 2018-03-20 13:50 UTC (permalink / raw)
To: mhocko; +Cc: linux-mm, rientjes
Michal Hocko wrote:
> On Tue 20-03-18 22:30:16, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > On Tue 20-03-18 21:52:33, Tetsuo Handa wrote:
> > > > Michal Hocko wrote:
> > > > > A single stack trace in the changelog would be sufficient IMHO.
> > > > > Appart from that. What do you expect users will do about this trace?
> > > > > Sure they will see a path which holds mmap_sem, we will see a bug report
> > > > > but we can hardly do anything about that. We simply cannot drop the lock
> > > > > from that path in 99% of situations. So _why_ do we want to add more
> > > > > information to the log?
> > > >
> > > > This case is blocked at i_mmap_lock_write().
> > >
> > > But why does i_mmap_lock_write matter for oom_reaping. We are not
> > > touching hugetlb mappings. dup_mmap holds mmap_sem for write which is
> > > the most probable source of the backoff.
> >
> > If i_mmap_lock_write can bail out upon SIGKILL, the OOM victim will be able to
> > release mmap_sem held for write, which helps the OOM reaper not to back off.
>
> There are so many other blocking calls (including allocations) in
> dup_mmap
Yes. But
> that I do not really think i_mmap_lock_write is the biggest
> problem. That will be likely the case for other mmap_sem write lockers.
i_mmap_lock_write() is one of the problems we can afford to fix.
8 out of the 11 "oom_reaper: unable to reap" messages are blocked at i_mmap_lock_write().
[ 226.608508] oom_reaper: unable to reap pid:9261 (a.out)
[ 226.611971] a.out D13056 9261 6927 0x00100084
[ 226.615879] Call Trace:
[ 226.617926] ? __schedule+0x25f/0x780
[ 226.620559] schedule+0x2d/0x80
[ 226.623356] rwsem_down_write_failed+0x2bb/0x440
[ 226.626426] ? rwsem_down_write_failed+0x55/0x440
[ 226.629458] ? anon_vma_fork+0x124/0x150
[ 226.632679] call_rwsem_down_write_failed+0x13/0x20
[ 226.635884] down_write+0x49/0x60
[ 226.638867] ? copy_process.part.41+0x12f2/0x1fe0
[ 226.642042] copy_process.part.41+0x12f2/0x1fe0
[ 226.645087] ? _do_fork+0xe6/0x560
[ 226.647991] _do_fork+0xe6/0x560
[ 226.650495] ? syscall_trace_enter+0x1a9/0x240
[ 226.653443] ? retint_user+0x18/0x18
[ 226.656601] ? page_fault+0x2f/0x50
[ 226.659159] ? trace_hardirqs_on_caller+0x11f/0x1b0
[ 226.662399] do_syscall_64+0x74/0x230
[ 226.664989] entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 354.342331] oom_reaper: unable to reap pid:16611 (a.out)
[ 354.346707] a.out D13312 16611 6927 0x00100084
[ 354.351173] Call Trace:
[ 354.354164] ? __schedule+0x25f/0x780
[ 354.357633] schedule+0x2d/0x80
[ 354.360976] rwsem_down_write_failed+0x2bb/0x440
[ 354.364774] ? rwsem_down_write_failed+0x55/0x440
[ 354.368749] call_rwsem_down_write_failed+0x13/0x20
[ 354.372703] down_write+0x49/0x60
[ 354.376106] ? copy_process.part.41+0x12f2/0x1fe0
[ 354.379956] copy_process.part.41+0x12f2/0x1fe0
[ 354.383791] ? _do_fork+0xe6/0x560
[ 354.386967] _do_fork+0xe6/0x560
[ 354.390016] ? syscall_trace_enter+0x1a9/0x240
[ 354.393756] ? retint_user+0x18/0x18
[ 354.397296] ? page_fault+0x2f/0x50
[ 354.401081] ? trace_hardirqs_on_caller+0x11f/0x1b0
[ 354.404772] do_syscall_64+0x74/0x230
[ 354.407749] entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 380.525910] oom_reaper: unable to reap pid:6927 (a.out)
[ 380.529274] a.out R running task 13544 6927 1 0x00100084
[ 380.533264] Call Trace:
[ 380.535433] ? __schedule+0x25f/0x780
[ 380.538052] ? mark_held_locks+0x60/0x80
[ 380.540775] schedule+0x2d/0x80
[ 380.543258] schedule_timeout+0x1a4/0x350
[ 380.546053] ? __next_timer_interrupt+0xd0/0xd0
[ 380.549033] msleep+0x25/0x30
[ 380.551458] shrink_inactive_list+0x5b7/0x690
[ 380.554408] ? __lock_acquire+0x1f1/0xfd0
[ 380.557228] ? find_held_lock+0x2d/0x90
[ 380.559959] shrink_node_memcg+0x340/0x770
[ 380.562776] ? __lock_acquire+0x246/0xfd0
[ 380.565532] ? mem_cgroup_iter+0x121/0x4f0
[ 380.568337] ? mem_cgroup_iter+0x121/0x4f0
[ 380.571203] shrink_node+0xd8/0x370
[ 380.573745] do_try_to_free_pages+0xe3/0x390
[ 380.576759] try_to_free_pages+0xc8/0x110
[ 380.579476] __alloc_pages_slowpath+0x28a/0x8d9
[ 380.582639] __alloc_pages_nodemask+0x21d/0x260
[ 380.585662] new_slab+0x558/0x760
[ 380.588188] ___slab_alloc+0x353/0x6f0
[ 380.590962] ? copy_process.part.41+0x121f/0x1fe0
[ 380.594008] ? find_held_lock+0x2d/0x90
[ 380.596750] ? copy_process.part.41+0x121f/0x1fe0
[ 380.599852] __slab_alloc+0x41/0x7a
[ 380.602469] ? copy_process.part.41+0x121f/0x1fe0
[ 380.605607] kmem_cache_alloc+0x1a6/0x1f0
[ 380.608528] copy_process.part.41+0x121f/0x1fe0
[ 380.611613] ? _do_fork+0xe6/0x560
[ 380.614299] _do_fork+0xe6/0x560
[ 380.616752] ? syscall_trace_enter+0x1a9/0x240
[ 380.619813] ? retint_user+0x18/0x18
[ 380.622588] ? page_fault+0x2f/0x50
[ 380.625184] ? trace_hardirqs_on_caller+0x11f/0x1b0
[ 380.628841] do_syscall_64+0x74/0x230
[ 380.631660] entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 408.397539] oom_reaper: unable to reap pid:6927 (a.out)
[ 408.400862] a.out D13544 6927 1 0x00100084
[ 408.404262] Call Trace:
[ 408.406449] ? __schedule+0x25f/0x780
[ 408.409063] schedule+0x2d/0x80
[ 408.411440] rwsem_down_write_failed+0x2bb/0x440
[ 408.414544] ? rwsem_down_write_failed+0x55/0x440
[ 408.417684] call_rwsem_down_write_failed+0x13/0x20
[ 408.420866] down_write+0x49/0x60
[ 408.423442] ? copy_process.part.41+0x12f2/0x1fe0
[ 408.426521] copy_process.part.41+0x12f2/0x1fe0
[ 408.429587] ? _do_fork+0xe6/0x560
[ 408.432134] _do_fork+0xe6/0x560
[ 408.434601] ? syscall_trace_enter+0x1a9/0x240
[ 408.437729] ? retint_user+0x18/0x18
[ 408.440391] ? page_fault+0x2f/0x50
[ 408.442955] ? trace_hardirqs_on_caller+0x11f/0x1b0
[ 408.446850] do_syscall_64+0x74/0x230
[ 408.449523] entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 435.702005] oom_reaper: unable to reap pid:7554 (a.out)
[ 435.706759] a.out D13328 7554 6928 0x00100084
[ 435.711417] Call Trace:
[ 435.714814] ? __schedule+0x25f/0x780
[ 435.718813] schedule+0x2d/0x80
[ 435.722308] rwsem_down_write_failed+0x2bb/0x440
[ 435.726661] ? rwsem_down_write_failed+0x55/0x440
[ 435.730881] ? anon_vma_fork+0x124/0x150
[ 435.734935] call_rwsem_down_write_failed+0x13/0x20
[ 435.739278] down_write+0x49/0x60
[ 435.743009] ? copy_process.part.41+0x12f2/0x1fe0
[ 435.747173] copy_process.part.41+0x12f2/0x1fe0
[ 435.751304] ? _do_fork+0xe6/0x560
[ 435.754924] _do_fork+0xe6/0x560
[ 435.758538] ? syscall_trace_enter+0x1a9/0x240
[ 435.762533] ? retint_user+0x18/0x18
[ 435.766115] ? page_fault+0x2f/0x50
[ 435.769646] ? trace_hardirqs_on_caller+0x11f/0x1b0
[ 435.773850] do_syscall_64+0x74/0x230
[ 435.777364] entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 466.269660] oom_reaper: unable to reap pid:7560 (a.out)
[ 466.273267] a.out D13120 7560 6928 0x00100084
[ 466.276878] Call Trace:
[ 466.279146] ? __schedule+0x25f/0x780
[ 466.281945] schedule+0x2d/0x80
[ 466.284421] rwsem_down_write_failed+0x2bb/0x440
[ 466.287748] ? rwsem_down_write_failed+0x55/0x440
[ 466.291012] ? anon_vma_fork+0x124/0x150
[ 466.293990] call_rwsem_down_write_failed+0x13/0x20
[ 466.297344] down_write+0x49/0x60
[ 466.299925] ? copy_process.part.41+0x12f2/0x1fe0
[ 466.303310] copy_process.part.41+0x12f2/0x1fe0
[ 466.306504] ? _do_fork+0xe6/0x560
[ 466.309257] _do_fork+0xe6/0x560
[ 466.311911] ? syscall_trace_enter+0x1a9/0x240
[ 466.315112] ? retint_user+0x18/0x18
[ 466.317937] ? page_fault+0x2f/0x50
[ 466.320667] ? trace_hardirqs_on_caller+0x11f/0x1b0
[ 466.324043] do_syscall_64+0x74/0x230
[ 466.326999] entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 495.621196] oom_reaper: unable to reap pid:7563 (a.out)
[ 495.624785] a.out D12992 7563 6928 0x00100084
[ 495.628398] Call Trace:
[ 495.630634] ? __schedule+0x25f/0x780
[ 495.633464] schedule+0x2d/0x80
[ 495.635952] rwsem_down_write_failed+0x2bb/0x440
[ 495.639169] ? rwsem_down_write_failed+0x55/0x440
[ 495.642432] ? anon_vma_fork+0x124/0x150
[ 495.645216] call_rwsem_down_write_failed+0x13/0x20
[ 495.648651] down_write+0x49/0x60
[ 495.651446] ? copy_process.part.41+0x12f2/0x1fe0
[ 495.654843] copy_process.part.41+0x12f2/0x1fe0
[ 495.658031] ? _do_fork+0xe6/0x560
[ 495.660718] _do_fork+0xe6/0x560
[ 495.663415] ? syscall_trace_enter+0x1a9/0x240
[ 495.666662] ? retint_user+0x18/0x18
[ 495.669369] ? page_fault+0x2f/0x50
[ 495.672162] ? trace_hardirqs_on_caller+0x11f/0x1b0
[ 495.675529] do_syscall_64+0x74/0x230
[ 495.678418] entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 536.271631] oom_reaper: unable to reap pid:6928 (a.out)
[ 536.275197] a.out D12992 6928 1 0x00100084
[ 536.278972] Call Trace:
[ 536.281110] ? __schedule+0x25f/0x780
[ 536.283883] schedule+0x2d/0x80
[ 536.286593] rwsem_down_write_failed+0x2bb/0x440
[ 536.289821] ? rwsem_down_write_failed+0x55/0x440
[ 536.293113] ? anon_vma_fork+0x124/0x150
[ 536.296262] call_rwsem_down_write_failed+0x13/0x20
[ 536.299554] down_write+0x49/0x60
[ 536.302514] ? copy_process.part.41+0x12f2/0x1fe0
[ 536.305816] copy_process.part.41+0x12f2/0x1fe0
[ 536.309032] ? _do_fork+0xe6/0x560
[ 536.312206] _do_fork+0xe6/0x560
[ 536.314810] ? syscall_trace_enter+0x1a9/0x240
[ 536.318335] ? retint_user+0x18/0x18
[ 536.321064] ? page_fault+0x2f/0x50
[ 536.323757] ? trace_hardirqs_on_caller+0x11f/0x1b0
[ 536.327396] do_syscall_64+0x74/0x230
[ 536.330175] entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 600.285293] oom_reaper: unable to reap pid:7550 (a.out)
[ 600.289112] a.out D13104 7550 6931 0x00100084
[ 600.292918] Call Trace:
[ 600.295506] ? __schedule+0x25f/0x780
[ 600.298614] ? __lock_acquire+0x246/0xfd0
[ 600.301879] schedule+0x2d/0x80
[ 600.304623] schedule_timeout+0x1fd/0x350
[ 600.307633] ? find_held_lock+0x2d/0x90
[ 600.310909] ? mark_held_locks+0x60/0x80
[ 600.314001] ? _raw_spin_unlock_irq+0x24/0x30
[ 600.317382] wait_for_completion+0xab/0x130
[ 600.320579] ? wake_up_q+0x70/0x70
[ 600.323419] flush_work+0x1bd/0x260
[ 600.326434] ? flush_work+0x174/0x260
[ 600.329332] ? destroy_worker+0x90/0x90
[ 600.332301] drain_all_pages+0x16d/0x1e0
[ 600.335464] __alloc_pages_slowpath+0x443/0x8d9
[ 600.338743] __alloc_pages_nodemask+0x21d/0x260
[ 600.342178] new_slab+0x558/0x760
[ 600.344861] ___slab_alloc+0x353/0x6f0
[ 600.347744] ? copy_process.part.41+0x121f/0x1fe0
[ 600.351213] ? find_held_lock+0x2d/0x90
[ 600.354119] ? copy_process.part.41+0x121f/0x1fe0
[ 600.357491] __slab_alloc+0x41/0x7a
[ 600.360206] ? copy_process.part.41+0x121f/0x1fe0
[ 600.363480] kmem_cache_alloc+0x1a6/0x1f0
[ 600.366579] copy_process.part.41+0x121f/0x1fe0
[ 600.369761] ? _do_fork+0xe6/0x560
[ 600.372382] _do_fork+0xe6/0x560
[ 600.375142] ? syscall_trace_enter+0x1a9/0x240
[ 600.378258] ? retint_user+0x18/0x18
[ 600.380978] ? page_fault+0x2f/0x50
[ 600.383814] ? trace_hardirqs_on_caller+0x11f/0x1b0
[ 600.387096] do_syscall_64+0x74/0x230
[ 600.389928] entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 664.684801] oom_reaper: unable to reap pid:7558 (a.out)
[ 664.892292] a.out D13272 7558 6931 0x00100084
[ 664.895765] Call Trace:
[ 664.897574] ? __schedule+0x25f/0x780
[ 664.900099] schedule+0x2d/0x80
[ 664.902260] rwsem_down_write_failed+0x2bb/0x440
[ 664.905249] ? rwsem_down_write_failed+0x55/0x440
[ 664.908335] ? free_pgd_range+0x569/0x5e0
[ 664.911145] call_rwsem_down_write_failed+0x13/0x20
[ 664.914121] down_write+0x49/0x60
[ 664.916519] ? unlink_file_vma+0x28/0x50
[ 664.919255] unlink_file_vma+0x28/0x50
[ 664.922234] free_pgtables+0x36/0x100
[ 664.924797] exit_mmap+0xbb/0x180
[ 664.927220] mmput+0x50/0x110
[ 664.929504] copy_process.part.41+0xb61/0x1fe0
[ 664.932448] ? _do_fork+0xe6/0x560
[ 664.934902] ? _do_fork+0xe6/0x560
[ 664.937361] _do_fork+0xe6/0x560
[ 664.939742] ? syscall_trace_enter+0x1a9/0x240
[ 664.942693] ? retint_user+0x18/0x18
[ 664.945309] ? page_fault+0x2f/0x50
[ 664.947896] ? trace_hardirqs_on_caller+0x11f/0x1b0
[ 664.951075] do_syscall_64+0x74/0x230
[ 664.953747] entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 744.980868] oom_reaper: unable to reap pid:6928 (a.out)
[ 744.984653] a.out D12992 6928 1 0x00100084
[ 744.988278] Call Trace:
[ 744.990524] ? __schedule+0x25f/0x780
[ 744.993435] schedule+0x2d/0x80
[ 744.995877] rwsem_down_write_failed+0x2bb/0x440
[ 744.999272] ? rwsem_down_write_failed+0x55/0x440
[ 745.002759] ? anon_vma_fork+0x124/0x150
[ 745.005789] call_rwsem_down_write_failed+0x13/0x20
[ 745.009324] down_write+0x49/0x60
[ 745.012134] ? copy_process.part.41+0x12f2/0x1fe0
[ 745.015574] copy_process.part.41+0x12f2/0x1fe0
[ 745.018845] ? _do_fork+0xe6/0x560
[ 745.021698] _do_fork+0xe6/0x560
[ 745.024275] ? syscall_trace_enter+0x1a9/0x240
[ 745.027443] ? retint_user+0x18/0x18
[ 745.030281] ? page_fault+0x2f/0x50
[ 745.033138] ? trace_hardirqs_on_caller+0x11f/0x1b0
[ 745.036703] do_syscall_64+0x74/0x230
[ 745.039476] entry_SYSCALL_64_after_hwframe+0x42/0xb7
>
> Really I am not sure dumping more information is beneficial here.
Converting to killable waits where we can afford to is beneficial.
* Re: [PATCH 1/2] mm,oom_reaper: Show trace of unable to reap victim thread.
2018-03-20 13:50 ` Tetsuo Handa
@ 2018-03-20 14:10 ` Michal Hocko
2018-03-20 14:20 ` Tetsuo Handa
0 siblings, 1 reply; 20+ messages in thread
From: Michal Hocko @ 2018-03-20 14:10 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: linux-mm, rientjes
On Tue 20-03-18 22:50:21, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Tue 20-03-18 22:30:16, Tetsuo Handa wrote:
> > > Michal Hocko wrote:
> > > > On Tue 20-03-18 21:52:33, Tetsuo Handa wrote:
> > > > > Michal Hocko wrote:
> > > > > > A single stack trace in the changelog would be sufficient IMHO.
> > > > > > Appart from that. What do you expect users will do about this trace?
> > > > > > Sure they will see a path which holds mmap_sem, we will see a bug report
> > > > > > but we can hardly do anything about that. We simply cannot drop the lock
> > > > > > from that path in 99% of situations. So _why_ do we want to add more
> > > > > > information to the log?
> > > > >
> > > > > This case is blocked at i_mmap_lock_write().
> > > >
> > > > But why does i_mmap_lock_write matter for oom_reaping. We are not
> > > > touching hugetlb mappings. dup_mmap holds mmap_sem for write which is
> > > > the most probable source of the backoff.
> > >
> > > If i_mmap_lock_write can bail out upon SIGKILL, the OOM victim will be able to
> > > release mmap_sem held for write, which helps the OOM reaper not to back off.
> >
> > There are so many other blocking calls (including allocations) in
> > dup_mmap
>
> Yes. But
>
> > that I do not really think i_mmap_lock_write is the biggest
> > problem. That will be likely the case for other mmap_sem write lockers.
>
> i_mmap_lock_write() is one of the problems which we could afford fixing.
> 8 out of 11 "oom_reaper: unable to reap" messages are blocked at i_mmap_lock_write().
>
[...]
> > Really I am not sure dumping more information is beneficial here.
>
> Converting to use killable where we can afford is beneficial.
I am not questioning that. I am questioning the additional information,
because we won't be able to do anything about the mmap_sem holder most of
the time. They tend to block on allocations...
--
Michal Hocko
SUSE Labs
* Re: [PATCH 1/2] mm,oom_reaper: Show trace of unable to reap victim thread.
2018-03-20 14:10 ` Michal Hocko
@ 2018-03-20 14:20 ` Tetsuo Handa
2018-03-20 20:46 ` David Rientjes
0 siblings, 1 reply; 20+ messages in thread
From: Tetsuo Handa @ 2018-03-20 14:20 UTC (permalink / raw)
To: mhocko; +Cc: linux-mm, rientjes
Michal Hocko wrote:
> On Tue 20-03-18 22:50:21, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > On Tue 20-03-18 22:30:16, Tetsuo Handa wrote:
> > > > Michal Hocko wrote:
> > > > > On Tue 20-03-18 21:52:33, Tetsuo Handa wrote:
> > > > > > Michal Hocko wrote:
> > > > > > > A single stack trace in the changelog would be sufficient IMHO.
> > > > > > > Appart from that. What do you expect users will do about this trace?
> > > > > > > Sure they will see a path which holds mmap_sem, we will see a bug report
> > > > > > > but we can hardly do anything about that. We simply cannot drop the lock
> > > > > > > from that path in 99% of situations. So _why_ do we want to add more
> > > > > > > information to the log?
> > > > > >
> > > > > > This case is blocked at i_mmap_lock_write().
> > > > >
> > > > > But why does i_mmap_lock_write matter for oom_reaping. We are not
> > > > > touching hugetlb mappings. dup_mmap holds mmap_sem for write which is
> > > > > the most probable source of the backoff.
> > > >
> > > > If i_mmap_lock_write can bail out upon SIGKILL, the OOM victim will be able to
> > > > release mmap_sem held for write, which helps the OOM reaper not to back off.
> > >
> > > There are so many other blocking calls (including allocations) in
> > > dup_mmap
> >
> > Yes. But
> >
> > > that I do not really think i_mmap_lock_write is the biggest
> > > problem. That will be likely the case for other mmap_sem write lockers.
> >
> > i_mmap_lock_write() is one of the problems which we could afford fixing.
> > 8 out of 11 "oom_reaper: unable to reap" messages are blocked at i_mmap_lock_write().
> >
> [...]
> > > Really I am not sure dumping more information is beneficial here.
> >
> > Converting to use killable where we can afford is beneficial.
>
> I am no questioning that. I am questioning the additional information
> because we won't be able to do anything about mmap_sem holder most of
> the time. Because they tend block on allocations...
serial-20180320.txt.xz shows that they tend to block on i_mmap_lock_write() rather
than on memory allocations. Making memory allocations killable would need a lot of work,
but I think that making the frequently hit down_write() calls killable won't need so much work.
* Re: [PATCH 1/2] mm,oom_reaper: Show trace of unable to reap victim thread.
2018-03-20 14:20 ` Tetsuo Handa
@ 2018-03-20 20:46 ` David Rientjes
0 siblings, 0 replies; 20+ messages in thread
From: David Rientjes @ 2018-03-20 20:46 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: mhocko, linux-mm
On Tue, 20 Mar 2018, Tetsuo Handa wrote:
> > I am no questioning that. I am questioning the additional information
> > because we won't be able to do anything about mmap_sem holder most of
> > the time. Because they tend block on allocations...
>
> serial-20180320.txt.xz is saying that they tend to block on i_mmap_lock_write() rather
> than memory allocations. Making memory allocations killable will need a lot of work.
> But I think that making frequently hitting down_write() killable won't need so much work.
>
I have available to me more than 28,222,058 occurrences of successful oom
reaping and 13,018 occurrences of failing to grab mm->mmap_sem.
The number of failures is low on production workloads, so I don't see an
issue with emitting a stack trace in these instances if it can help
improve things. But for my 0.04% failure rate, I can't say that I would
look into them much unless it results in the system livelocking. Unless
there's a compelling reason why this shouldn't be done (too much email
sent to linux-mm as a result? :), I say just go ahead and add the darn
stack trace.
* Re: [PATCH 2/2] mm,oom_reaper: Check for MMF_OOM_SKIP before complain.
2018-03-20 12:47 ` [PATCH 2/2] mm,oom_reaper: Check for MMF_OOM_SKIP before complain Tetsuo Handa
@ 2018-03-20 20:49 ` David Rientjes
2018-03-22 10:46 ` [PATCH] " Tetsuo Handa
0 siblings, 1 reply; 20+ messages in thread
From: David Rientjes @ 2018-03-20 20:49 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: mhocko, linux-mm
On Tue, 20 Mar 2018, Tetsuo Handa wrote:
> I got "oom_reaper: unable to reap pid:" messages when the victim thread
> was blocked inside free_pgtables() (which occurred after returning from
> unmap_vmas() and setting MMF_OOM_SKIP). We don't need to complain when
> exit_mmap() already set MMF_OOM_SKIP.
>
> [ 663.593821] Killed process 7558 (a.out) total-vm:4176kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB
> [ 664.684801] oom_reaper: unable to reap pid:7558 (a.out)
> [ 664.892292] a.out D13272 7558 6931 0x00100084
> [ 664.895765] Call Trace:
> [ 664.897574] ? __schedule+0x25f/0x780
> [ 664.900099] schedule+0x2d/0x80
> [ 664.902260] rwsem_down_write_failed+0x2bb/0x440
> [ 664.905249] ? rwsem_down_write_failed+0x55/0x440
> [ 664.908335] ? free_pgd_range+0x569/0x5e0
> [ 664.911145] call_rwsem_down_write_failed+0x13/0x20
> [ 664.914121] down_write+0x49/0x60
> [ 664.916519] ? unlink_file_vma+0x28/0x50
> [ 664.919255] unlink_file_vma+0x28/0x50
> [ 664.922234] free_pgtables+0x36/0x100
> [ 664.924797] exit_mmap+0xbb/0x180
> [ 664.927220] mmput+0x50/0x110
> [ 664.929504] copy_process.part.41+0xb61/0x1fe0
> [ 664.932448] ? _do_fork+0xe6/0x560
> [ 664.934902] ? _do_fork+0xe6/0x560
> [ 664.937361] _do_fork+0xe6/0x560
> [ 664.939742] ? syscall_trace_enter+0x1a9/0x240
> [ 664.942693] ? retint_user+0x18/0x18
> [ 664.945309] ? page_fault+0x2f/0x50
> [ 664.947896] ? trace_hardirqs_on_caller+0x11f/0x1b0
> [ 664.951075] do_syscall_64+0x74/0x230
> [ 664.953747] entry_SYSCALL_64_after_hwframe+0x42/0xb7
>
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: David Rientjes <rientjes@google.com>
Acked-by: David Rientjes <rientjes@google.com>
But you'll need to send it to akpm.
* [PATCH] mm,oom_reaper: Check for MMF_OOM_SKIP before complain.
2018-03-20 20:49 ` David Rientjes
@ 2018-03-22 10:46 ` Tetsuo Handa
2018-03-22 11:59 ` Michal Hocko
0 siblings, 1 reply; 20+ messages in thread
From: Tetsuo Handa @ 2018-03-22 10:46 UTC (permalink / raw)
To: akpm; +Cc: rientjes, mhocko, linux-mm
From b141cdbe0db852549c94d5b1e6a9967ca69d59fd Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Thu, 22 Mar 2018 19:44:12 +0900
Subject: [PATCH] mm,oom_reaper: Check for MMF_OOM_SKIP before complain.
I got "oom_reaper: unable to reap pid:" messages when the victim thread
was blocked inside free_pgtables() (which occurred after returning from
unmap_vmas() and setting MMF_OOM_SKIP). We don't need to complain when
exit_mmap() already set MMF_OOM_SKIP.
[ 663.593821] Killed process 7558 (a.out) total-vm:4176kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB
[ 664.684801] oom_reaper: unable to reap pid:7558 (a.out)
[ 664.892292] a.out D13272 7558 6931 0x00100084
[ 664.895765] Call Trace:
[ 664.897574] ? __schedule+0x25f/0x780
[ 664.900099] schedule+0x2d/0x80
[ 664.902260] rwsem_down_write_failed+0x2bb/0x440
[ 664.905249] ? rwsem_down_write_failed+0x55/0x440
[ 664.908335] ? free_pgd_range+0x569/0x5e0
[ 664.911145] call_rwsem_down_write_failed+0x13/0x20
[ 664.914121] down_write+0x49/0x60
[ 664.916519] ? unlink_file_vma+0x28/0x50
[ 664.919255] unlink_file_vma+0x28/0x50
[ 664.922234] free_pgtables+0x36/0x100
[ 664.924797] exit_mmap+0xbb/0x180
[ 664.927220] mmput+0x50/0x110
[ 664.929504] copy_process.part.41+0xb61/0x1fe0
[ 664.932448] ? _do_fork+0xe6/0x560
[ 664.934902] ? _do_fork+0xe6/0x560
[ 664.937361] _do_fork+0xe6/0x560
[ 664.939742] ? syscall_trace_enter+0x1a9/0x240
[ 664.942693] ? retint_user+0x18/0x18
[ 664.945309] ? page_fault+0x2f/0x50
[ 664.947896] ? trace_hardirqs_on_caller+0x11f/0x1b0
[ 664.951075] do_syscall_64+0x74/0x230
[ 664.953747] entry_SYSCALL_64_after_hwframe+0x42/0xb7
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@suse.com>
---
mm/oom_kill.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 5336985..dfd3705 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -590,7 +590,8 @@ static void oom_reap_task(struct task_struct *tsk)
while (attempts++ < MAX_OOM_REAP_RETRIES && !__oom_reap_task_mm(tsk, mm))
schedule_timeout_idle(HZ/10);
- if (attempts <= MAX_OOM_REAP_RETRIES)
+ if (attempts <= MAX_OOM_REAP_RETRIES ||
+ test_bit(MMF_OOM_SKIP, &mm->flags))
goto done;
--
1.8.3.1
* Re: [PATCH] mm,oom_reaper: Check for MMF_OOM_SKIP before complain.
2018-03-22 10:46 ` [PATCH] " Tetsuo Handa
@ 2018-03-22 11:59 ` Michal Hocko
0 siblings, 0 replies; 20+ messages in thread
From: Michal Hocko @ 2018-03-22 11:59 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: akpm, rientjes, linux-mm
On Thu 22-03-18 19:46:36, Tetsuo Handa wrote:
> From b141cdbe0db852549c94d5b1e6a9967ca69d59fd Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Date: Thu, 22 Mar 2018 19:44:12 +0900
> Subject: [PATCH] mm,oom_reaper: Check for MMF_OOM_SKIP before complain.
>
> I got "oom_reaper: unable to reap pid:" messages when the victim thread
> was blocked inside free_pgtables() (which occurred after returning from
> unmap_vmas() and setting MMF_OOM_SKIP). We don't need to complain when
> exit_mmap() already set MMF_OOM_SKIP.
... because we do not report a failure when MMF_OOM_SKIP is set already
(see __oom_reap_task_mm).
> [ 663.593821] Killed process 7558 (a.out) total-vm:4176kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB
> [ 664.684801] oom_reaper: unable to reap pid:7558 (a.out)
> [ 664.892292] a.out D13272 7558 6931 0x00100084
> [ 664.895765] Call Trace:
> [ 664.897574] ? __schedule+0x25f/0x780
> [ 664.900099] schedule+0x2d/0x80
> [ 664.902260] rwsem_down_write_failed+0x2bb/0x440
> [ 664.905249] ? rwsem_down_write_failed+0x55/0x440
> [ 664.908335] ? free_pgd_range+0x569/0x5e0
> [ 664.911145] call_rwsem_down_write_failed+0x13/0x20
> [ 664.914121] down_write+0x49/0x60
> [ 664.916519] ? unlink_file_vma+0x28/0x50
> [ 664.919255] unlink_file_vma+0x28/0x50
> [ 664.922234] free_pgtables+0x36/0x100
> [ 664.924797] exit_mmap+0xbb/0x180
> [ 664.927220] mmput+0x50/0x110
> [ 664.929504] copy_process.part.41+0xb61/0x1fe0
> [ 664.932448] ? _do_fork+0xe6/0x560
> [ 664.934902] ? _do_fork+0xe6/0x560
> [ 664.937361] _do_fork+0xe6/0x560
> [ 664.939742] ? syscall_trace_enter+0x1a9/0x240
> [ 664.942693] ? retint_user+0x18/0x18
> [ 664.945309] ? page_fault+0x2f/0x50
> [ 664.947896] ? trace_hardirqs_on_caller+0x11f/0x1b0
> [ 664.951075] do_syscall_64+0x74/0x230
> [ 664.953747] entry_SYSCALL_64_after_hwframe+0x42/0xb7
>
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Acked-by: David Rientjes <rientjes@google.com>
> Cc: Michal Hocko <mhocko@suse.com>
OK, fair enough. I think it is on the borderline of usefulness, but it
doesn't seem harmful. I can already see somebody asking why the
MMF_OOM_SKIP check really requires mmap_sem in __oom_reap_task_mm while it
doesn't in oom_reap_task, but whatever.
Acked-by: Michal Hocko <mhocko@suse.com>
> ---
> mm/oom_kill.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 5336985..dfd3705 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -590,7 +590,8 @@ static void oom_reap_task(struct task_struct *tsk)
>  	while (attempts++ < MAX_OOM_REAP_RETRIES && !__oom_reap_task_mm(tsk, mm))
>  		schedule_timeout_idle(HZ/10);
>  
> -	if (attempts <= MAX_OOM_REAP_RETRIES)
> +	if (attempts <= MAX_OOM_REAP_RETRIES ||
> +	    test_bit(MMF_OOM_SKIP, &mm->flags))
>  		goto done;
>
>
> --
> 1.8.3.1
--
Michal Hocko
SUSE Labs
Thread overview: 20+ messages
2018-03-20 11:57 [PATCH 1/2] mm,oom_reaper: Show trace of unable to reap victim thread Tetsuo Handa
2018-03-20 11:57 ` [PATCH 2/2] mm,oom_reaper: Correct MAX_OOM_REAP_RETRIES'th attempt Tetsuo Handa
2018-03-20 12:12 ` Michal Hocko
2018-03-20 12:37 ` Tetsuo Handa
2018-03-20 12:47 ` [PATCH 2/2] mm,oom_reaper: Check for MMF_OOM_SKIP before complain Tetsuo Handa
2018-03-20 20:49 ` David Rientjes
2018-03-22 10:46 ` [PATCH] " Tetsuo Handa
2018-03-22 11:59 ` Michal Hocko
2018-03-20 13:24 ` [PATCH 2/2] mm,oom_reaper: Correct MAX_OOM_REAP_RETRIES'th attempt Michal Hocko
2018-03-20 13:37 ` Tetsuo Handa
2018-03-20 13:44 ` Michal Hocko
2018-03-20 12:28 ` [PATCH 1/2] mm,oom_reaper: Show trace of unable to reap victim thread Michal Hocko
2018-03-20 12:52 ` Tetsuo Handa
2018-03-20 13:19 ` Michal Hocko
2018-03-20 13:30 ` Tetsuo Handa
2018-03-20 13:34 ` Michal Hocko
2018-03-20 13:50 ` Tetsuo Handa
2018-03-20 14:10 ` Michal Hocko
2018-03-20 14:20 ` Tetsuo Handa
2018-03-20 20:46 ` David Rientjes