* [PATCH] mm,oom_reaper: don't call mmput_async() without atomic_inc_not_zero() @ 2016-05-28 8:16 Tetsuo Handa 2016-05-30 6:52 ` Michal Hocko 2016-06-01 22:53 ` Andrew Morton 0 siblings, 2 replies; 8+ messages in thread From: Tetsuo Handa @ 2016-05-28 8:16 UTC (permalink / raw) To: akpm; +Cc: linux-mm, Tetsuo Handa, Michal Hocko, Arnd Bergmann Commit e2fe14564d3316d1 ("oom_reaper: close race with exiting task") reduced frequency of needlessly selecting next OOM victim, but was calling mmput_async() when atomic_inc_not_zero() failed. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Michal Hocko <mhocko@suse.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Andrew Morton <akpm@linux-foundation.org> --- mm/oom_kill.c | 1 + 1 file changed, 1 insertion(+) diff --git a/mm/oom_kill.c b/mm/oom_kill.c index dfb1ab6..0d781b8 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -478,6 +478,7 @@ static bool __oom_reap_task(struct task_struct *tsk) mm = p->mm; if (!atomic_inc_not_zero(&mm->mm_users)) { task_unlock(p); + mm = NULL; goto unlock_oom; } -- 1.8.3.1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] mm,oom_reaper: don't call mmput_async() without atomic_inc_not_zero() 2016-05-28 8:16 [PATCH] mm,oom_reaper: don't call mmput_async() without atomic_inc_not_zero() Tetsuo Handa @ 2016-05-30 6:52 ` Michal Hocko 2016-06-01 22:53 ` Andrew Morton 1 sibling, 0 replies; 8+ messages in thread From: Michal Hocko @ 2016-05-30 6:52 UTC (permalink / raw) To: Tetsuo Handa; +Cc: akpm, linux-mm, Arnd Bergmann On Sat 28-05-16 17:16:05, Tetsuo Handa wrote: > Commit e2fe14564d3316d1 ("oom_reaper: close race with exiting task") > reduced frequency of needlessly selecting next OOM victim, but was > calling mmput_async() when atomic_inc_not_zero() failed. > > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > Cc: Michal Hocko <mhocko@suse.com> > Cc: Arnd Bergmann <arnd@arndb.de> > Cc: Andrew Morton <akpm@linux-foundation.org> Acked-by: Michal Hocko <mhocko@suse.com> Thanks! > --- > mm/oom_kill.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > index dfb1ab6..0d781b8 100644 > --- a/mm/oom_kill.c > +++ b/mm/oom_kill.c > @@ -478,6 +478,7 @@ static bool __oom_reap_task(struct task_struct *tsk) > mm = p->mm; > if (!atomic_inc_not_zero(&mm->mm_users)) { > task_unlock(p); > + mm = NULL; > goto unlock_oom; > } > > -- > 1.8.3.1 > -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] mm,oom_reaper: don't call mmput_async() without atomic_inc_not_zero() 2016-05-28 8:16 [PATCH] mm,oom_reaper: don't call mmput_async() without atomic_inc_not_zero() Tetsuo Handa 2016-05-30 6:52 ` Michal Hocko @ 2016-06-01 22:53 ` Andrew Morton 2016-06-02 6:48 ` Michal Hocko 1 sibling, 1 reply; 8+ messages in thread From: Andrew Morton @ 2016-06-01 22:53 UTC (permalink / raw) To: Tetsuo Handa; +Cc: linux-mm, Michal Hocko, Arnd Bergmann On Sat, 28 May 2016 17:16:05 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote: > Commit e2fe14564d3316d1 ("oom_reaper: close race with exiting task") > reduced frequency of needlessly selecting next OOM victim, but was > calling mmput_async() when atomic_inc_not_zero() failed. Changelog fail. > --- a/mm/oom_kill.c > +++ b/mm/oom_kill.c > @@ -478,6 +478,7 @@ static bool __oom_reap_task(struct task_struct *tsk) > mm = p->mm; > if (!atomic_inc_not_zero(&mm->mm_users)) { > task_unlock(p); > + mm = NULL; > goto unlock_oom; > } This looks like a pretty fatal bug. I assume the result of hitting that race will be a kernel crash, yes? Is it even possible to hit that race? find_lock_task_mm() takes some care to prevent a NULL ->mm. But I guess a concurrent mmput() doesn't require task_lock(). Kinda makes me wonder what's the point in even having find_lock_task_mm() if its guarantee on ->mm is useless... -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] mm,oom_reaper: don't call mmput_async() without atomic_inc_not_zero() 2016-06-01 22:53 ` Andrew Morton @ 2016-06-02 6:48 ` Michal Hocko 2016-06-02 12:20 ` Tetsuo Handa 0 siblings, 1 reply; 8+ messages in thread From: Michal Hocko @ 2016-06-02 6:48 UTC (permalink / raw) To: Andrew Morton; +Cc: Tetsuo Handa, linux-mm, Arnd Bergmann On Wed 01-06-16 15:53:13, Andrew Morton wrote: > On Sat, 28 May 2016 17:16:05 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote: > > > Commit e2fe14564d3316d1 ("oom_reaper: close race with exiting task") > > reduced frequency of needlessly selecting next OOM victim, but was > > calling mmput_async() when atomic_inc_not_zero() failed. > > Changelog fail. > > > --- a/mm/oom_kill.c > > +++ b/mm/oom_kill.c > > @@ -478,6 +478,7 @@ static bool __oom_reap_task(struct task_struct *tsk) > > mm = p->mm; > > if (!atomic_inc_not_zero(&mm->mm_users)) { > > task_unlock(p); > > + mm = NULL; > > goto unlock_oom; > > } > > This looks like a pretty fatal bug. I assume the result of hitting > that race will be a kernel crash, yes? Yes it is a nasty bug. It was (re)introduced by the final touch to the goto paths. And yes it can cause a crash. > Is it even possible to hit that race? It is, we can have a concurrent mmput followed by mmdrop. > find_lock_task_mm() takes some > care to prevent a NULL ->mm. But I guess a concurrent mmput() doesn't > require task_lock(). Kinda makes me wonder what's the point in even > having find_lock_task_mm() if its guarantee on ->mm is useless... find_lock_task_mm makes sure that the mm stays non-NULL while we hold the lock. We have to do all the necessary pinning while holding it. atomic_inc_not_zero will guarantee we are not racing with the finall mmput. Does that make more sense now? -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] mm,oom_reaper: don't call mmput_async() without atomic_inc_not_zero() 2016-06-02 6:48 ` Michal Hocko @ 2016-06-02 12:20 ` Tetsuo Handa 2016-06-02 13:11 ` Michal Hocko 2016-06-02 13:49 ` Michal Hocko 0 siblings, 2 replies; 8+ messages in thread From: Tetsuo Handa @ 2016-06-02 12:20 UTC (permalink / raw) To: mhocko, akpm; +Cc: linux-mm, arnd Michal Hocko wrote: > On Wed 01-06-16 15:53:13, Andrew Morton wrote: > > On Sat, 28 May 2016 17:16:05 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote: > > > > > Commit e2fe14564d3316d1 ("oom_reaper: close race with exiting task") > > > reduced frequency of needlessly selecting next OOM victim, but was > > > calling mmput_async() when atomic_inc_not_zero() failed. > > > > Changelog fail. > > > > > --- a/mm/oom_kill.c > > > +++ b/mm/oom_kill.c > > > @@ -478,6 +478,7 @@ static bool __oom_reap_task(struct task_struct *tsk) > > > mm = p->mm; > > > if (!atomic_inc_not_zero(&mm->mm_users)) { > > > task_unlock(p); > > > + mm = NULL; > > > goto unlock_oom; > > > } > > > > This looks like a pretty fatal bug. I assume the result of hitting > > that race will be a kernel crash, yes? > > Yes it is a nasty bug. It was (re)introduced by the final touch to the > goto paths. And yes it can cause a crash. Not always a kernel crash. Calling mmput_async() when atomic_inc_not_zero(&mm->mm_users) failed (i.e. mm->mm_users == 0) will make mm->mm_users == -1 by if (atomic_dec_and_test(&mm->mm_users)) { in mmput_async(). Unless mm is released before this atomic_dec_and_test() is executed, it will be merely a counter underflow. If mm is released and reallocated for new instance before this atomic_dec_and_test() is executed, it might cause a kernel crash. But > > > Is it even possible to hit that race? > > It is, we can have a concurrent mmput followed by mmdrop. > > > find_lock_task_mm() takes some > > care to prevent a NULL ->mm. But I guess a concurrent mmput() doesn't > > require task_lock(). Kinda makes me wonder what's the point in even > > having find_lock_task_mm() if its guarantee on ->mm is useless... > > find_lock_task_mm makes sure that the mm stays non-NULL while we hold > the lock. We have to do all the necessary pinning while holding it. > atomic_inc_not_zero will guarantee we are not racing with the finall > mmput. > > Does that make more sense now? what Andrew wanted to confirm is "how can it be possible that mm->mm_users < 1 when there is a tsk with tsk->mm != NULL", isn't it? Indeed, find_lock_task_mm() returns a tsk where tsk->mm != NULL with tsk->alloc_lock held. Therefore, tsk->mm != NULL implies mm->mm_users > 0 until we release tsk->alloc_lock , and we can do p = find_lock_task_mm(tsk); if (!p) goto unlock_oom; mm = p->mm; - if (!atomic_inc_not_zero(&mm->mm_users)) { - task_unlock(p); - goto unlock_oom; - } + atomic_inc(&mm->mm_users); task_unlock(p); in __oom_reap_task() (unless I'm missing something). Also, dmesg.xz in the crash report http://lkml.kernel.org/r/20160601080209.GA7190@yexl-desktop includes an interesting race. ---------------------------------------- [ 0.000000] Initializing CPU#0 (...snipped...) [ 82.643609] seq invoked oom-killer: gfp_mask=0x24200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0 [ 82.644682] CPU: 0 PID: 3946 Comm: seq Not tainted 4.6.0-10870-gdf1e2f5 #1 (...snipped...) [ 82.679858] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name [ 82.680805] [ 429] 0 429 171 125 3 0 0 0 ubusd [ 82.681742] [ 430] 0 430 156 117 3 0 0 0 askfirst [ 82.682903] [ 884] 0 884 194 143 3 0 0 0 logd [ 82.683866] [ 916] 0 916 248 202 3 0 0 0 netifd [ 82.684820] [ 1029] 0 1029 277 213 3 0 0 0 S95done [ 82.685977] [ 1030] 0 1030 269 185 3 0 0 0 sh [ 82.686903] [ 1035] 0 1035 268 195 3 0 0 0 01-cpu-hotplug [ 82.687926] [ 1040] 0 1040 13637 509 4 0 0 0 trinity [ 82.689100] [ 1041] 0 1041 266 169 3 0 0 0 sleep [ 82.690055] [ 3533] 0 3533 13637 285 3 0 0 0 trinity-watchdo [ 82.691082] [ 3534] 1 3534 13986 633 3 0 0 0 trinity-main [ 82.692316] [ 3914] 1 3914 13966 7054 14 0 0 0 trinity-c0 [ 82.693322] [ 3946] 0 3946 145 30 3 0 0 0 seq [ 82.694232] Out of memory: Kill process 3914 (trinity-c0) score 167 or sacrifice child [ 82.695110] Killed process 3914 (trinity-c0) total-vm:55864kB, anon-rss:1512kB, file-rss:1088kB, shmem-rss:25616kB [ 82.706724] oom_reaper: reaped process 3914 (trinity-c0), now anon-rss:0kB, file-rss:0kB, shmem-rss:26488kB [ 82.715540] oom_reaper: reaped process 3914 (trinity-c0), now anon-rss:0kB, file-rss:0kB, shmem-rss:26900kB [ 82.717662] oom_reaper: reaped process 3914 (trinity-c0), now anon-rss:0kB, file-rss:0kB, shmem-rss:26900kB [ 82.725804] oom_reaper: reaped process 3914 (trinity-c0), now anon-rss:0kB, file-rss:0kB, shmem-rss:27296kB [ 82.739091] oom_reaper: reaped process 3914 (trinity-c0), now anon-rss:0kB, file-rss:0kB, shmem-rss:28148kB [ 82.741088] 01-cpu-hotplug invoked oom-killer: gfp_mask=0x24200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0 [ 82.743238] CPU: 0 PID: 3947 Comm: 01-cpu-hotplug Not tainted 4.6.0-10870-gdf1e2f5 #1 (...snipped...) [ 82.788986] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name [ 82.790484] [ 429] 0 429 171 125 3 0 0 0 ubusd [ 82.792271] [ 430] 0 430 156 117 3 0 0 0 askfirst [ 82.793809] [ 884] 0 884 194 143 3 0 0 0 logd [ 82.795588] [ 916] 0 916 248 202 3 0 0 0 netifd [ 82.797099] [ 1029] 0 1029 277 213 3 0 0 0 S95done [ 82.798894] [ 1030] 0 1030 269 185 3 0 0 0 sh [ 82.800286] [ 1035] 0 1035 268 195 3 0 0 0 01-cpu-hotplug [ 82.801852] [ 1040] 0 1040 13637 509 4 0 0 0 trinity [ 82.803458] [ 1041] 0 1041 266 169 3 0 0 0 sleep [ 82.804943] [ 3533] 0 3533 13637 285 3 0 0 0 trinity-watchdo [ 82.806800] [ 3534] 1 3534 13986 633 3 0 0 0 trinity-main [ 82.808388] [ 3914] 1 3914 13966 7038 14 0 0 0 trinity-c0 [ 82.809970] [ 3947] 0 3947 268 36 3 0 0 0 01-cpu-hotplug [ 82.811002] Out of memory: Kill process 3534 (trinity-main) score 15 or sacrifice child [ 82.811891] Killed process 3534 (trinity-main) total-vm:55944kB, anon-rss:1440kB, file-rss:1024kB, shmem-rss:68kB [ 82.815896] BUG: unable to handle kernel NULL pointer dereference at 00000025 [ 82.816733] IP: [<81e30134>] mmput_async+0x9/0x6b [ 82.817281] *pde = 00000000 [ 82.817628] Oops: 0002 [#1] PREEMPT DEBUG_PAGEALLOC [ 82.818169] CPU: 0 PID: 13 Comm: oom_reaper Not tainted 4.6.0-10870-gdf1e2f5 #1 [ 82.818973] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014 [ 82.819867] task: 819a2340 ti: 819a4000 task.ti: 819a4000 [ 82.820419] EIP: 0060:[<81e30134>] EFLAGS: 00010246 CPU: 0 [ 82.820988] EIP is at mmput_async+0x9/0x6b [ 82.821413] EAX: 00000001 EBX: 00000001 ECX: 00000000 EDX: 00000000 [ 82.822040] ESI: 00000000 EDI: 819a5e9c EBP: 819a5e7c ESP: 819a5e78 [ 82.822683] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 [ 82.823226] CR0: 80050033 CR2: 00000025 CR3: 00740000 CR4: 00000690 [ 82.823864] DR0: 6cd78000 DR1: 00000000 DR2: 00000000 DR3: 00000000 [ 82.824511] DR6: ffff0ff0 DR7: 00000600 ---------------------------------------- The crash itself seems to be caused by use of uninitialized mm. (Wow! The compiler failed to warn it.) ---------------------------------------- static bool __oom_reap_task(struct task_struct *tsk) { struct mmu_gather tlb; struct vm_area_struct *vma; struct mm_struct *mm; /***** Not initialized as of df1e2f5 *****/ struct task_struct *p; struct zap_details details = {.check_swap_entries = true, .ignore_dirty = true}; bool ret = true; /* * We have to make sure to not race with the victim exit path * and cause premature new oom victim selection: * __oom_reap_task exit_mm * atomic_inc_not_zero * mmput * atomic_dec_and_test * exit_oom_victim * [...] * out_of_memory * select_bad_process * # no TIF_MEMDIE task select new victim * unmap_page_range # frees some memory */ mutex_lock(&oom_lock); /* * Make sure we find the associated mm_struct even when the particular * thread has already terminated and cleared its mm. * We might have race with exit path so consider our work done if there * is no mm. */ p = find_lock_task_mm(tsk); /***** Seems that p == NULL here. *****/ if (!p) goto unlock_oom; mm = p->mm; if (!atomic_inc_not_zero(&mm->mm_users)) { task_unlock(p); goto unlock_oom; } (...snipped...) unlock_oom: mutex_unlock(&oom_lock); /* * Drop our reference but make sure the mmput slow path is called from a * different context because we shouldn't risk we get stuck there and * put the oom_reaper out of the way. */ mmput_async(mm); /***** Passed uninitialized mm which had a value 0x00000001 by chance. *****/ return ret; } ---------------------------------------- The consecutive oom_reaper message on the same thread ---------- [ 82.706724] oom_reaper: reaped process 3914 (trinity-c0), now anon-rss:0kB, file-rss:0kB, shmem-rss:26488kB [ 82.715540] oom_reaper: reaped process 3914 (trinity-c0), now anon-rss:0kB, file-rss:0kB, shmem-rss:26900kB [ 82.717662] oom_reaper: reaped process 3914 (trinity-c0), now anon-rss:0kB, file-rss:0kB, shmem-rss:26900kB [ 82.725804] oom_reaper: reaped process 3914 (trinity-c0), now anon-rss:0kB, file-rss:0kB, shmem-rss:27296kB [ 82.739091] oom_reaper: reaped process 3914 (trinity-c0), now anon-rss:0kB, file-rss:0kB, shmem-rss:28148kB ---------- suggests that it repeated race that trinity-c0 called out_of_memory() and hit the shortcut if (current->mm && (fatal_signal_pending(current) || task_will_free_mem(current))) { mark_oom_victim(current); try_oom_reaper(current); return true; } and got TIF_MEMDIE and woke up the OOM reaper. But the OOM reaper started oom_reap_task() and cleared TIF_MEMDIE from trinity-c0 BEFORE trinity-c0 tries to allocate using ALLOC_NO_WATERMARKS via TIF_MEMDIE. As a result, trinity-c0 was unable to use ALLOC_NO_WATERMARKS and had to call out_of_memory() again. And again hit the shortcut and got TIF_MEMDIE and woke up the OOM reaper, the OOM reaper cleared TIF_MEMDIE. So, this set TIF_MEMDIE followed by clear TIF_MEMDIE repetition lasted for several times. Maybe we should not try to clear TIF_MEMDIE from the OOM reaper. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] mm,oom_reaper: don't call mmput_async() without atomic_inc_not_zero() 2016-06-02 12:20 ` Tetsuo Handa @ 2016-06-02 13:11 ` Michal Hocko 2016-06-03 6:23 ` Michal Hocko 2016-06-02 13:49 ` Michal Hocko 1 sibling, 1 reply; 8+ messages in thread From: Michal Hocko @ 2016-06-02 13:11 UTC (permalink / raw) To: Tetsuo Handa; +Cc: akpm, linux-mm, arnd On Thu 02-06-16 21:20:03, Tetsuo Handa wrote: > Michal Hocko wrote: > > On Wed 01-06-16 15:53:13, Andrew Morton wrote: [...] > > > Is it even possible to hit that race? > > > > It is, we can have a concurrent mmput followed by mmdrop. > > > > > find_lock_task_mm() takes some > > > care to prevent a NULL ->mm. But I guess a concurrent mmput() doesn't > > > require task_lock(). Kinda makes me wonder what's the point in even > > > having find_lock_task_mm() if its guarantee on ->mm is useless... > > > > find_lock_task_mm makes sure that the mm stays non-NULL while we hold > > the lock. We have to do all the necessary pinning while holding it. > > atomic_inc_not_zero will guarantee we are not racing with the finall > > mmput. > > > > Does that make more sense now? > > what Andrew wanted to confirm is "how can it be possible that > mm->mm_users < 1 when there is a tsk with tsk->mm != NULL", isn't it? > > Indeed, find_lock_task_mm() returns a tsk where tsk->mm != NULL with > tsk->alloc_lock held. Therefore, tsk->mm != NULL implies mm->mm_users > 0 > until we release tsk->alloc_lock , and we can do > > p = find_lock_task_mm(tsk); > if (!p) > goto unlock_oom; > > mm = p->mm; > - if (!atomic_inc_not_zero(&mm->mm_users)) { > - task_unlock(p); > - goto unlock_oom; > - } > + atomic_inc(&mm->mm_users); > > task_unlock(p); > > in __oom_reap_task() (unless I'm missing something). OK, I guess you are right. Care to send a patch? That also means that your patch to set mm = NULL in the atomic_inc_not_zero path is not really needed and in fact e2fe14564d3316d1625ed20bf1083995f4960893 which is sitting in the Linus tree is OK. I will comment on the rest in a separate reply to not mix the two things. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] mm,oom_reaper: don't call mmput_async() without atomic_inc_not_zero() 2016-06-02 13:11 ` Michal Hocko @ 2016-06-03 6:23 ` Michal Hocko 0 siblings, 0 replies; 8+ messages in thread From: Michal Hocko @ 2016-06-03 6:23 UTC (permalink / raw) To: Tetsuo Handa; +Cc: akpm, linux-mm, arnd On Thu 02-06-16 15:11:08, Michal Hocko wrote: > On Thu 02-06-16 21:20:03, Tetsuo Handa wrote: > > Michal Hocko wrote: > > > On Wed 01-06-16 15:53:13, Andrew Morton wrote: > [...] > > > > Is it even possible to hit that race? > > > > > > It is, we can have a concurrent mmput followed by mmdrop. > > > > > > > find_lock_task_mm() takes some > > > > care to prevent a NULL ->mm. But I guess a concurrent mmput() doesn't > > > > require task_lock(). Kinda makes me wonder what's the point in even > > > > having find_lock_task_mm() if its guarantee on ->mm is useless... > > > > > > find_lock_task_mm makes sure that the mm stays non-NULL while we hold > > > the lock. We have to do all the necessary pinning while holding it. > > > atomic_inc_not_zero will guarantee we are not racing with the finall > > > mmput. > > > > > > Does that make more sense now? > > > > what Andrew wanted to confirm is "how can it be possible that > > mm->mm_users < 1 when there is a tsk with tsk->mm != NULL", isn't it? > > > > Indeed, find_lock_task_mm() returns a tsk where tsk->mm != NULL with > > tsk->alloc_lock held. Therefore, tsk->mm != NULL implies mm->mm_users > 0 > > until we release tsk->alloc_lock , and we can do > > > > p = find_lock_task_mm(tsk); > > if (!p) > > goto unlock_oom; > > > > mm = p->mm; > > - if (!atomic_inc_not_zero(&mm->mm_users)) { > > - task_unlock(p); > > - goto unlock_oom; > > - } > > + atomic_inc(&mm->mm_users); > > > > task_unlock(p); > > > > in __oom_reap_task() (unless I'm missing something). > > OK, I guess you are right. Care to send a patch? I led it rest overnight and realized on the way to work this morning that this is a left over from the earlier approach when mm_reaper got mm rather than a task. We used to pin mm_count in oom_kill_process so we had to do atomic_inc_not_zero on mm_users here. Now that we used find_lock_task_mm we indeed can simply increase mm_users while holding the lock. Thanks! -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] mm,oom_reaper: don't call mmput_async() without atomic_inc_not_zero() 2016-06-02 12:20 ` Tetsuo Handa 2016-06-02 13:11 ` Michal Hocko @ 2016-06-02 13:49 ` Michal Hocko 1 sibling, 0 replies; 8+ messages in thread From: Michal Hocko @ 2016-06-02 13:49 UTC (permalink / raw) To: Tetsuo Handa; +Cc: akpm, linux-mm, arnd On Thu 02-06-16 21:20:03, Tetsuo Handa wrote: [...] > Also, dmesg.xz in the crash report http://lkml.kernel.org/r/20160601080209.GA7190@yexl-desktop > includes an interesting race. > [...] > The consecutive oom_reaper message on the same thread > > ---------- > [ 82.706724] oom_reaper: reaped process 3914 (trinity-c0), now anon-rss:0kB, file-rss:0kB, shmem-rss:26488kB > [ 82.715540] oom_reaper: reaped process 3914 (trinity-c0), now anon-rss:0kB, file-rss:0kB, shmem-rss:26900kB > [ 82.717662] oom_reaper: reaped process 3914 (trinity-c0), now anon-rss:0kB, file-rss:0kB, shmem-rss:26900kB > [ 82.725804] oom_reaper: reaped process 3914 (trinity-c0), now anon-rss:0kB, file-rss:0kB, shmem-rss:27296kB > [ 82.739091] oom_reaper: reaped process 3914 (trinity-c0), now anon-rss:0kB, file-rss:0kB, shmem-rss:28148kB > ---------- > > suggests that it repeated race that trinity-c0 called out_of_memory() > and hit the shortcut > > if (current->mm && > (fatal_signal_pending(current) || task_will_free_mem(current))) { > mark_oom_victim(current); > try_oom_reaper(current); > return true; > } > > and got TIF_MEMDIE and woke up the OOM reaper. But the OOM reaper started > oom_reap_task() and cleared TIF_MEMDIE from trinity-c0 BEFORE trinity-c0 > tries to allocate using ALLOC_NO_WATERMARKS via TIF_MEMDIE. > > As a result, trinity-c0 was unable to use ALLOC_NO_WATERMARKS and had to call > out_of_memory() again. And again hit the shortcut and got TIF_MEMDIE and woke > up the OOM reaper, the OOM reaper cleared TIF_MEMDIE. So, this set TIF_MEMDIE > followed by clear TIF_MEMDIE repetition lasted for several times. Maybe we > should not try to clear TIF_MEMDIE from the OOM reaper. If we do not clear TIF_MEMDIE then we risk other issues. What we can do instead is to check for MMF_OOM_REAPED in task_will_free_mem and do not allow to bypass the oom killer. I will enahance the series which hammers that code path with that check. Thanks for pointing this out! -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2016-06-03 6:23 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-05-28 8:16 [PATCH] mm,oom_reaper: don't call mmput_async() without atomic_inc_not_zero() Tetsuo Handa 2016-05-30 6:52 ` Michal Hocko 2016-06-01 22:53 ` Andrew Morton 2016-06-02 6:48 ` Michal Hocko 2016-06-02 12:20 ` Tetsuo Handa 2016-06-02 13:11 ` Michal Hocko 2016-06-03 6:23 ` Michal Hocko 2016-06-02 13:49 ` Michal Hocko
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).