linux-mm.kvack.org archive mirror
* Re: Possible race condition in oom-killer
       [not found] <e6c83a26-1d59-4afd-55cf-04e58bdde188@caviumnetworks.com>
@ 2017-07-28 12:32 ` Michal Hocko
  2017-07-28 12:59   ` Tetsuo Handa
  2017-07-28 13:50   ` Manish Jaggi
  0 siblings, 2 replies; 16+ messages in thread
From: Michal Hocko @ 2017-07-28 12:32 UTC (permalink / raw)
  To: Manish Jaggi; +Cc: linux-kernel, linux-mm

[CC linux-mm]

On Fri 28-07-17 17:22:25, Manish Jaggi wrote:
> was: Re: [PATCH] mm, oom: allow oom reaper to race with exit_mmap
> 
> Hi Michal,
> On 7/27/2017 2:54 PM, Michal Hocko wrote:
> >On Thu 27-07-17 13:59:09, Manish Jaggi wrote:
> >[...]
> >>With 4.11.6 I was getting random kernel panics (Out of memory - No process left to kill),
> >>  when running the LTP oom01/oom02 tests on our arm64 hardware with ~256G memory and a high core count.
> >>The issue experienced was as follows:
> >>	either test (oom01/oom02) selected a pid as the victim and waited for that pid to be killed;
> >>	that pid was marked as killed, but somewhere there is a race and the process didn't get killed;
> >>	and the oom01/oom02 test kept killing further processes until it panicked.
> >>IIUC this issue is quite similar to your patch description, but after applying your patch I still see the issue.
> >>If it is not related to this patch, can you please suggest, by looking at the log, what could be preventing
> >>the victim from being killed?
> >>
> >>Log (https://pastebin.com/hg5iXRj2)
> >>
> >>As a subtest of oom02 starts, it prints out the victim - In this case 4578
> >>
> >>oom02       0  TINFO  :  start OOM testing for mlocked pages.
> >>oom02       0  TINFO  :  expected victim is 4578.
> >>
> >>When oom02 thread invokes oom-killer, it did select 4578  for killing...
> >I will definitely have a look. Can you report it in a separate email
> >thread please? Are you able to reproduce with the current Linus or
> >linux-next trees?
> Yes this issue is visible with linux-next.

Could you provide the full kernel log from this run, please? I do not
expect there to be much difference, but just to be sure that the code I
am looking at matches the logs.

[...]
> >>[  365.283361] oom02:4586 invoked oom-killer: gfp_mask=0x16040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK), nodemask=1,  order=0, oom_score_adj=0
> >Yes because
> >[  365.283499] Node 1 Normal free:19500kB min:33804kB low:165916kB high:298028kB active_anon:13312kB inactive_anon:172kB active_file:0kB inactive_file:1044kB unevictable:131560064kB writepending:0kB present:134213632kB managed:132113248kB mlocked:131560064kB slab_reclaimable:5748kB slab_unreclaimable:17808kB kernel_stack:2720kB pagetables:254636kB bounce:0kB free_pcp:10476kB local_pcp:144kB free_cma:0kB
> >
> >Although we have killed and reaped oom02 process Node1 is still below
> >min watermark and that is why we have hit the oom killer again. It
> >is not immediately clear to me why, that would require a deeper
> >inspection.
> I have a doubt here.
> My understanding of the oom test: the oom() function basically forks itself and
> starts n threads; each thread has a loop which allocates and touches memory,
> thus triggering the oom-killer, which will kill the process. The parent process
> is in a wait() and will print pass/fail.
> 
> So IIUC, when 4578 is reaped all the child threads should be terminated,
> which happens in the pass case (line 152).
> But even after it is killed and reaped, the oom killer is invoked again,
> which doesn't seem right.

As I've said, the OOM killer hits because the memory from Node 1 didn't
get freed for some reason or got immediately populated.

> Could it be that the process is just marked as hidden from the oom killer, including its
> threads, and thus the oom-killer continues?

The whole process should be killed, and the OOM reaper should only mark
the victim invisible to the OOM killer _after_ the address space has been
reaped (and the memory freed). You said the patch from
http://lkml.kernel.org/r/20170724072332.31903-1-mhocko@kernel.org didn't
help, so it shouldn't be a race with the last __mmput.
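
For reference, the intended ordering looks roughly like this (a simplified
sketch of oom_reap_task() in mm/oom_kill.c around this version, with the
retry and failure-reporting details left out):

static void oom_reap_task(struct task_struct *tsk)
{
	int attempts = 0;
	struct mm_struct *mm = tsk->signal->oom_mm;

	/* first try to tear down (unmap) the victim's address space ... */
	while (attempts++ < MAX_OOM_REAP_RETRIES && !__oom_reap_task_mm(tsk, mm))
		schedule_timeout_idle(HZ/10);

	/* ... and only then hide this mm from further OOM victim selection */
	set_bit(MMF_OOM_SKIP, &mm->flags);

	/* drop the reference taken by wake_oom_reaper */
	put_task_struct(tsk);
}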

Thanks!
-- 
Michal Hocko
SUSE Labs


* Re: Possible race condition in oom-killer
  2017-07-28 12:32 ` Possible race condition in oom-killer Michal Hocko
@ 2017-07-28 12:59   ` Tetsuo Handa
  2017-07-28 13:07     ` Michal Hocko
  2017-07-28 13:15     ` Manish Jaggi
  2017-07-28 13:50   ` Manish Jaggi
  1 sibling, 2 replies; 16+ messages in thread
From: Tetsuo Handa @ 2017-07-28 12:59 UTC (permalink / raw)
  To: Michal Hocko, Manish Jaggi; +Cc: linux-kernel, linux-mm

(Oops. Forgot to add CC.)

On 2017/07/28 21:32, Michal Hocko wrote:
> [CC linux-mm]
>
> On Fri 28-07-17 17:22:25, Manish Jaggi wrote:
>> was: Re: [PATCH] mm, oom: allow oom reaper to race with exit_mmap
>>
>> Hi Michal,
>> On 7/27/2017 2:54 PM, Michal Hocko wrote:
>>> On Thu 27-07-17 13:59:09, Manish Jaggi wrote:
>>> [...]
>>>> With 4.11.6 I was getting random kernel panics (Out of memory - No process left to kill),
>>>>  when running LTP oom01 /oom02 ltp tests on our arm64 hardware with ~256G memory and high core count.
>>>> The issue experienced was as follows
>>>> 	that either test (oom01/oom02) selected a pid as victim and waited for the pid to be killed.
>>>> 	that pid was marked as killed but somewhere there is a race and the process didn't get killed.
>>>> 	and the oom01/oom02 test started killing further processes, till it panics.
>>>> IIUC this issue is quite similar to your patch description. But applying your patch I still see the issue.
>>>> If it is not related to this patch, can you please suggest by looking at the log, what could be preventing
>>>> the killing of victim.
>>>>
>>>> Log (https://pastebin.com/hg5iXRj2)
>>>>
>>>> As a subtest of oom02 starts, it prints out the victim - In this case 4578
>>>>
>>>> oom02       0  TINFO  :  start OOM testing for mlocked pages.
>>>> oom02       0  TINFO  :  expected victim is 4578.
>>>>
>>>> When oom02 thread invokes oom-killer, it did select 4578  for killing...
>>> I will definitely have a look. Can you report it in a separate email
>>> thread please? Are you able to reproduce with the current Linus or
>>> linux-next trees?
>> Yes this issue is visible with linux-next.
>
> Could you provide the full kernel log from this run please? I do not
> expect there to be much difference but just to be sure that the code I
> am looking at matches logs.

4578 is consuming memory as mlocked pages. But the OOM reaper cannot reclaim
mlocked pages (i.e. can_madv_dontneed_vma() returns false due to VM_LOCKED), can it?

oom02       0  TINFO  :  start OOM testing for mlocked pages.
oom02       0  TINFO  :  expected victim is 4578.
[  365.267347] oom_reaper: reaped process 4578 (oom02), now anon-rss:131559616kB, file-rss:0kB, shmem-rss:0kB

As a result, MMF_OOM_SKIP is set without reclaiming much memory.
Thus, it is natural that subsequent OOM victims are selected immediately, because
almost all memory is still in use. Since 4578 is multi-threaded (isn't it?),
it will take time to reach the final __mmput() because mm->users is large.
Since there are many threads, it is possible that all OOM-killable processes are
killed before the final __mmput() of 4578 (which releases the mlocked pages) is called.
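
For reference, the checks the reaper applies look roughly like this (a
simplified sketch based on mm/madvise.c and mm/oom_kill.c from around this
time; the exact code differs in details):

/* mm/madvise.c */
static bool can_madv_dontneed_vma(struct vm_area_struct *vma)
{
	return !(vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP));
}

/* mm/oom_kill.c, inside __oom_reap_task_mm(), heavily simplified */
for (vma = mm->mmap; vma; vma = vma->vm_next) {
	if (!can_madv_dontneed_vma(vma))
		continue;	/* VM_LOCKED vmas are skipped entirely */
	if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED))
		unmap_page_range(&tlb, vma, vma->vm_start, vma->vm_end, NULL);
}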

>
> [...]
>>>> [  365.283361] oom02:4586 invoked oom-killer: gfp_mask=0x16040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK), nodemask=1,  order=0, oom_score_adj=0
>>> Yes because
>>> [  365.283499] Node 1 Normal free:19500kB min:33804kB low:165916kB high:298028kB active_anon:13312kB inactive_anon:172kB active_file:0kB inactive_file:1044kB unevictable:131560064kB writepending:0kB present:134213632kB managed:132113248kB mlocked:131560064kB slab_reclaimable:5748kB slab_unreclaimable:17808kB kernel_stack:2720kB pagetables:254636kB bounce:0kB free_pcp:10476kB local_pcp:144kB free_cma:0kB
>>>
>>> Although we have killed and reaped oom02 process Node1 is still below
>>> min watermark and that is why we have hit the oom killer again. It
>>> is not immediately clear to me why, that would require a deeper
>>> inspection.
>> I have a doubt here
>> my understanding of oom test: oom() function basically forks itself and
>> starts n threads each thread has a loop which allocates and touches memory
>> thus will trigger oom-killer and will kill the process. the parent process
>> is on a wait() and will print pass/fail.
>>
>> So IIUC when 4578 is reaped all the child threads should be terminated,
>> which happens in pass case (line 152)
>> But even after being killed and reaped,  the oom killer is invoked again
>> which doesn't seem right.
>
> As I've said the OOM killer hits because the memory from Node 1 didn't
> get freed for some reason or got immediately populated.

Because the mlocked pages belong to a multi-threaded process, it will take
time to reclaim them.

>
>> Could it be that the process is just marked hidden from oom including its
>> threads, thus oom-killer continues.
>
> The whole process should be killed and the OOM reaper should only mark
> the victim oom invisible _after_ the address space has been reaped (and
> memory freed). You said the patch from
> http://lkml.kernel.org/r/20170724072332.31903-1-mhocko@kernel.org didn't
> help so it shouldn't be a race with the last __mmput.
>
> Thanks!
>


* Re: Possible race condition in oom-killer
  2017-07-28 12:59   ` Tetsuo Handa
@ 2017-07-28 13:07     ` Michal Hocko
  2017-07-28 13:15       ` Tetsuo Handa
  2017-07-28 13:15     ` Manish Jaggi
  1 sibling, 1 reply; 16+ messages in thread
From: Michal Hocko @ 2017-07-28 13:07 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: Manish Jaggi, linux-kernel, linux-mm

On Fri 28-07-17 21:59:50, Tetsuo Handa wrote:
> (Oops. Forgot to add CC.)
> 
> On 2017/07/28 21:32, Michal Hocko wrote:
> > [CC linux-mm]
> >
> > On Fri 28-07-17 17:22:25, Manish Jaggi wrote:
> >> was: Re: [PATCH] mm, oom: allow oom reaper to race with exit_mmap
> >>
> >> Hi Michal,
> >> On 7/27/2017 2:54 PM, Michal Hocko wrote:
> >>> On Thu 27-07-17 13:59:09, Manish Jaggi wrote:
> >>> [...]
> >>>> With 4.11.6 I was getting random kernel panics (Out of memory - No process left to kill),
> >>>>  when running LTP oom01 /oom02 ltp tests on our arm64 hardware with ~256G memory and high core count.
> >>>> The issue experienced was as follows
> >>>> 	that either test (oom01/oom02) selected a pid as victim and waited for the pid to be killed.
> >>>> 	that pid was marked as killed but somewhere there is a race and the process didn't get killed.
> >>>> 	and the oom01/oom02 test started killing further processes, till it panics.
> >>>> IIUC this issue is quite similar to your patch description. But applying your patch I still see the issue.
> >>>> If it is not related to this patch, can you please suggest by looking at the log, what could be preventing
> >>>> the killing of victim.
> >>>>
> >>>> Log (https://pastebin.com/hg5iXRj2)
> >>>>
> >>>> As a subtest of oom02 starts, it prints out the victim - In this case 4578
> >>>>
> >>>> oom02       0  TINFO  :  start OOM testing for mlocked pages.
> >>>> oom02       0  TINFO  :  expected victim is 4578.
> >>>>
> >>>> When oom02 thread invokes oom-killer, it did select 4578  for killing...
> >>> I will definitely have a look. Can you report it in a separate email
> >>> thread please? Are you able to reproduce with the current Linus or
> >>> linux-next trees?
> >> Yes this issue is visible with linux-next.
> >
> > Could you provide the full kernel log from this run please? I do not
> > expect there to be much difference but just to be sure that the code I
> > am looking at matches logs.
> 
> 4578 is consuming memory as mlocked pages. But the OOM reaper cannot reclaim
> mlocked pages (i.e. can_madv_dontneed_vma() returns false due to VM_LOCKED), can it?

You are absolutely right. I am pretty sure I checked the mlocked counter
as the first thing, but that must have been in one of the earlier oom reports.
My fault for not checking it in the critical one:

[  365.267347] oom_reaper: reaped process 4578 (oom02), now anon-rss:131559616kB, file-rss:0kB, shmem-rss:0kB
[  365.282658] oom_reaper: reaped process 4583 (oom02), now anon-rss:131561664kB, file-rss:0kB, shmem-rss:0kB

and the above screamed about the fact that I was just completely blind.

Handling of mlocked pages has been on my todo list for quite some time already,
but I haven't gotten around to implementing it. The mlock code is very tricky.
-- 
Michal Hocko
SUSE Labs


* Re: Possible race condition in oom-killer
  2017-07-28 13:07     ` Michal Hocko
@ 2017-07-28 13:15       ` Tetsuo Handa
  2017-07-28 13:29         ` Michal Hocko
  0 siblings, 1 reply; 16+ messages in thread
From: Tetsuo Handa @ 2017-07-28 13:15 UTC (permalink / raw)
  To: mhocko; +Cc: mjaggi, linux-kernel, linux-mm

Michal Hocko wrote:
> > 4578 is consuming memory as mlocked pages. But the OOM reaper cannot reclaim
> > mlocked pages (i.e. can_madv_dontneed_vma() returns false due to VM_LOCKED), can it?
> 
> You are absolutely right. I am pretty sure I've checked mlocked counter
> as the first thing but that must be from one of the earlier oom reports.
> My fault I haven't checked it in the critical one
> 
> [  365.267347] oom_reaper: reaped process 4578 (oom02), now anon-rss:131559616kB, file-rss:0kB, shmem-rss:0kB
> [  365.282658] oom_reaper: reaped process 4583 (oom02), now anon-rss:131561664kB, file-rss:0kB, shmem-rss:0kB
> 
> and the above screamed about the fact I was just completely blind.
> 
> mlock pages handling is on my todo list for quite some time already but
> I didn't get around it to implement that. mlock code is very tricky.

task_will_free_mem(current) in out_of_memory() returning false, because
MMF_OOM_SKIP was already set, allowed each thread sharing that mm to select a new
OOM victim. If task_will_free_mem(current) in out_of_memory() had not returned
false, threads sharing the MMF_OOM_SKIP mm would not have selected new victims
to the point where all OOM-killable processes are killed and the kernel calls panic().
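
Roughly, the paths involved look like this (a simplified sketch of the
current out_of_memory()/task_will_free_mem() code, not the exact source):

	/* out_of_memory(), simplified */
	if (task_will_free_mem(current)) {
		/* fast path: current is already exiting, no new victim needed */
		mark_oom_victim(current);
		wake_oom_reaper(current);
		return true;
	}
	/* otherwise fall through to select_bad_process() and kill a new victim */

	/* task_will_free_mem(), simplified */
	if (test_bit(MMF_OOM_SKIP, &mm->flags))
		return false;	/* the reaper has already processed this mm */

Once the reaper sets MMF_OOM_SKIP, every allocating thread of the dying
process falls through to select_bad_process() and can pick a fresh victim.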


* Re: Possible race condition in oom-killer
  2017-07-28 12:59   ` Tetsuo Handa
  2017-07-28 13:07     ` Michal Hocko
@ 2017-07-28 13:15     ` Manish Jaggi
  1 sibling, 0 replies; 16+ messages in thread
From: Manish Jaggi @ 2017-07-28 13:15 UTC (permalink / raw)
  To: Tetsuo Handa, Michal Hocko; +Cc: linux-kernel, linux-mm

Hello Tetsuo Handa,

On 7/28/2017 6:29 PM, Tetsuo Handa wrote:
> (Oops. Forgot to add CC.)
>
> On 2017/07/28 21:32, Michal Hocko wrote:
>> [CC linux-mm]
>>
>> On Fri 28-07-17 17:22:25, Manish Jaggi wrote:
>>> was: Re: [PATCH] mm, oom: allow oom reaper to race with exit_mmap
>>>
>>> Hi Michal,
>>> On 7/27/2017 2:54 PM, Michal Hocko wrote:
>>>> On Thu 27-07-17 13:59:09, Manish Jaggi wrote:
>>>> [...]
>>>>> With 4.11.6 I was getting random kernel panics (Out of memory - No process left to kill),
>>>>>   when running LTP oom01 /oom02 ltp tests on our arm64 hardware with ~256G memory and high core count.
>>>>> The issue experienced was as follows
>>>>> 	that either test (oom01/oom02) selected a pid as victim and waited for the pid to be killed.
>>>>> 	that pid was marked as killed but somewhere there is a race and the process didn't get killed.
>>>>> 	and the oom01/oom02 test started killing further processes, till it panics.
>>>>> IIUC this issue is quite similar to your patch description. But applying your patch I still see the issue.
>>>>> If it is not related to this patch, can you please suggest by looking at the log, what could be preventing
>>>>> the killing of victim.
>>>>>
>>>>> Log (https://pastebin.com/hg5iXRj2)
>>>>>
>>>>> As a subtest of oom02 starts, it prints out the victim - In this case 4578
>>>>>
>>>>> oom02       0  TINFO  :  start OOM testing for mlocked pages.
>>>>> oom02       0  TINFO  :  expected victim is 4578.
>>>>>
>>>>> When oom02 thread invokes oom-killer, it did select 4578  for killing...
>>>> I will definitely have a look. Can you report it in a separate email
>>>> thread please? Are you able to reproduce with the current Linus or
>>>> linux-next trees?
>>> Yes this issue is visible with linux-next.
>> Could you provide the full kernel log from this run please? I do not
>> expect there to be much difference but just to be sure that the code I
>> am looking at matches logs.
> 4578 is consuming memory as mlocked pages. But the OOM reaper cannot reclaim
> mlocked pages (i.e. can_madv_dontneed_vma() returns false due to VM_LOCKED), can it?
>
> oom02       0  TINFO  :  start OOM testing for mlocked pages.
> oom02       0  TINFO  :  expected victim is 4578.
> [  365.267347] oom_reaper: reaped process 4578 (oom02), now anon-rss:131559616kB, file-rss:0kB, shmem-rss:0kB
>
> As a result, MMF_OOM_SKIP is set without reclaiming much memory.
> Thus, it is natural that subsequent OOM victims are selected immediately because
> almost all memory is still in use. Since 4578 is multi-threaded (isn't it?),
> it will take time to call final __mmput() because mm->users are large.
> Since there are many threads, it is possible that all OOM killable processes are
> killed before final __mmput() of 4578 (which releases mlocked pages) is called.
My setup has 95 or more cores, so could the large number of cores be the
reason for the random failure?
>> [...]
>>>>> [  365.283361] oom02:4586 invoked oom-killer: gfp_mask=0x16040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK), nodemask=1,  order=0, oom_score_adj=0
>>>> Yes because
>>>> [  365.283499] Node 1 Normal free:19500kB min:33804kB low:165916kB high:298028kB active_anon:13312kB inactive_anon:172kB active_file:0kB inactive_file:1044kB unevictable:131560064kB writepending:0kB present:134213632kB managed:132113248kB mlocked:131560064kB slab_reclaimable:5748kB slab_unreclaimable:17808kB kernel_stack:2720kB pagetables:254636kB bounce:0kB free_pcp:10476kB local_pcp:144kB free_cma:0kB
>>>>
>>>> Although we have killed and reaped oom02 process Node1 is still below
>>>> min watermark and that is why we have hit the oom killer again. It
>>>> is not immediately clear to me why, that would require a deeper
>>>> inspection.
>>> I have a doubt here
>>> my understanding of oom test: oom() function basically forks itself and
>>> starts n threads each thread has a loop which allocates and touches memory
>>> thus will trigger oom-killer and will kill the process. the parent process
>>> is on a wait() and will print pass/fail.
>>>
>>> So IIUC when 4578 is reaped all the child threads should be terminated,
>>> which happens in pass case (line 152)
>>> But even after being killed and reaped,  the oom killer is invoked again
>>> which doesn't seem right.
>> As I've said the OOM killer hits because the memory from Node 1 didn't
>> get freed for some reason or got immediately populated.
> Because of mlocked pages by multi threaded process, it will take time to
> reclaim mlocked pages.
>
>>> Could it be that the process is just marked hidden from oom including its
>>> threads, thus oom-killer continues.
>> The whole process should be killed and the OOM reaper should only mark
>> the victim oom invisible _after_ the address space has been reaped (and
>> memory freed). You said the patch from
>> http://lkml.kernel.org/r/20170724072332.31903-1-mhocko@kernel.org didn't
>> help so it shouldn't be a race with the last __mmput.
>>
>> Thanks!
>>


* Re: Possible race condition in oom-killer
  2017-07-28 13:15       ` Tetsuo Handa
@ 2017-07-28 13:29         ` Michal Hocko
  2017-07-28 13:55           ` Tetsuo Handa
  0 siblings, 1 reply; 16+ messages in thread
From: Michal Hocko @ 2017-07-28 13:29 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: mjaggi, linux-kernel, linux-mm

On Fri 28-07-17 22:15:01, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > > 4578 is consuming memory as mlocked pages. But the OOM reaper cannot reclaim
> > > mlocked pages (i.e. can_madv_dontneed_vma() returns false due to VM_LOCKED), can it?
> > 
> > You are absolutely right. I am pretty sure I've checked mlocked counter
> > as the first thing but that must be from one of the earlier oom reports.
> > My fault I haven't checked it in the critical one
> > 
> > [  365.267347] oom_reaper: reaped process 4578 (oom02), now anon-rss:131559616kB, file-rss:0kB, shmem-rss:0kB
> > [  365.282658] oom_reaper: reaped process 4583 (oom02), now anon-rss:131561664kB, file-rss:0kB, shmem-rss:0kB
> > 
> > and the above screamed about the fact I was just completely blind.
> > 
> > mlock pages handling is on my todo list for quite some time already but
> > I didn't get around it to implement that. mlock code is very tricky.
> 
> task_will_free_mem(current) in out_of_memory() returning false due to
> MMF_OOM_SKIP already set allowed each thread sharing that mm to select a new
> OOM victim. If task_will_free_mem(current) in out_of_memory() did not return
> false, threads sharing MMF_OOM_SKIP mm would not have selected new victims
> to the level where all OOM killable processes are killed and calls panic().

I am not sure I understand. Do you mean this?
---
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 9e8b4f030c1c..671e4a4107d0 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -779,13 +779,6 @@ static bool task_will_free_mem(struct task_struct *task)
 	if (!__task_will_free_mem(task))
 		return false;
 
-	/*
-	 * This task has already been drained by the oom reaper so there are
-	 * only small chances it will free some more
-	 */
-	if (test_bit(MMF_OOM_SKIP, &mm->flags))
-		return false;
-
 	if (atomic_read(&mm->mm_users) <= 1)
 		return true;
 
If yes, I would have to think about this some more, because it might
have weird side effects (e.g. oom_victims counting after threads have passed
exit_oom_victim).

Anyway, the proper fix for this is to allow reaping of mlocked pages. Is
anything other than the LTP test affected, which would give this more priority?
Do we have other use cases where something mlocks the whole of memory?
-- 
Michal Hocko
SUSE Labs


* Re: Possible race condition in oom-killer
  2017-07-28 12:32 ` Possible race condition in oom-killer Michal Hocko
  2017-07-28 12:59   ` Tetsuo Handa
@ 2017-07-28 13:50   ` Manish Jaggi
  2017-07-28 14:12     ` Michal Hocko
  1 sibling, 1 reply; 16+ messages in thread
From: Manish Jaggi @ 2017-07-28 13:50 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-kernel, linux-mm


Hi Michal,
On 7/28/2017 6:02 PM, Michal Hocko wrote:
> [CC linux-mm]
>
> On Fri 28-07-17 17:22:25, Manish Jaggi wrote:
>> was: Re: [PATCH] mm, oom: allow oom reaper to race with exit_mmap
>>
>> Hi Michal,
>> On 7/27/2017 2:54 PM, Michal Hocko wrote:
>>> On Thu 27-07-17 13:59:09, Manish Jaggi wrote:
>>> [...]
>>>> With 4.11.6 I was getting random kernel panics (Out of memory - No process left to kill),
>>>>   when running LTP oom01 /oom02 ltp tests on our arm64 hardware with ~256G memory and high core count.
>>>> The issue experienced was as follows
>>>> 	that either test (oom01/oom02) selected a pid as victim and waited for the pid to be killed.
>>>> 	that pid was marked as killed but somewhere there is a race and the process didn't get killed.
>>>> 	and the oom01/oom02 test started killing further processes, till it panics.
>>>> IIUC this issue is quite similar to your patch description. But applying your patch I still see the issue.
>>>> If it is not related to this patch, can you please suggest by looking at the log, what could be preventing
>>>> the killing of victim.
>>>>
>>>> Log (https://pastebin.com/hg5iXRj2)
>>>>
>>>> As a subtest of oom02 starts, it prints out the victim - In this case 4578
>>>>
>>>> oom02       0  TINFO  :  start OOM testing for mlocked pages.
>>>> oom02       0  TINFO  :  expected victim is 4578.
>>>>
>>>> When oom02 thread invokes oom-killer, it did select 4578  for killing...
>>> I will definitely have a look. Can you report it in a separate email
>>> thread please? Are you able to reproduce with the current Linus or
>>> linux-next trees?
>> Yes this issue is visible with linux-next.
> Could you provide the full kernel log from this run please? I do not
> expect there to be much difference but just to be sure that the code I
> am looking at matches logs.
The log is here: https://pastebin.com/Pmn5ZwEM
The mlocked memory keeps increasing until the panic.

> [...]
>>>> [  365.283361] oom02:4586 invoked oom-killer: gfp_mask=0x16040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK), nodemask=1,  order=0, oom_score_adj=0
>>> Yes because
>>> [  365.283499] Node 1 Normal free:19500kB min:33804kB low:165916kB high:298028kB active_anon:13312kB inactive_anon:172kB active_file:0kB inactive_file:1044kB unevictable:131560064kB writepending:0kB present:134213632kB managed:132113248kB mlocked:131560064kB slab_reclaimable:5748kB slab_unreclaimable:17808kB kernel_stack:2720kB pagetables:254636kB bounce:0kB free_pcp:10476kB local_pcp:144kB free_cma:0kB
>>>
>>> Although we have killed and reaped oom02 process Node1 is still below
>>> min watermark and that is why we have hit the oom killer again. It
>>> is not immediately clear to me why, that would require a deeper
>>> inspection.
>> I have a doubt here
>> my understanding of oom test: oom() function basically forks itself and
>> starts n threads each thread has a loop which allocates and touches memory
>> thus will trigger oom-killer and will kill the process. the parent process
>> is on a wait() and will print pass/fail.
>>
>> So IIUC when 4578 is reaped all the child threads should be terminated,
>> which happens in pass case (line 152)
>> But even after being killed and reaped,  the oom killer is invoked again
>> which doesn't seem right.
> As I've said the OOM killer hits because the memory from Node 1 didn't
> get freed for some reason or got immediately populated.
>
>> Could it be that the process is just marked hidden from oom including its
>> threads, thus oom-killer continues.
> The whole process should be killed and the OOM reaper should only mark
> the victim oom invisible _after_ the address space has been reaped (and
> memory freed). You said the patch from
> http://lkml.kernel.org/r/20170724072332.31903-1-mhocko@kernel.org didn't
> help so it shouldn't be a race with the last __mmput.
>
> Thanks!


* Re: Possible race condition in oom-killer
  2017-07-28 13:29         ` Michal Hocko
@ 2017-07-28 13:55           ` Tetsuo Handa
  2017-07-28 14:07             ` Michal Hocko
  0 siblings, 1 reply; 16+ messages in thread
From: Tetsuo Handa @ 2017-07-28 13:55 UTC (permalink / raw)
  To: mhocko; +Cc: mjaggi, linux-kernel, linux-mm

Michal Hocko wrote:
> On Fri 28-07-17 22:15:01, Tetsuo Handa wrote:
> > task_will_free_mem(current) in out_of_memory() returning false due to
> > MMF_OOM_SKIP already set allowed each thread sharing that mm to select a new
> > OOM victim. If task_will_free_mem(current) in out_of_memory() did not return
> > false, threads sharing MMF_OOM_SKIP mm would not have selected new victims
> > to the level where all OOM killable processes are killed and calls panic().
> 
> I am not sure I understand. Do you mean this?

Yes.

> ---
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 9e8b4f030c1c..671e4a4107d0 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -779,13 +779,6 @@ static bool task_will_free_mem(struct task_struct *task)
>  	if (!__task_will_free_mem(task))
>  		return false;
>  
> -	/*
> -	 * This task has already been drained by the oom reaper so there are
> -	 * only small chances it will free some more
> -	 */
> -	if (test_bit(MMF_OOM_SKIP, &mm->flags))
> -		return false;
> -
>  	if (atomic_read(&mm->mm_users) <= 1)
>  		return true;
>  
> If yes I would have to think about this some more because that might
> have weird side effects (e.g. oom_victims counting after threads passed
> exit_oom_victim).

But this check should not be removed unconditionally. We should still return
false if returning true was not sufficient to solve the OOM situation, because
we need to select the next OOM victim in that case.

> 
> Anyway the proper fix for this is to allow reaping mlocked pages.

A different approach is to set TIF_MEMDIE for all threads sharing the same
memory so that threads sharing the MMF_OOM_SKIP mm do not need to call
out_of_memory() in order to get TIF_MEMDIE.

Yet another approach is to use __GFP_KILLABLE (we could start with it on a
best-effort basis).

>                                                                   Is
> something other than the LTP test affected to give this more priority?
> Do we have other usecases where something mlocks the whole memory?

This panic was caused by the 50 threads sharing the MMF_OOM_SKIP mm exceeding
the number of OOM-killable processes. Whether the memory is locked or not isn't
important. If a multi-threaded process which consumes little memory were
selected as an OOM victim (and reaped by the OOM reaper, with MMF_OOM_SKIP
set immediately), it could still be possible to needlessly select the next
OOM victims.


* Re: Possible race condition in oom-killer
  2017-07-28 13:55           ` Tetsuo Handa
@ 2017-07-28 14:07             ` Michal Hocko
  2017-07-29  4:31               ` Tetsuo Handa
  2017-08-01 10:46               ` Tetsuo Handa
  0 siblings, 2 replies; 16+ messages in thread
From: Michal Hocko @ 2017-07-28 14:07 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: mjaggi, linux-kernel, linux-mm

On Fri 28-07-17 22:55:51, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Fri 28-07-17 22:15:01, Tetsuo Handa wrote:
> > > task_will_free_mem(current) in out_of_memory() returning false due to
> > > MMF_OOM_SKIP already set allowed each thread sharing that mm to select a new
> > > OOM victim. If task_will_free_mem(current) in out_of_memory() did not return
> > > false, threads sharing MMF_OOM_SKIP mm would not have selected new victims
> > > to the level where all OOM killable processes are killed and calls panic().
> > 
> > I am not sure I understand. Do you mean this?
> 
> Yes.
> 
> > ---
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index 9e8b4f030c1c..671e4a4107d0 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -779,13 +779,6 @@ static bool task_will_free_mem(struct task_struct *task)
> >  	if (!__task_will_free_mem(task))
> >  		return false;
> >  
> > -	/*
> > -	 * This task has already been drained by the oom reaper so there are
> > -	 * only small chances it will free some more
> > -	 */
> > -	if (test_bit(MMF_OOM_SKIP, &mm->flags))
> > -		return false;
> > -
> >  	if (atomic_read(&mm->mm_users) <= 1)
> >  		return true;
> >  
> > If yes I would have to think about this some more because that might
> > have weird side effects (e.g. oom_victims counting after threads passed
> > exit_oom_victim).
> 
> But this check should not be removed unconditionally. We should still return
> false if returning true was not sufficient to solve the OOM situation, for
> we need to select next OOM victim in that case.
> 
> > 
> > Anyway the proper fix for this is to allow reaping mlocked pages.
> 
> Different approach is to set TIF_MEMDIE to all threads sharing the same
> memory so that threads sharing MMF_OOM_SKIP mm do not need to call
> out_of_memory() in order to get TIF_MEMDIE. 

This is not so simple. If it were, we could simply remove TIF_MEMDIE
altogether and rely on tsk_is_oom_victim.

> Yet another approach is to use __GFP_KILLABLE (we can start it as
> best effort basis).
> 
> >                                                                   Is
> > something other than the LTP test affected to give this more priority?
> > Do we have other usecases where something mlocks the whole memory?
> 
> This panic was caused by 50 threads sharing MMF_OOM_SKIP mm exceeding
> number of OOM killable processes. Whether memory is locked or not isn't
> important.

You are wrong here, I believe. The whole problem is that the OOM victim
is consuming basically all the memory (that is what the test case
actually does IIRC) and that memory is mlocked. The oom_reaper evaluates
the victim's mm and bails out much sooner than the exit path actually
manages to tear down the address space. And so we have to find other
oom victims until we simply kill everything and panic.

> If a multi-threaded process which consumes little memory was
> selected as an OOM victim (and reaped by the OOM reaper and MMF_OOM_SKIP
> was set immediately), it might be still possible to select next OOM victims
> needlessly.

This would be true if the address space itself contained only a small
amount of memory and the larger part of the memory was in page tables or
other resources which the oom_reaper cannot work with. This is not the
usual case, though.
-- 
Michal Hocko
SUSE Labs


* Re: Possible race condition in oom-killer
  2017-07-28 13:50   ` Manish Jaggi
@ 2017-07-28 14:12     ` Michal Hocko
  0 siblings, 0 replies; 16+ messages in thread
From: Michal Hocko @ 2017-07-28 14:12 UTC (permalink / raw)
  To: Manish Jaggi; +Cc: linux-kernel, linux-mm

On Fri 28-07-17 19:20:42, Manish Jaggi wrote:
> 
> Hi Michal,
> On 7/28/2017 6:02 PM, Michal Hocko wrote:
> >[CC linux-mm]
> >
> >On Fri 28-07-17 17:22:25, Manish Jaggi wrote:
> >>was: Re: [PATCH] mm, oom: allow oom reaper to race with exit_mmap
> >>
> >>Hi Michal,
> >>On 7/27/2017 2:54 PM, Michal Hocko wrote:
> >>>On Thu 27-07-17 13:59:09, Manish Jaggi wrote:
> >>>[...]
> >>>>With 4.11.6 I was getting random kernel panics (Out of memory - No process left to kill),
> >>>>  when running LTP oom01 /oom02 ltp tests on our arm64 hardware with ~256G memory and high core count.
> >>>>The issue experienced was as follows
> >>>>	that either test (oom01/oom02) selected a pid as victim and waited for the pid to be killed.
> >>>>	that pid was marked as killed but somewhere there is a race and the process didn't get killed.
> >>>>	and the oom01/oom02 test started killing further processes, till it panics.
> >>>>IIUC this issue is quite similar to your patch description. But applying your patch I still see the issue.
> >>>>If it is not related to this patch, can you please suggest by looking at the log, what could be preventing
> >>>>the killing of victim.
> >>>>
> >>>>Log (https://pastebin.com/hg5iXRj2)
> >>>>
> >>>>As a subtest of oom02 starts, it prints out the victim - In this case 4578
> >>>>
> >>>>oom02       0  TINFO  :  start OOM testing for mlocked pages.
> >>>>oom02       0  TINFO  :  expected victim is 4578.
> >>>>
> >>>>When oom02 thread invokes oom-killer, it did select 4578  for killing...
> >>>I will definitely have a look. Can you report it in a separate email
> >>>thread please? Are you able to reproduce with the current Linus or
> >>>linux-next trees?
> >>Yes this issue is visible with linux-next.
> >Could you provide the full kernel log from this run please? I do not
> >expect there to be much difference but just to be sure that the code I
> >am looking at matches logs.
> The log is here: https://pastebin.com/Pmn5ZwEM
> mlocked memory keeps on increasing till panic.

Thank you for retesting. I can confirm the issue is that the oom reaper
hides the oom victim too early, because basically the whole address space is
mlocked and there is not much to free. As the exit of the test takes some
time, a new instance of the test (pid 4625 in this case) will go and consume
more than the exiting instance frees, and that goes on and on, killing other
eligible tasks, until we panic due to there being no more eligible tasks.

This is a bad situation, and as I've said elsewhere in the thread, the
proper fix is to teach the oom reaper to handle mlocked pages. This is
not a trivial task. We could play some other dirty tricks, but I am not
sure it is worth it considering this is a rather artificial test.

Thanks!
-- 
Michal Hocko
SUSE Labs


* Re: Possible race condition in oom-killer
  2017-07-28 14:07             ` Michal Hocko
@ 2017-07-29  4:31               ` Tetsuo Handa
  2017-08-01 12:14                 ` Michal Hocko
  2017-08-01 10:46               ` Tetsuo Handa
  1 sibling, 1 reply; 16+ messages in thread
From: Tetsuo Handa @ 2017-07-29  4:31 UTC (permalink / raw)
  To: mhocko; +Cc: mjaggi, linux-kernel, linux-mm

Michal Hocko wrote:
> On Fri 28-07-17 22:55:51, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > On Fri 28-07-17 22:15:01, Tetsuo Handa wrote:
> > > > task_will_free_mem(current) in out_of_memory() returning false due to
> > > > MMF_OOM_SKIP already set allowed each thread sharing that mm to select a new
> > > > OOM victim. If task_will_free_mem(current) in out_of_memory() did not return
> > > > false, threads sharing MMF_OOM_SKIP mm would not have selected new victims
> > > > to the level where all OOM killable processes are killed and calls panic().
> > > 
> > > I am not sure I understand. Do you mean this?
> > 
> > Yes.
> > 
> > > ---
> > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > > index 9e8b4f030c1c..671e4a4107d0 100644
> > > --- a/mm/oom_kill.c
> > > +++ b/mm/oom_kill.c
> > > @@ -779,13 +779,6 @@ static bool task_will_free_mem(struct task_struct *task)
> > >  	if (!__task_will_free_mem(task))
> > >  		return false;
> > >  
> > > -	/*
> > > -	 * This task has already been drained by the oom reaper so there are
> > > -	 * only small chances it will free some more
> > > -	 */
> > > -	if (test_bit(MMF_OOM_SKIP, &mm->flags))
> > > -		return false;
> > > -
> > >  	if (atomic_read(&mm->mm_users) <= 1)
> > >  		return true;
> > >  
> > > If yes I would have to think about this some more because that might
> > > have weird side effects (e.g. oom_victims counting after threads passed
> > > exit_oom_victim).
> > 
> > But this check should not be removed unconditionally. We should still return
> > false if returning true was not sufficient to solve the OOM situation, for
> > we need to select next OOM victim in that case.
> > 

I think the patch below can handle this race condition.

---
 include/linux/sched.h |  1 +
 mm/oom_kill.c         | 21 ++++++++++++++-------
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 0db4870..3fccf72 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -652,6 +652,7 @@ struct task_struct {
 	/* disallow userland-initiated cgroup migration */
 	unsigned			no_cgroup_migration:1;
 #endif
+	unsigned			oom_kill_free_check_raced:1;
 
 	unsigned long			atomic_flags; /* Flags requiring atomic access. */
 
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 9e8b4f0..a093193 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -779,13 +779,6 @@ static bool task_will_free_mem(struct task_struct *task)
 	if (!__task_will_free_mem(task))
 		return false;
 
-	/*
-	 * This task has already been drained by the oom reaper so there are
-	 * only small chances it will free some more
-	 */
-	if (test_bit(MMF_OOM_SKIP, &mm->flags))
-		return false;
-
 	if (atomic_read(&mm->mm_users) <= 1)
 		return true;
 
@@ -806,6 +799,20 @@ static bool task_will_free_mem(struct task_struct *task)
 	}
 	rcu_read_unlock();
 
+	/*
+	 * It is possible that the current thread fails to try an allocation
+	 * from memory reserves if the OOM reaper set MMF_OOM_SKIP on this mm
+	 * before the current thread called out_of_memory() in order to get
+	 * TIF_MEMDIE. In that case, allow the current thread to try a
+	 * TIF_MEMDIE allocation before we start selecting the next OOM victims.
+	 */
+	if (ret && test_bit(MMF_OOM_SKIP, &mm->flags)) {
+		if (task == current && !task->oom_kill_free_check_raced)
+			task->oom_kill_free_check_raced = true;
+		else
+			ret = false;
+	}
+
 	return ret;
 }
 
-- 
1.8.3.1

What is "oom_victims counting after threads passed exit_oom_victim"?


* Re: Possible race condition in oom-killer
  2017-07-28 14:07             ` Michal Hocko
  2017-07-29  4:31               ` Tetsuo Handa
@ 2017-08-01 10:46               ` Tetsuo Handa
  2017-08-01 11:30                 ` Michal Hocko
  1 sibling, 1 reply; 16+ messages in thread
From: Tetsuo Handa @ 2017-08-01 10:46 UTC (permalink / raw)
  To: mhocko; +Cc: mjaggi, linux-kernel, linux-mm

Michal Hocko wrote:
> > >                                                                   Is
> > > something other than the LTP test affected to give this more priority?
> > > Do we have other usecases where something mlocks the whole memory?
> > 
> > This panic was caused by 50 threads sharing MMF_OOM_SKIP mm exceeding
> > number of OOM killable processes. Whether memory is locked or not isn't
> > important.
> 
> You are wrong here I believe. The whole problem is that the OOM victim
> is consuming basically all the memory (that is what the test case
> actually does IIRC) and that memory is mlocked. oom_reaper is much
> faster to evaluate the mm of the victim and bail out sooner than the
> exit path actually manages to tear down the address space. And so we
> have to find other oom victims until we simply kill everything and
> panic.

Again, whether the memory is locked or not isn't important. I can easily
reproduce unnecessary OOM victim selection as a local unprivileged user
using the program below.

----------
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sched.h>
#include <sys/mman.h>

#define NUMTHREADS 128
#define MMAPSIZE 128 * 1048576
#define STACKSIZE 4096
static int pipe_fd[2] = { EOF, EOF };
static int memory_eater(void *unused)
{
	int fd = open("/dev/zero", O_RDONLY);
	char *buf = mmap(NULL, MMAPSIZE, PROT_WRITE | PROT_READ,
			 MAP_ANONYMOUS | MAP_SHARED, EOF, 0);
	read(pipe_fd[0], buf, 1);
	read(fd, buf, MMAPSIZE);
	pause();
	return 0;
}
int main(int argc, char *argv[])
{
	int i;
	char *stack;
	if (fork() || fork() || setsid() == EOF || pipe(pipe_fd))
		_exit(0);
	stack = mmap(NULL, STACKSIZE * NUMTHREADS, PROT_WRITE | PROT_READ,
		     MAP_ANONYMOUS | MAP_SHARED, EOF, 0);
	for (i = 0; i < NUMTHREADS; i++)
		if (clone(memory_eater, stack + (i + 1) * STACKSIZE,
			  CLONE_THREAD | CLONE_SIGHAND | CLONE_VM | CLONE_FS |
			  CLONE_FILES, NULL) == -1)
			break;
	sleep(1);
	close(pipe_fd[1]);
	pause();
	return 0;
}
----------
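
(For reference: the reproducer is plain C and needs no special build flags,
since it uses clone(2) directly rather than pthreads; something like
"gcc -o a.out repro.c && ./a.out" run as an unprivileged user should be
enough. The source file name is arbitrary.)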

Complete log is at http://I-love.SAKURA.ne.jp/tmp/serial-20170801-2.txt.xz :
----------
[  237.792768] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[  237.795575] [  451]     0   451     9206      639      21       3        0             0 systemd-journal
[  237.798515] [  478]     0   478    11138      740      25       3        0         -1000 systemd-udevd
[  237.801430] [  488]     0   488    13856      100      26       3        0         -1000 auditd
[  237.804212] [  592]    81   592     6135      119      18       3        0          -900 dbus-daemon
[  237.807166] [  668]     0   668     1094       23       8       3        0             0 rngd
[  237.809927] [  671]    70   671     7029       75      19       4        0             0 avahi-daemon
[  237.812809] [  672]     0   672    53144      402      57       4        0             0 abrtd
[  237.815611] [  675]     0   675    26372      246      54       3        0         -1000 sshd
[  237.818358] [  679]     0   679    52573      341      54       3        0             0 abrt-watch-log
[  237.821274] [  680]     0   680     6050       79      17       3        0             0 systemd-logind
[  237.824279] [  683]     0   683     4831       82      16       3        0             0 irqbalance
[  237.827119] [  698]     0   698    56014      630      40       4        0             0 rsyslogd
[  237.829929] [  715]    70   715     6997       59      18       4        0             0 avahi-daemon
[  237.832799] [  832]     0   832    65453      228      44       3        0             0 vmtoolsd
[  237.835605] [  852]     0   852    57168      353      58       3        0             0 vmtoolsd
[  237.838409] [  909]     0   909    31558      155      20       3        0             0 crond
[  237.841160] [  986]     0   986    84330      393     114       4        0             0 nmbd
[  237.843878] [ 1041]     0  1041    23728      168      51       3        0             0 login
[  237.846623] [ 2019]     0  2019    22261      252      43       3        0             0 master
[  237.849307] [ 2034]     0  2034    27511       33      13       3        0             0 agetty
[  237.851977] [ 2100]    89  2100    22287      250      45       3        0             0 pickup
[  237.854607] [ 2101]    89  2101    22304      251      45       3        0             0 qmgr
[  237.857179] [ 2597]     0  2597   102073      568     150       3        0             0 smbd
[  237.859773] [ 3905]  1000  3905    28885      133      15       4        0             0 bash
[  237.862337] [ 3952]     0  3952    27511       32      10       3        0             0 agetty
[  237.864905] [ 4772]     0  4772   102073      568     140       3        0             0 cleanupd
[  237.867488] [ 4775]  1000  4775  4195473   860912    1814      19        0             0 a.out
[  237.869991] Out of memory: Kill process 4775 (a.out) score 924 or sacrifice child
[  240.940617] Killed process 4775 (a.out) total-vm:16781892kB, anon-rss:88kB, file-rss:0kB, shmem-rss:3443560kB
[  240.962810] oom_reaper: reaped process 4775 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:3445880kB
[  240.965513] oom_reaper: reaped process 4863 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:3445880kB
[  240.968192] oom_reaper: reaped process 4896 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:3445880kB
[  240.970789] oom_reaper: reaped process 4851 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:3445880kB
[  240.973352] oom_reaper: reaped process 4788 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:3445880kB
[  240.978376] oom_reaper: reaped process 4781 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:3445880kB
[  240.981007] oom_reaper: reaped process 4780 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:3445880kB
[  240.983527] oom_reaper: reaped process 4903 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:3445880kB
[  240.986038] oom_reaper: reaped process 4891 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:3445880kB
[  240.988697] oom_reaper: reaped process 4809 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:3445880kB
[  240.999681] oom_reaper: reaped process 4806 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:3445880kB
[  241.002165] oom_reaper: reaped process 4783 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:3445880kB
[  241.004591] oom_reaper: reaped process 4816 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:3445880kB
[  241.007022] oom_reaper: reaped process 4873 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:3445880kB
[  241.009522] a.out invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null),  order=0, oom_score_adj=0
[  241.157201] a.out cpuset=/ mems_allowed=0
[  241.158983] CPU: 1 PID: 4805 Comm: a.out Not tainted 4.13.0-rc2-next-20170728 #649
(...snipped...)
[  357.797379] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[  357.800119] [  478]     0   478    11138      740      25       3        0         -1000 systemd-udevd
[  357.803005] [  488]     0   488    13856      110      26       3        0         -1000 auditd
[  357.805800] [  592]    81   592     6135      119      18       3        0          -900 dbus-daemon
[  357.808710] [  668]     0   668     1094       23       8       3        0             0 rngd
[  357.811437] [  671]    70   671     7029       75      19       4        0             0 avahi-daemon
[  357.814308] [  675]     0   675    26372      246      54       3        0         -1000 sshd
[  357.817048] [  679]     0   679    52573      341      54       3        0             0 abrt-watch-log
[  357.819969] [  680]     0   680     6050       79      17       3        0             0 systemd-logind
[  357.822863] [  683]     0   683     4831       82      16       3        0             0 irqbalance
[  357.825699] [  715]    70   715     6997       59      18       4        0             0 avahi-daemon
[  357.828567] [  832]     0   832    65453      228      44       3        0             0 vmtoolsd
[  357.831373] [  909]     0   909    31558      155      20       3        0             0 crond
[  357.834160] [ 1041]     0  1041    23728      168      51       3        0             0 login
[  357.836882] [ 2019]     0  2019    22261      258      43       3        0             0 master
[  357.839597] [ 2034]     0  2034    27511       33      13       3        0             0 agetty
[  357.842472] [ 2100]    89  2100    22287      250      45       3        0             0 pickup
[  357.845358] [ 2101]    89  2101    22304      250      45       3        0             0 qmgr
[  357.848247] [ 3905]  1000  3905    28885      134      15       4        0             0 bash
[  357.851074] [ 3952]     0  3952    27511       32      10       3        0             0 agetty
[  357.853790] [ 5040]     0  5040     9207      365      20       3        0             0 systemd-journal
[  357.856619] [ 5459]     0  5459    58062      437      42       3        0             0 rsyslogd
[  357.859296] [ 5465]  1000  5465  4195473   862629    1767      19        0             0 a.out
[  357.861907] Out of memory: Kill process 5465 (a.out) score 926 or sacrifice child
[  358.789961] Killed process 5465 (a.out) total-vm:16781892kB, anon-rss:88kB, file-rss:0kB, shmem-rss:3450428kB
[  358.903547] oom_reaper: reaped process 5465 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:3451792kB
[  358.906409] oom_reaper: reaped process 5556 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:3451792kB
[  358.909182] oom_reaper: reaped process 5579 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:3451792kB
[  358.911930] oom_reaper: reaped process 5500 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:3451792kB
[  358.914625] oom_reaper: reaped process 5514 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:3451792kB
[  358.917425] oom_reaper: reaped process 5569 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:3451792kB
[  358.923629] oom_reaper: reaped process 5533 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:3451792kB
[  358.926289] oom_reaper: reaped process 5501 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:3451792kB
[  358.928867] oom_reaper: reaped process 5534 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:3451792kB
[  358.935580] oom_reaper: reaped process 5505 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:3451792kB
(...snipped...)
[  359.863069] Out of memory: Kill process 5459 (rsyslogd) score 0 or sacrifice child
[  359.868030] Killed process 5459 (rsyslogd) total-vm:232248kB, anon-rss:648kB, file-rss:4kB, shmem-rss:1096kB
(...snipped...)
[  364.311536] Out of memory: Kill process 679 (abrt-watch-log) score 0 or sacrifice child
[  364.317978] Killed process 679 (abrt-watch-log) total-vm:210292kB, anon-rss:1360kB, file-rss:4kB, shmem-rss:0kB
(...snipped...)
[  364.490986] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[  364.493818] [  478]     0   478    11138      740      25       3        0         -1000 systemd-udevd
[  364.496727] [  488]     0   488    13856      110      26       3        0         -1000 auditd
[  364.499669] [  592]    81   592     6135      119      18       3        0          -900 dbus-daemon
[  364.502571] [  668]     0   668     1094       23       8       3        0             0 rngd
[  364.505303] [  671]    70   671     7029       75      19       4        0             0 avahi-daemon
[  364.508067] [  675]     0   675    26372      246      54       3        0         -1000 sshd
[  364.510723] [  680]     0   680     6050       79      17       3        0             0 systemd-logind
[  364.513735] [  683]     0   683     4831       81      16       3        0             0 irqbalance
[  364.516457] [  715]    70   715     6997       59      18       4        0             0 avahi-daemon
[  364.519210] [  832]     0   832    65453      228      44       3        0             0 vmtoolsd
[  364.521919] [  909]     0   909    31558      155      20       3        0             0 crond
[  364.524817] [ 1041]     0  1041    23728      168      51       3        0             0 login
[  364.527471] [ 2019]     0  2019    22261      258      43       3        0             0 master
[  364.530118] [ 2034]     0  2034    27511       33      13       3        0             0 agetty
[  364.532933] [ 2100]    89  2100    22287      250      45       3        0             0 pickup
[  364.535522] [ 2101]    89  2101    22304      250      45       3        0             0 qmgr
[  364.538160] [ 3905]  1000  3905    28885      134      15       4        0             0 bash
[  364.540763] [ 3952]     0  3952    27511       32      10       3        0             0 agetty
[  364.543292] [ 5040]     0  5040     9207      364      20       3        0             0 systemd-journal
[  364.546028] [ 5484]  1000  5465  4195473   867043    1767      19        0             0 a.out
[  364.548515] Out of memory: Kill process 5040 (systemd-journal) score 0 or sacrifice child
[  364.551666] Killed process 5040 (systemd-journal) total-vm:36828kB, anon-rss:260kB, file-rss:0kB, shmem-rss:1196kB
----------

Usually, only up to as many OOM-killable tasks as there are available CPUs are
killed by one OOM-killer invocation, but a similar reproducer (with CLONE_THREAD |
CLONE_SIGHAND removed from the one above) killed all OOM-killable tasks and
panic()ed (so far only once).
Complete log is at http://I-love.SAKURA.ne.jp/tmp/serial-20170801.txt.xz :
----------
[ 1301.010587] Mem-Info:
[ 1301.021217] active_anon:3744 inactive_anon:872936 isolated_anon:0
[ 1301.021217]  active_file:59 inactive_file:22 isolated_file:37
[ 1301.021217]  unevictable:0 dirty:16 writeback:1 unstable:0
[ 1301.021217]  slab_reclaimable:0 slab_unreclaimable:17
[ 1301.021217]  mapped:853542 shmem:874817 pagetables:2039 bounce:0
[ 1301.021217]  free:21371 free_pcp:93 free_cma:0
[ 1301.030807] Node 0 active_anon:14976kB inactive_anon:3491956kB active_file:104kB inactive_file:100kB unevictable:0kB isolated(anon):0kB isolated(file):92kB mapped:3414188kB dirty:60kB writeback:4kB shmem:3499480kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 1301.038255] Node 0 DMA free:14756kB min:288kB low:360kB high:432kB active_anon:0kB inactive_anon:1108kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15904kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 1301.044649] lowmem_reserve[]: 0 2688 3624 3624
[ 1301.046306] Node 0 DMA32 free:53376kB min:49908kB low:62384kB high:74860kB active_anon:16kB inactive_anon:2682284kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129216kB managed:2752884kB mlocked:0kB kernel_stack:0kB pagetables:5328kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 1301.053203] lowmem_reserve[]: 0 0 936 936
[ 1301.054748] Node 0 Normal free:16756kB min:17384kB low:21728kB high:26072kB active_anon:14960kB inactive_anon:808292kB active_file:20kB inactive_file:16kB unevictable:0kB writepending:64kB present:1048576kB managed:958868kB mlocked:0kB kernel_stack:4800kB pagetables:2836kB bounce:0kB free_pcp:392kB local_pcp:288kB free_cma:0kB
[ 1301.062749] lowmem_reserve[]: 0 0 0 0
[ 1301.065274] Node 0 DMA: 2*4kB (UM) 1*8kB (U) 1*16kB (U) 2*32kB (UM) 1*64kB (U) 2*128kB (UM) 2*256kB (UM) 1*512kB (M) 1*1024kB (U) 0*2048kB 3*4096kB (ME) = 14752kB
[ 1301.070034] Node 0 DMA32: 9*4kB (UM) 7*8kB (UM) 7*16kB (UME) 6*32kB (UME) 4*64kB (UME) 1*128kB (M) 2*256kB (ME) 0*512kB 1*1024kB (U) 1*2048kB (E) 12*4096kB (UM) = 53516kB
[ 1301.074902] Node 0 Normal: 85*4kB (UMEH) 56*8kB (UMEH) 38*16kB (UMEH) 14*32kB (UMEH) 9*64kB (UEH) 12*128kB (UMEH) 26*256kB (UMEH) 10*512kB (UME) 1*1024kB (H) 0*2048kB 0*4096kB = 16756kB
[ 1301.080143] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[ 1301.082796] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 1301.085387] 874919 total pagecache pages
[ 1301.086962] 0 pages in swap cache
[ 1301.088505] Swap cache stats: add 0, delete 0, find 0/0
[ 1301.090370] Free swap  = 0kB
[ 1301.091743] Total swap = 0kB
[ 1301.093140] 1048445 pages RAM
[ 1301.094677] 0 pages HighMem/MovableOnly
[ 1301.096600] 116531 pages reserved
[ 1301.098143] 0 pages hwpoisoned
[ 1301.099690] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[ 1301.103027] [  476]     0   476    10952      553      24       3        0         -1000 systemd-udevd
[ 1301.106142] [  486]     0   486    13856      110      26       3        0         -1000 auditd
[ 1301.108737] [  660]     0   660    26372      247      55       3        0         -1000 sshd
[ 1301.111466] [ 6535]     0  6535     9207      423      21       3        0             0 systemd-journal
[ 1301.114370] [ 6536]     0  6536     6050       78      16       3        0             0 systemd-logind
[ 1301.117136] [ 6537]     0  6537    27511       32      13       3        0             0 agetty
[ 1301.119727] [ 6538]    81  6538     6103       78      17       3        0          -900 dbus-daemon
[ 1301.122372] [ 6699]     0  6699    76537     1249      50       3        0             0 rsyslogd
[ 1301.125020] [ 6731]  1000  6731  4195473   848108    1790      19        0             0 a.out
[ 1301.127633] [ 6732]  1000  6732  4195473   848108    1790      19        0             0 a.out
[ 1301.130415] [ 6733]  1000  6733  4195473   848108    1790      19        0             0 a.out
(...snipped...)
[ 1301.386215] [ 6859]  1000  6859  4195473   848108    1790      19        0             0 a.out
[ 1301.388102] [ 6862]     0  6862    27511       32      10       3        0             0 agetty
[ 1301.390015] Out of memory: Kill process 6731 (a.out) score 910 or sacrifice child
[ 1301.392740] Killed process 6731 (a.out) total-vm:16781892kB, anon-rss:84kB, file-rss:0kB, shmem-rss:3392348kB
[ 1301.395605] oom_reaper: reaped process 6731 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:3392400kB
[ 1301.397946] a.out invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null),  order=0, oom_score_adj=0
[ 1301.400456] a.out cpuset=/ mems_allowed=0
[ 1301.401574] CPU: 2 PID: 6855 Comm: a.out Not tainted 4.13.0-rc2-next-20170728 #649
[ 1301.403392] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[ 1301.407439] Call Trace:
[ 1301.408321]  dump_stack+0x67/0x9e
[ 1301.409352]  dump_header+0x9d/0x3fa
[ 1301.410411]  ? trace_hardirqs_on+0xd/0x10
[ 1301.411589]  oom_kill_process+0x226/0x650
[ 1301.412776]  out_of_memory+0x136/0x560
[ 1301.413898]  ? out_of_memory+0x206/0x560
[ 1301.415062]  __alloc_pages_nodemask+0xdce/0xeb0
[ 1301.416373]  alloc_pages_vma+0x76/0x1a0
[ 1301.417531]  shmem_alloc_page+0x6e/0xa0
[ 1301.418700]  ? native_sched_clock+0x36/0xa0
[ 1301.419944]  shmem_alloc_and_acct_page+0x6d/0x1f0
[ 1301.421299]  shmem_getpage_gfp+0x1b6/0xde0
[ 1301.422536]  ? current_kernel_time64+0x80/0xa0
[ 1301.423844]  shmem_fault+0x91/0x1f0
[ 1301.424969]  ? __lock_acquire+0x506/0x1a90
[ 1301.426227]  __do_fault+0x19/0x120
[ 1301.427341]  __handle_mm_fault+0x873/0x1160
[ 1301.428639]  ? native_sched_clock+0x36/0xa0
[ 1301.429964]  handle_mm_fault+0x186/0x360
[ 1301.431223]  ? handle_mm_fault+0x44/0x360
[ 1301.432487]  __do_page_fault+0x1da/0x510
[ 1301.433730]  ? __lock_acquire+0x506/0x1a90
[ 1301.435032]  do_page_fault+0x21/0x70
[ 1301.436223]  page_fault+0x22/0x30
[ 1301.437366] RIP: 0010:__clear_user+0x3d/0x70
[ 1301.438696] RSP: 0000:ffff8801395ffd70 EFLAGS: 00010206
[ 1301.441909] RAX: 0000000000000000 RBX: 0000000000000200 RCX: 0000000000000200
[ 1301.443823] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 00007f9bda247000
[ 1301.445863] RBP: ffff8801395ffd80 R08: 0000000000000001 R09: 0000000000000000
[ 1301.447812] R10: 0000000000000001 R11: 0000000000000001 R12: 00007f9bda247000
[ 1301.449757] R13: ffff8801395ffe30 R14: 0000000001992000 R15: 0000000000001000
[ 1301.451723]  clear_user+0x34/0x50
[ 1301.452935]  iov_iter_zero+0x88/0x380
[ 1301.454219]  read_iter_zero+0x38/0xb0
[ 1301.455511]  __vfs_read+0xe3/0x140
[ 1301.456752]  vfs_read+0x9c/0x150
[ 1301.457958]  SyS_read+0x53/0xc0
[ 1301.459151]  do_syscall_64+0x61/0x1d0
[ 1301.460451]  entry_SYSCALL64_slow_path+0x25/0x25
[ 1301.461940] RIP: 0033:0x7f9ec099dc30
[ 1301.463225] RSP: 002b:00007f9ec0e81fd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 1301.465286] RAX: ffffffffffffffda RBX: 00007f9bd88b5000 RCX: 00007f9ec099dc30
[ 1301.467242] RDX: 0000000008000000 RSI: 00007f9bd88b5000 RDI: 0000000000000061
[ 1301.469187] RBP: 0000000000000061 R08: ffffffffffffffff R09: 0000000000000000
[ 1301.471111] R10: 0000000000000021 R11: 0000000000000246 R12: 00000000004007d7
[ 1301.473044] R13: 00007ffeb6830c90 R14: 0000000000000000 R15: 0000000000000000
[ 1301.475012] Mem-Info:
[ 1302.770130] active_anon:3723 inactive_anon:873018 isolated_anon:0
[ 1302.770130]  active_file:3 inactive_file:0 isolated_file:0
[ 1302.770130]  unevictable:0 dirty:1 writeback:1 unstable:0
[ 1302.770130]  slab_reclaimable:0 slab_unreclaimable:2
[ 1302.770130]  mapped:853537 shmem:874899 pagetables:2050 bounce:0
[ 1302.770130]  free:21373 free_pcp:35 free_cma:0
[ 1302.787545] Node 0 active_anon:14892kB inactive_anon:3492072kB active_file:8kB inactive_file:4kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:3414148kB dirty:4kB writeback:4kB shmem:3499596kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
[ 1302.794036] Node 0 DMA free:14752kB min:288kB low:360kB high:432kB active_anon:0kB inactive_anon:1116kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15904kB mlocked:0kB kernel_stack:0kB pagetables:4kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 1302.803259] lowmem_reserve[]: 0 2688 3624 3624
[ 1302.804839] Node 0 DMA32 free:53516kB min:49908kB low:62384kB high:74860kB active_anon:16kB inactive_anon:2682496kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129216kB managed:2752884kB mlocked:0kB kernel_stack:0kB pagetables:5352kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 1302.817760] lowmem_reserve[]: 0 0 936 936
[ 1302.819371] Node 0 Normal free:17224kB min:17384kB low:21728kB high:26072kB active_anon:14876kB inactive_anon:808460kB active_file:108kB inactive_file:0kB unevictable:0kB writepending:8kB present:1048576kB managed:958868kB mlocked:0kB kernel_stack:4784kB pagetables:2844kB bounce:0kB free_pcp:140kB local_pcp:4kB free_cma:0kB
[ 1302.829597] lowmem_reserve[]: 0 0 0 0
[ 1302.831194] Node 0 DMA: 2*4kB (UM) 1*8kB (U) 1*16kB (U) 2*32kB (UM) 1*64kB (U) 2*128kB (UM) 2*256kB (UM) 1*512kB (M) 1*1024kB (U) 0*2048kB 3*4096kB (ME) = 14752kB
[ 1302.835771] Node 0 DMA32: 9*4kB (UM) 7*8kB (UM) 7*16kB (UME) 6*32kB (UME) 4*64kB (UME) 1*128kB (M) 2*256kB (ME) 0*512kB 1*1024kB (U) 1*2048kB (E) 12*4096kB (UM) = 53516kB
[ 1302.842805] Node 0 Normal: 138*4kB (UMEH) 62*8kB (UMEH) 49*16kB (UMEH) 15*32kB (UMEH) 9*64kB (UEH) 12*128kB (UMEH) 26*256kB (UMEH) 10*512kB (UME) 1*1024kB (H) 0*2048kB 0*4096kB = 17224kB
[ 1302.850292] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[ 1302.853053] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 1302.860191] 874902 total pagecache pages
[ 1302.862020] 0 pages in swap cache
[ 1302.863688] Swap cache stats: add 0, delete 0, find 0/0
[ 1302.865828] Free swap  = 0kB
[ 1302.867400] Total swap = 0kB
[ 1302.868968] 1048445 pages RAM
[ 1303.699719] 0 pages HighMem/MovableOnly
[ 1303.701531] 116531 pages reserved
[ 1303.703194] 0 pages hwpoisoned
[ 1303.704793] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[ 1303.707509] [  476]     0   476    10952      553      24       3        0         -1000 systemd-udevd
[ 1303.710392] [  486]     0   486    13856      110      26       3        0         -1000 auditd
[ 1303.713153] [  660]     0   660    26372      247      55       3        0         -1000 sshd
[ 1303.715875] [ 6535]     0  6535     9207      423      21       3        0             0 systemd-journal
[ 1303.719739] [ 6536]     0  6536     6050       78      16       3        0             0 systemd-logind
[ 1303.722604] [ 6537]     0  6537    27511       32      13       3        0             0 agetty
[ 1303.725318] [ 6538]    81  6538     6103       78      17       3        0          -900 dbus-daemon
[ 1303.728122] [ 6699]     0  6699    76537     1231      50       3        0             0 rsyslogd
[ 1303.730883] [ 6732]  1000  6732  4195473   848100    1790      19        0             0 a.out
[ 1303.733591] [ 6733]  1000  6733  4195473   848100    1790      19        0             0 a.out
[ 1303.736272] [ 6734]  1000  6734  4195473   848100    1790      19        0             0 a.out
(...snipped...)
[ 1303.996897] [ 6859]  1000  6859  4195473   848100    1790      19        0             0 a.out
[ 1303.998810] [ 6862]     0  6862    27511       32      10       3        0             0 agetty
[ 1304.000763] Out of memory: Kill process 6699 (rsyslogd) score 1 or sacrifice child
...(noisy "page allocation stalls" lines snipped)...
[ 1378.224508] kworker/u16:2 invoked oom-killer: gfp_mask=0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=(null),  order=2, oom_score_adj=0
[ 1378.224513] kworker/u16:2 cpuset=/ mems_allowed=0
[ 1378.224520] CPU: 3 PID: 6523 Comm: kworker/u16:2 Not tainted 4.13.0-rc2-next-20170728 #649
[ 1378.224521] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[ 1378.224527] Workqueue: events_unbound call_usermodehelper_exec_work
[ 1378.224528] Call Trace:
[ 1378.224535]  dump_stack+0x67/0x9e
[ 1378.224539]  dump_header+0x9d/0x3fa
[ 1378.224545]  ? trace_hardirqs_on+0xd/0x10
[ 1378.224552]  oom_kill_process+0x226/0x650
[ 1378.224558]  out_of_memory+0x136/0x560
[ 1378.224560]  ? out_of_memory+0x206/0x560
[ 1378.224565]  __alloc_pages_nodemask+0xdce/0xeb0
[ 1378.224568]  ? copy_process.part.39+0x7eb/0x1e30
[ 1378.224585]  copy_process.part.39+0x13f/0x1e30
[ 1378.224587]  ? __lock_acquire+0x506/0x1a90
[ 1378.224590]  ? load_balance+0x1b0/0xaf0
[ 1378.224598]  ? umh_complete+0x30/0x30
[ 1378.224604]  _do_fork+0xea/0x5f0
[ 1378.224610]  ? native_sched_clock+0x36/0xa0
[ 1378.224619]  kernel_thread+0x24/0x30
[ 1378.224622]  call_usermodehelper_exec_work+0x35/0xc0
[ 1378.224625]  process_one_work+0x1d0/0x3e0
[ 1378.224626]  ? process_one_work+0x16a/0x3e0
[ 1378.224634]  worker_thread+0x48/0x3c0
[ 1378.224642]  kthread+0x10d/0x140
[ 1378.224643]  ? process_one_work+0x3e0/0x3e0
[ 1378.224646]  ? kthread_create_on_node+0x60/0x60
[ 1378.224651]  ret_from_fork+0x27/0x40
[ 1378.224662] Mem-Info:
[ 1378.224666] active_anon:3649 inactive_anon:873071 isolated_anon:0
[ 1378.224666]  active_file:34 inactive_file:373 isolated_file:0
[ 1378.224666]  unevictable:0 dirty:0 writeback:0 unstable:0
[ 1378.224666]  slab_reclaimable:0 slab_unreclaimable:0
[ 1378.224666]  mapped:852813 shmem:875026 pagetables:2050 bounce:0
[ 1378.224666]  free:21442 free_pcp:86 free_cma:0
[ 1378.224668] Node 0 active_anon:14596kB inactive_anon:3492284kB active_file:136kB inactive_file:1492kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:3411252kB dirty:0kB writeback:0kB shmem:3500104kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 1378.224669] Node 0 DMA free:14752kB min:288kB low:360kB high:432kB active_anon:0kB inactive_anon:1116kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15904kB mlocked:0kB kernel_stack:0kB pagetables:4kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 1378.224672] lowmem_reserve[]: 0 2688 3624 3624
[ 1378.224677] Node 0 DMA32 free:53516kB min:49908kB low:62384kB high:74860kB active_anon:16kB inactive_anon:2682496kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129216kB managed:2752884kB mlocked:0kB kernel_stack:0kB pagetables:5352kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 1378.224680] lowmem_reserve[]: 0 0 936 936
[ 1378.224684] Node 0 Normal free:17500kB min:17384kB low:21728kB high:26072kB active_anon:14776kB inactive_anon:808668kB active_file:0kB inactive_file:1940kB unevictable:0kB writepending:0kB present:1048576kB managed:958868kB mlocked:0kB kernel_stack:2736kB pagetables:2844kB bounce:0kB free_pcp:344kB local_pcp:224kB free_cma:0kB
[ 1378.224687] lowmem_reserve[]: 0 0 0 0
[ 1378.224691] Node 0 DMA: 2*4kB (UM) 1*8kB (U) 1*16kB (U) 2*32kB (UM) 1*64kB (U) 2*128kB (UM) 2*256kB (UM) 1*512kB (M) 1*1024kB (U) 0*2048kB 3*4096kB (ME) = 14752kB
[ 1378.224756] Node 0 DMA32: 9*4kB (UM) 7*8kB (UM) 7*16kB (UME) 6*32kB (UME) 4*64kB (UME) 1*128kB (M) 2*256kB (ME) 0*512kB 1*1024kB (U) 1*2048kB (E) 12*4096kB (UM) = 53516kB
[ 1378.224783] Node 0 Normal: 107*4kB (UEH) 68*8kB (UEH) 50*16kB (UMEH) 61*32kB (UEH) 15*64kB (UEH) 12*128kB (UMEH) 19*256kB (UMEH) 10*512kB (UME) 1*1024kB (H) 0*2048kB 0*4096kB = 17228kB
[ 1378.224802] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[ 1378.224803] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 1378.224804] 875607 total pagecache pages
[ 1378.224807] 0 pages in swap cache
[ 1378.224808] Swap cache stats: add 0, delete 0, find 0/0
[ 1378.224809] Free swap  = 0kB
[ 1378.224809] Total swap = 0kB
[ 1378.224811] 1048445 pages RAM
[ 1378.224812] 0 pages HighMem/MovableOnly
[ 1378.224813] 116531 pages reserved
[ 1378.224814] 0 pages hwpoisoned
[ 1378.224814] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[ 1378.224824] [  476]     0   476    10952      553      24       3        0         -1000 systemd-udevd
[ 1378.224827] [  486]     0   486    13856      110      26       3        0         -1000 auditd
[ 1378.224830] [  660]     0   660    26372      247      55       3        0         -1000 sshd
[ 1378.224833] [ 6535]     0  6535     9207      422      21       3        0             0 systemd-journal
[ 1378.224836] [ 6536]     0  6536     6050       78      16       3        0             0 systemd-logind
[ 1378.224839] [ 6537]     0  6537    27511       32      13       3        0             0 agetty
[ 1378.224841] [ 6538]    81  6538     6103       78      17       3        0          -900 dbus-daemon
[ 1378.224930] [ 6855]  1000  6855  4195473   852376    1790      19        0             0 a.out
[ 1378.224935] [ 6862]     0  6862    27511       32      10       3        0             0 agetty
[ 1378.224937] Out of memory: Kill process 6535 (systemd-journal) score 0 or sacrifice child
[ 1378.225006] Killed process 6535 (systemd-journal) total-vm:36828kB, anon-rss:260kB, file-rss:0kB, shmem-rss:1428kB
[ 1378.232809] kworker/u16:2 invoked oom-killer: gfp_mask=0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=(null),  order=2, oom_score_adj=0
[ 1378.232814] kworker/u16:2 cpuset=/ mems_allowed=0
[ 1378.232821] CPU: 3 PID: 6523 Comm: kworker/u16:2 Not tainted 4.13.0-rc2-next-20170728 #649
[ 1378.232822] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[ 1378.232827] Workqueue: events_unbound call_usermodehelper_exec_work
[ 1378.232829] Call Trace:
[ 1378.232835]  dump_stack+0x67/0x9e
[ 1378.232839]  dump_header+0x9d/0x3fa
[ 1378.232845]  ? trace_hardirqs_on+0xd/0x10
[ 1378.232852]  oom_kill_process+0x226/0x650
[ 1378.232859]  out_of_memory+0x136/0x560
[ 1378.232860]  ? out_of_memory+0x206/0x560
[ 1378.232866]  __alloc_pages_nodemask+0xdce/0xeb0
[ 1378.232869]  ? copy_process.part.39+0x7eb/0x1e30
[ 1378.232885]  copy_process.part.39+0x13f/0x1e30
[ 1378.232888]  ? __lock_acquire+0x506/0x1a90
[ 1378.232890]  ? load_balance+0x1b0/0xaf0
[ 1378.232898]  ? umh_complete+0x30/0x30
[ 1378.232904]  _do_fork+0xea/0x5f0
[ 1378.232910]  ? native_sched_clock+0x36/0xa0
[ 1378.232919]  kernel_thread+0x24/0x30
[ 1378.232922]  call_usermodehelper_exec_work+0x35/0xc0
[ 1378.232925]  process_one_work+0x1d0/0x3e0
[ 1378.232926]  ? process_one_work+0x16a/0x3e0
[ 1378.232934]  worker_thread+0x48/0x3c0
[ 1378.232943]  kthread+0x10d/0x140
[ 1378.232944]  ? process_one_work+0x3e0/0x3e0
[ 1378.232946]  ? kthread_create_on_node+0x60/0x60
[ 1378.232951]  ret_from_fork+0x27/0x40
[ 1378.232962] Mem-Info:
[ 1378.232965] active_anon:3723 inactive_anon:872923 isolated_anon:0
[ 1378.232965]  active_file:34 inactive_file:262 isolated_file:0
[ 1378.232965]  unevictable:0 dirty:0 writeback:0 unstable:0
[ 1378.232965]  slab_reclaimable:0 slab_unreclaimable:0
[ 1378.232965]  mapped:852480 shmem:875026 pagetables:2050 bounce:0
[ 1378.232965]  free:21442 free_pcp:442 free_cma:0
[ 1378.232968] Node 0 active_anon:14892kB inactive_anon:3491692kB active_file:136kB inactive_file:1048kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:3409920kB dirty:0kB writeback:0kB shmem:3500104kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 1378.232969] Node 0 DMA free:14752kB min:288kB low:360kB high:432kB active_anon:0kB inactive_anon:1116kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15904kB mlocked:0kB kernel_stack:0kB pagetables:4kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 1378.232972] lowmem_reserve[]: 0 2688 3624 3624
[ 1378.232977] Node 0 DMA32 free:53516kB min:49908kB low:62384kB high:74860kB active_anon:16kB inactive_anon:2682496kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129216kB managed:2752884kB mlocked:0kB kernel_stack:0kB pagetables:5352kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 1378.232980] lowmem_reserve[]: 0 0 936 936
[ 1378.232984] Node 0 Normal free:17500kB min:17384kB low:21728kB high:26072kB active_anon:14776kB inactive_anon:808164kB active_file:0kB inactive_file:1436kB unevictable:0kB writepending:0kB present:1048576kB managed:958868kB mlocked:0kB kernel_stack:2736kB pagetables:2844kB bounce:0kB free_pcp:1768kB local_pcp:676kB free_cma:0kB
[ 1378.232987] lowmem_reserve[]: 0 0 0 0
[ 1378.232991] Node 0 DMA: 2*4kB (UM) 1*8kB (U) 1*16kB (U) 2*32kB (UM) 1*64kB (U) 2*128kB (UM) 2*256kB (UM) 1*512kB (M) 1*1024kB (U) 0*2048kB 3*4096kB (ME) = 14752kB
[ 1378.233010] Node 0 DMA32: 9*4kB (UM) 7*8kB (UM) 7*16kB (UME) 6*32kB (UME) 4*64kB (UME) 1*128kB (M) 2*256kB (ME) 0*512kB 1*1024kB (U) 1*2048kB (E) 12*4096kB (UM) = 53516kB
[ 1378.233028] Node 0 Normal: 76*4kB (UEH) 68*8kB (UEH) 53*16kB (UMEH) 61*32kB (UEH) 15*64kB (UEH) 12*128kB (UMEH) 19*256kB (UMEH) 10*512kB (UME) 1*1024kB (H) 0*2048kB 0*4096kB = 17152kB
[ 1378.233047] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[ 1378.233048] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 1378.233049] 875385 total pagecache pages
[ 1378.233052] 0 pages in swap cache
[ 1378.233053] Swap cache stats: add 0, delete 0, find 0/0
[ 1378.233054] Free swap  = 0kB
[ 1378.233055] Total swap = 0kB
[ 1378.233056] 1048445 pages RAM
[ 1378.233057] 0 pages HighMem/MovableOnly
[ 1378.233058] 116531 pages reserved
[ 1378.233058] 0 pages hwpoisoned
[ 1378.233059] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[ 1378.233068] [  476]     0   476    10952      553      24       3        0         -1000 systemd-udevd
[ 1378.233072] [  486]     0   486    13856      110      26       3        0         -1000 auditd
[ 1378.233075] [  660]     0   660    26372      247      55       3        0         -1000 sshd
[ 1378.233079] [ 6536]     0  6536     6050       78      16       3        0             0 systemd-logind
[ 1378.233082] [ 6537]     0  6537    27511       32      13       3        0             0 agetty
[ 1378.233085] [ 6538]    81  6538     6103       78      17       3        0          -900 dbus-daemon
[ 1378.233172] [ 6855]  1000  6855  4195473   852376    1790      19        0             0 a.out
[ 1378.233177] [ 6862]     0  6862    27511       32      10       3        0             0 agetty
[ 1378.233179] Out of memory: Kill process 6536 (systemd-logind) score 0 or sacrifice child
[ 1378.233276] Killed process 6536 (systemd-logind) total-vm:24200kB, anon-rss:308kB, file-rss:4kB, shmem-rss:0kB
[ 1378.236782] kworker/u16:1 invoked oom-killer: gfp_mask=0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=(null),  order=2, oom_score_adj=0
[ 1378.236786] kworker/u16:1 cpuset=/ mems_allowed=0
[ 1378.236792] CPU: 0 PID: 6520 Comm: kworker/u16:1 Not tainted 4.13.0-rc2-next-20170728 #649
[ 1378.236793] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[ 1378.236797] Workqueue: events_unbound call_usermodehelper_exec_work
[ 1378.236798] Call Trace:
[ 1378.236803]  dump_stack+0x67/0x9e
[ 1378.236807]  dump_header+0x9d/0x3fa
[ 1378.236812]  ? trace_hardirqs_on+0xd/0x10
[ 1378.236819]  oom_kill_process+0x226/0x650
[ 1378.236825]  out_of_memory+0x136/0x560
[ 1378.236827]  ? out_of_memory+0x206/0x560
[ 1378.236832]  __alloc_pages_nodemask+0xdce/0xeb0
[ 1378.236834]  ? copy_process.part.39+0x7eb/0x1e30
[ 1378.236851]  copy_process.part.39+0x13f/0x1e30
[ 1378.236853]  ? __lock_acquire+0x506/0x1a90
[ 1378.236861]  ? umh_complete+0x30/0x30
[ 1378.236868]  _do_fork+0xea/0x5f0
[ 1378.236873]  ? native_sched_clock+0x36/0xa0
[ 1378.236882]  kernel_thread+0x24/0x30
[ 1378.236885]  call_usermodehelper_exec_work+0x35/0xc0
[ 1378.236888]  process_one_work+0x1d0/0x3e0
[ 1378.236889]  ? process_one_work+0x16a/0x3e0
[ 1378.236897]  worker_thread+0x48/0x3c0
[ 1378.236905]  kthread+0x10d/0x140
[ 1378.236906]  ? process_one_work+0x3e0/0x3e0
[ 1378.236909]  ? kthread_create_on_node+0x60/0x60
[ 1378.236913]  ret_from_fork+0x27/0x40
[ 1378.236924] Mem-Info:
[ 1378.236927] active_anon:3649 inactive_anon:872923 isolated_anon:0
[ 1378.236927]  active_file:34 inactive_file:55 isolated_file:0
[ 1378.236927]  unevictable:0 dirty:0 writeback:0 unstable:0
[ 1378.236927]  slab_reclaimable:0 slab_unreclaimable:0
[ 1378.236927]  mapped:852480 shmem:875026 pagetables:2050 bounce:0
[ 1378.236927]  free:21694 free_pcp:471 free_cma:0
[ 1378.236929] Node 0 active_anon:14596kB inactive_anon:3491692kB active_file:136kB inactive_file:220kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:3409920kB dirty:0kB writeback:0kB shmem:3500104kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 1378.236930] Node 0 DMA free:14752kB min:288kB low:360kB high:432kB active_anon:0kB inactive_anon:1116kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15904kB mlocked:0kB kernel_stack:0kB pagetables:4kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 1378.236933] lowmem_reserve[]: 0 2688 3624 3624
[ 1378.236938] Node 0 DMA32 free:53516kB min:49908kB low:62384kB high:74860kB active_anon:16kB inactive_anon:2682496kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129216kB managed:2752884kB mlocked:0kB kernel_stack:0kB pagetables:5352kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 1378.236941] lowmem_reserve[]: 0 0 936 936
[ 1378.236945] Node 0 Normal free:18508kB min:17384kB low:21728kB high:26072kB active_anon:14776kB inactive_anon:808164kB active_file:0kB inactive_file:384kB unevictable:0kB writepending:0kB present:1048576kB managed:958868kB mlocked:0kB kernel_stack:2736kB pagetables:2844kB bounce:0kB free_pcp:1884kB local_pcp:716kB free_cma:0kB
[ 1378.236949] lowmem_reserve[]: 0 0 0 0
[ 1378.236953] Node 0 DMA: 2*4kB (UM) 1*8kB (U) 1*16kB (U) 2*32kB (UM) 1*64kB (U) 2*128kB (UM) 2*256kB (UM) 1*512kB (M) 1*1024kB (U) 0*2048kB 3*4096kB (ME) = 14752kB
[ 1378.236972] Node 0 DMA32: 9*4kB (UM) 7*8kB (UM) 7*16kB (UME) 6*32kB (UME) 4*64kB (UME) 1*128kB (M) 2*256kB (ME) 0*512kB 1*1024kB (U) 1*2048kB (E) 12*4096kB (UM) = 53516kB
[ 1378.236990] Node 0 Normal: 216*4kB (UMEH) 89*8kB (UMEH) 71*16kB (UMEH) 66*32kB (UMEH) 16*64kB (UMEH) 12*128kB (UMEH) 19*256kB (UMEH) 10*512kB (UME) 1*1024kB (H) 0*2048kB 0*4096kB = 18392kB
[ 1378.237008] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[ 1378.237009] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 1378.237010] 875163 total pagecache pages
[ 1378.237013] 0 pages in swap cache
[ 1378.237014] Swap cache stats: add 0, delete 0, find 0/0
[ 1378.237015] Free swap  = 0kB
[ 1378.237016] Total swap = 0kB
[ 1378.237017] 1048445 pages RAM
[ 1378.237018] 0 pages HighMem/MovableOnly
[ 1378.237018] 116531 pages reserved
[ 1378.237019] 0 pages hwpoisoned
[ 1378.237020] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[ 1378.237029] [  476]     0   476    10952      553      24       3        0         -1000 systemd-udevd
[ 1378.237031] [  486]     0   486    13856      110      26       3        0         -1000 auditd
[ 1378.237034] [  660]     0   660    26372      247      55       3        0         -1000 sshd
[ 1378.237039] [ 6537]     0  6537    27511       32      13       3        0             0 agetty
[ 1378.237041] [ 6538]    81  6538     6103       78      17       3        0          -900 dbus-daemon
[ 1378.237127] [ 6855]  1000  6855  4195473   852376    1790      19        0             0 a.out
[ 1378.237132] [ 6862]     0  6862    27511       32      10       3        0             0 agetty
[ 1378.237134] Out of memory: Kill process 6537 (agetty) score 0 or sacrifice child
[ 1378.237146] Killed process 6537 (agetty) total-vm:110044kB, anon-rss:124kB, file-rss:4kB, shmem-rss:0kB
[ 1378.247858] kworker/u16:2 invoked oom-killer: gfp_mask=0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=(null),  order=2, oom_score_adj=0
[ 1378.247864] kworker/u16:2 cpuset=/ mems_allowed=0
[ 1378.247871] CPU: 3 PID: 6523 Comm: kworker/u16:2 Not tainted 4.13.0-rc2-next-20170728 #649
[ 1378.247872] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[ 1378.247879] Workqueue: events_unbound call_usermodehelper_exec_work
[ 1378.247881] Call Trace:
[ 1378.247888]  dump_stack+0x67/0x9e
[ 1378.247913]  dump_header+0x9d/0x3fa
[ 1378.247924]  ? trace_hardirqs_on+0xd/0x10
[ 1378.247931]  oom_kill_process+0x226/0x650
[ 1378.247938]  out_of_memory+0x136/0x560
[ 1378.247940]  ? out_of_memory+0x206/0x560
[ 1378.247945]  __alloc_pages_nodemask+0xdce/0xeb0
[ 1378.247948]  ? copy_process.part.39+0x7eb/0x1e30
[ 1378.247966]  copy_process.part.39+0x13f/0x1e30
[ 1378.247968]  ? __lock_acquire+0x506/0x1a90
[ 1378.247971]  ? load_balance+0x1b0/0xaf0
[ 1378.248035]  ? umh_complete+0x30/0x30
[ 1378.248054]  _do_fork+0xea/0x5f0
[ 1378.248060]  ? native_sched_clock+0x36/0xa0
[ 1378.248187]  kernel_thread+0x24/0x30
[ 1378.248191]  call_usermodehelper_exec_work+0x35/0xc0
[ 1378.248194]  process_one_work+0x1d0/0x3e0
[ 1378.248225]  ? process_one_work+0x16a/0x3e0
[ 1378.248234]  worker_thread+0x48/0x3c0
[ 1378.248265]  kthread+0x10d/0x140
[ 1378.248267]  ? process_one_work+0x3e0/0x3e0
[ 1378.248269]  ? kthread_create_on_node+0x60/0x60
[ 1378.248274]  ret_from_fork+0x27/0x40
[ 1378.248301] Mem-Info:
[ 1378.248304] active_anon:3649 inactive_anon:872923 isolated_anon:0
[ 1378.248304]  active_file:34 inactive_file:18 isolated_file:0
[ 1378.248304]  unevictable:0 dirty:0 writeback:0 unstable:0
[ 1378.248304]  slab_reclaimable:0 slab_unreclaimable:0
[ 1378.248304]  mapped:852443 shmem:875026 pagetables:2050 bounce:0
[ 1378.248304]  free:21820 free_pcp:444 free_cma:0
[ 1378.248307] Node 0 active_anon:14596kB inactive_anon:3491692kB active_file:136kB inactive_file:72kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:3409772kB dirty:0kB writeback:0kB shmem:3500104kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 1378.248308] Node 0 DMA free:14752kB min:288kB low:360kB high:432kB active_anon:0kB inactive_anon:1116kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15904kB mlocked:0kB kernel_stack:0kB pagetables:4kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 1378.248312] lowmem_reserve[]: 0 2688 3624 3624
[ 1378.248316] Node 0 DMA32 free:53516kB min:49908kB low:62384kB high:74860kB active_anon:16kB inactive_anon:2682496kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129216kB managed:2752884kB mlocked:0kB kernel_stack:0kB pagetables:5352kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 1378.248319] lowmem_reserve[]: 0 0 936 936
[ 1378.248324] Node 0 Normal free:19012kB min:17384kB low:21728kB high:26072kB active_anon:14776kB inactive_anon:808164kB active_file:0kB inactive_file:384kB unevictable:0kB writepending:0kB present:1048576kB managed:958868kB mlocked:0kB kernel_stack:2736kB pagetables:2844kB bounce:0kB free_pcp:1776kB local_pcp:692kB free_cma:0kB
[ 1378.248326] lowmem_reserve[]: 0 0 0 0
[ 1378.248331] Node 0 DMA: 2*4kB (UM) 1*8kB (U) 1*16kB (U) 2*32kB (UM) 1*64kB (U) 2*128kB (UM) 2*256kB (UM) 1*512kB (M) 1*1024kB (U) 0*2048kB 3*4096kB (ME) = 14752kB
[ 1378.248350] Node 0 DMA32: 9*4kB (UM) 7*8kB (UM) 7*16kB (UME) 6*32kB (UME) 4*64kB (UME) 1*128kB (M) 2*256kB (ME) 0*512kB 1*1024kB (U) 1*2048kB (E) 12*4096kB (UM) = 53516kB
[ 1378.248368] Node 0 Normal: 285*4kB (UMEH) 92*8kB (UMEH) 77*16kB (UMEH) 69*32kB (UMEH) 18*64kB (UMEH) 12*128kB (UMEH) 19*256kB (UMEH) 10*512kB (UME) 1*1024kB (H) 0*2048kB 0*4096kB = 19012kB
[ 1378.248387] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[ 1378.248388] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 1378.248389] 875052 total pagecache pages
[ 1378.248392] 0 pages in swap cache
[ 1378.248393] Swap cache stats: add 0, delete 0, find 0/0
[ 1378.248394] Free swap  = 0kB
[ 1378.248395] Total swap = 0kB
[ 1378.248396] 1048445 pages RAM
[ 1378.248397] 0 pages HighMem/MovableOnly
[ 1378.248398] 116531 pages reserved
[ 1378.248399] 0 pages hwpoisoned
[ 1378.248400] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[ 1378.248412] [  476]     0   476    10952      553      24       3        0         -1000 systemd-udevd
[ 1378.248416] [  486]     0   486    13856      110      26       3        0         -1000 auditd
[ 1378.248419] [  660]     0   660    26372      247      55       3        0         -1000 sshd
[ 1378.248426] [ 6538]    81  6538     6103       78      17       3        0          -900 dbus-daemon
[ 1378.248527] [ 6855]  1000  6855  4195473   852376    1790      19        0             0 a.out
[ 1378.248533] [ 6862]     0  6862    27511       32      10       3        0             0 agetty
[ 1378.248535] Out of memory: Kill process 6862 (agetty) score 0 or sacrifice child
[ 1378.248566] Killed process 6862 (agetty) total-vm:110044kB, anon-rss:124kB, file-rss:4kB, shmem-rss:0kB
[ 1378.252379] kworker/u16:26 invoked oom-killer: gfp_mask=0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=(null),  order=2, oom_score_adj=0
[ 1378.252384] kworker/u16:26 cpuset=/ mems_allowed=0
[ 1378.252390] CPU: 0 PID: 353 Comm: kworker/u16:26 Not tainted 4.13.0-rc2-next-20170728 #649
[ 1378.252391] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[ 1378.252396] Workqueue: events_unbound call_usermodehelper_exec_work
[ 1378.252397] Call Trace:
[ 1378.252402]  dump_stack+0x67/0x9e
[ 1378.252407]  dump_header+0x9d/0x3fa
[ 1378.252413]  ? trace_hardirqs_on+0xd/0x10
[ 1378.252420]  oom_kill_process+0x226/0x650
[ 1378.252426]  out_of_memory+0x136/0x560
[ 1378.252428]  ? out_of_memory+0x206/0x560
[ 1378.252433]  __alloc_pages_nodemask+0xdce/0xeb0
[ 1378.252436]  ? copy_process.part.39+0x7eb/0x1e30
[ 1378.252453]  copy_process.part.39+0x13f/0x1e30
[ 1378.252455]  ? __lock_acquire+0x506/0x1a90
[ 1378.252458]  ? load_balance+0x1b0/0xaf0
[ 1378.252465]  ? umh_complete+0x30/0x30
[ 1378.252472]  _do_fork+0xea/0x5f0
[ 1378.252477]  ? native_sched_clock+0x36/0xa0
[ 1378.252487]  kernel_thread+0x24/0x30
[ 1378.252490]  call_usermodehelper_exec_work+0x35/0xc0
[ 1378.252493]  process_one_work+0x1d0/0x3e0
[ 1378.252494]  ? process_one_work+0x16a/0x3e0
[ 1378.252502]  worker_thread+0x48/0x3c0
[ 1378.252510]  kthread+0x10d/0x140
[ 1378.252511]  ? process_one_work+0x3e0/0x3e0
[ 1378.252514]  ? kthread_create_on_node+0x60/0x60
[ 1378.252519]  ret_from_fork+0x27/0x40
[ 1378.252530] Mem-Info:
[ 1378.252533] active_anon:3649 inactive_anon:872923 isolated_anon:0
[ 1378.252533]  active_file:34 inactive_file:0 isolated_file:0
[ 1378.252533]  unevictable:0 dirty:0 writeback:0 unstable:0
[ 1378.252533]  slab_reclaimable:0 slab_unreclaimable:0
[ 1378.252533]  mapped:852443 shmem:875026 pagetables:2050 bounce:0
[ 1378.252533]  free:21820 free_pcp:501 free_cma:0
[ 1378.252535] Node 0 active_anon:14596kB inactive_anon:3491692kB active_file:136kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:3409772kB dirty:0kB writeback:0kB shmem:3500104kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 1378.252536] Node 0 DMA free:14752kB min:288kB low:360kB high:432kB active_anon:0kB inactive_anon:1116kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15904kB mlocked:0kB kernel_stack:0kB pagetables:4kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 1378.252539] lowmem_reserve[]: 0 2688 3624 3624
[ 1378.252544] Node 0 DMA32 free:53516kB min:49908kB low:62384kB high:74860kB active_anon:16kB inactive_anon:2682496kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129216kB managed:2752884kB mlocked:0kB kernel_stack:0kB pagetables:5352kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 1378.252547] lowmem_reserve[]: 0 0 936 936
[ 1378.252551] Node 0 Normal free:19012kB min:17384kB low:21728kB high:26072kB active_anon:14776kB inactive_anon:808164kB active_file:0kB inactive_file:384kB unevictable:0kB writepending:0kB present:1048576kB managed:958868kB mlocked:0kB kernel_stack:2736kB pagetables:2844kB bounce:0kB free_pcp:2004kB local_pcp:668kB free_cma:0kB
[ 1378.252554] lowmem_reserve[]: 0 0 0 0
[ 1378.252559] Node 0 DMA: 2*4kB (UM) 1*8kB (U) 1*16kB (U) 2*32kB (UM) 1*64kB (U) 2*128kB (UM) 2*256kB (UM) 1*512kB (M) 1*1024kB (U) 0*2048kB 3*4096kB (ME) = 14752kB
[ 1378.252578] Node 0 DMA32: 9*4kB (UM) 7*8kB (UM) 7*16kB (UME) 6*32kB (UME) 4*64kB (UME) 1*128kB (M) 2*256kB (ME) 0*512kB 1*1024kB (U) 1*2048kB (E) 12*4096kB (UM) = 53516kB
[ 1378.252596] Node 0 Normal: 271*4kB (UMEH) 97*8kB (UMEH) 79*16kB (UMEH) 69*32kB (UMEH) 18*64kB (UMEH) 12*128kB (UMEH) 19*256kB (UMEH) 10*512kB (UME) 1*1024kB (H) 0*2048kB 0*4096kB = 19028kB
[ 1378.252614] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[ 1378.252616] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 1378.252617] 875015 total pagecache pages
[ 1378.252620] 0 pages in swap cache
[ 1378.252621] Swap cache stats: add 0, delete 0, find 0/0
[ 1378.252622] Free swap  = 0kB
[ 1378.252622] Total swap = 0kB
[ 1378.252624] 1048445 pages RAM
[ 1378.252625] 0 pages HighMem/MovableOnly
[ 1378.252626] 116531 pages reserved
[ 1378.252627] 0 pages hwpoisoned
[ 1378.252628] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[ 1378.252634] [  476]     0   476    10952      553      24       3        0         -1000 systemd-udevd
[ 1378.252638] [  486]     0   486    13856      110      26       3        0         -1000 auditd
[ 1378.252641] [  660]     0   660    26372      247      55       3        0         -1000 sshd
[ 1378.252646] [ 6538]    81  6538     6103       78      17       3        0          -900 dbus-daemon
[ 1378.252859] [ 6855]  1000  6855  4195473   852376    1790      19        0             0 a.out
[ 1378.252866] Out of memory: Kill process 6538 (dbus-daemon) score 0 or sacrifice child
[ 1378.252879] Killed process 6538 (dbus-daemon) total-vm:24412kB, anon-rss:312kB, file-rss:0kB, shmem-rss:0kB
[ 1379.068645] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 1379.068647] 875105 total pagecache pages
[ 1379.068650] 0 pages in swap cache
[ 1379.068651] Swap cache stats: add 0, delete 0, find 0/0
[ 1379.068652] Free swap  = 0kB
[ 1379.068653] Total swap = 0kB
[ 1379.068655] 1048445 pages RAM
[ 1379.068656] 0 pages HighMem/MovableOnly
[ 1379.068656] 116531 pages reserved
[ 1379.068657] 0 pages hwpoisoned
[ 1379.101641] kworker/u16:29 invoked oom-killer: gfp_mask=0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), nodemask=(null),  order=2, oom_score_adj=0
[ 1379.105795] kworker/u16:29 cpuset=/ mems_allowed=0
[ 1379.107976] CPU: 0 PID: 356 Comm: kworker/u16:29 Not tainted 4.13.0-rc2-next-20170728 #649
[ 1379.110761] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[ 1379.114084] Workqueue: events_unbound call_usermodehelper_exec_work
[ 1379.116440] Call Trace:
[ 1379.117938]  dump_stack+0x67/0x9e
[ 1379.119603]  dump_header+0x9d/0x3fa
[ 1379.121301]  out_of_memory+0x470/0x560
[ 1379.122979]  ? out_of_memory+0x206/0x560
[ 1379.124689]  __alloc_pages_nodemask+0xdce/0xeb0
[ 1379.126519]  ? copy_process.part.39+0x7eb/0x1e30
[ 1379.128346]  copy_process.part.39+0x13f/0x1e30
[ 1379.130084]  ? __lock_acquire+0x506/0x1a90
[ 1379.131719]  ? umh_complete+0x30/0x30
[ 1379.133265]  _do_fork+0xea/0x5f0
[ 1379.134689]  ? native_sched_clock+0x36/0xa0
[ 1379.136239]  kernel_thread+0x24/0x30
[ 1379.137628]  call_usermodehelper_exec_work+0x35/0xc0
[ 1379.139364]  process_one_work+0x1d0/0x3e0
[ 1379.140857]  ? process_one_work+0x16a/0x3e0
[ 1379.142361]  worker_thread+0x48/0x3c0
[ 1379.143813]  kthread+0x10d/0x140
[ 1379.145138]  ? process_one_work+0x3e0/0x3e0
[ 1379.146657]  ? kthread_create_on_node+0x60/0x60
[ 1379.148252]  ret_from_fork+0x27/0x40
[ 1379.149659] Mem-Info:
[ 1379.151818] active_anon:3493 inactive_anon:872905 isolated_anon:0
[ 1379.151818]  active_file:44 inactive_file:82 isolated_file:0
[ 1379.151818]  unevictable:0 dirty:0 writeback:0 unstable:0
[ 1379.151818]  slab_reclaimable:0 slab_unreclaimable:0
[ 1379.151818]  mapped:532931 shmem:875026 pagetables:1923 bounce:0
[ 1379.151818]  free:21712 free_pcp:565 free_cma:0
[ 1379.163532] Node 0 active_anon:13972kB inactive_anon:3491620kB active_file:176kB inactive_file:204kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:1943792kB dirty:0kB writeback:0kB shmem:3500104kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 1379.171228] Node 0 DMA free:14752kB min:288kB low:360kB high:432kB active_anon:0kB inactive_anon:1116kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15904kB mlocked:0kB kernel_stack:0kB pagetables:4kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 1379.178826] lowmem_reserve[]: 0 2688 3624 3624
[ 1379.180764] Node 0 DMA32 free:53516kB min:49908kB low:62384kB high:74860kB active_anon:16kB inactive_anon:2682496kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129216kB managed:2752884kB mlocked:0kB kernel_stack:0kB pagetables:5352kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 1379.188610] lowmem_reserve[]: 0 0 936 936
[ 1379.191335] Node 0 Normal free:19124kB min:17384kB low:21728kB high:26072kB active_anon:13956kB inactive_anon:808008kB active_file:100kB inactive_file:516kB unevictable:0kB writepending:0kB present:1048576kB managed:958868kB mlocked:0kB kernel_stack:2640kB pagetables:2336kB bounce:0kB free_pcp:2156kB local_pcp:540kB free_cma:0kB
[ 1379.200442] lowmem_reserve[]: 0 0 0 0
[ 1379.202105] Node 0 DMA: 2*4kB (UM) 1*8kB (U) 1*16kB (U) 2*32kB (UM) 1*64kB (U) 2*128kB (UM) 2*256kB (UM) 1*512kB (M) 1*1024kB (U) 0*2048kB 3*4096kB (ME) = 14752kB
[ 1379.207076] Node 0 DMA32: 9*4kB (UM) 7*8kB (UM) 7*16kB (UME) 6*32kB (UME) 4*64kB (UME) 1*128kB (M) 2*256kB (ME) 0*512kB 1*1024kB (U) 1*2048kB (E) 12*4096kB (UM) = 53516kB
[ 1379.212387] Node 0 Normal: 72*4kB (UMEH) 108*8kB (UMEH) 97*16kB (UMEH) 83*32kB (UMEH) 21*64kB (UMEH) 12*128kB (UMEH) 19*256kB (UMEH) 10*512kB (UME) 1*1024kB (H) 0*2048kB 0*4096kB = 19248kB
[ 1379.217936] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[ 1379.220795] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 1379.223635] 875203 total pagecache pages
[ 1379.225549] 0 pages in swap cache
[ 1379.228278] Swap cache stats: add 0, delete 0, find 0/0
[ 1379.230716] Free swap  = 0kB
[ 1379.232374] Total swap = 0kB
[ 1379.233975] 1048445 pages RAM
[ 1379.235603] 0 pages HighMem/MovableOnly
[ 1379.237382] 116531 pages reserved
[ 1379.239170] 0 pages hwpoisoned
[ 1379.240891] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[ 1379.243705] [  476]     0   476    10952      553      24       3        0         -1000 systemd-udevd
[ 1379.246883] [  486]     0   486    13856      110      26       3        0         -1000 auditd
[ 1379.249890] [  660]     0   660    26372      247      55       3        0         -1000 sshd
[ 1379.253002] Kernel panic - not syncing: Out of memory and no killable processes...
[ 1379.253002] 
[ 1379.256943] CPU: 0 PID: 356 Comm: kworker/u16:29 Not tainted 4.13.0-rc2-next-20170728 #649
[ 1379.259739] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[ 1379.263081] Workqueue: events_unbound call_usermodehelper_exec_work
[ 1379.265496] Call Trace:
[ 1379.267085]  dump_stack+0x67/0x9e
[ 1379.268897]  panic+0xe5/0x23f
[ 1379.270588]  out_of_memory+0x47e/0x560
[ 1379.272475]  ? out_of_memory+0x206/0x560
[ 1379.274397]  __alloc_pages_nodemask+0xdce/0xeb0
[ 1379.276422]  ? copy_process.part.39+0x7eb/0x1e30
[ 1379.278475]  copy_process.part.39+0x13f/0x1e30
[ 1379.280491]  ? __lock_acquire+0x506/0x1a90
[ 1379.282561]  ? umh_complete+0x30/0x30
[ 1379.284514]  _do_fork+0xea/0x5f0
[ 1379.286341]  ? native_sched_clock+0x36/0xa0
[ 1379.288370]  kernel_thread+0x24/0x30
----------

> 
> > If a multi-threaded process which consumes little memory was
> > selected as an OOM victim (and reaped by the OOM reaper and MMF_OOM_SKIP
> > was set immediately), it might still be possible to needlessly select the
> > next OOM victims.
> 
> This would be true if the address space itself only contained a small
> amount of memory and a large part of the memory was in page tables or
> other resources which oom_reaper cannot work with. This is not a usual
> case though.

mlock()ing the whole memory needs CAP_IPC_LOCK, but consuming the whole memory as
MAP_SHARED does not need CAP_IPC_LOCK. And I think we can relax the MMF_OOM_SKIP
test in task_will_free_mem() to ignore MMF_OOM_SKIP once, because "mm, oom:
do not grant oom victims full memory reserves access" might be too large a change
for older kernels which the next version of LTS distributions would choose.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Possible race condition in oom-killer
  2017-08-01 10:46               ` Tetsuo Handa
@ 2017-08-01 11:30                 ` Michal Hocko
  0 siblings, 0 replies; 16+ messages in thread
From: Michal Hocko @ 2017-08-01 11:30 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: mjaggi, linux-kernel, linux-mm

On Tue 01-08-17 19:46:38, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > > >                                                                   Is
> > > > something other than the LTP test affected to give this more priority?
> > > > Do we have other usecases where something mlocks the whole memory?
> > > 
> > > This panic was caused by 50 threads sharing an MMF_OOM_SKIP mm exceeding the
> > > number of OOM killable processes. Whether memory is locked or not isn't
> > > important.
> > 
> > You are wrong here I believe. The whole problem is that the OOM victim
> > is consuming basically all the memory (that is what the test case
> > actually does IIRC) and that memory is mlocked. oom_reaper is much
> > faster to evaluate the mm of the victim and bail out sooner than the
> > exit path actually manages to tear down the address space. And so we
> > have to find other oom victims until we simply kill everything and
> > panic.
> 
> Again, whether memory is locked or not isn't important. I can easily
> reproduce unnecessary OOM victim selection as a local unprivileged user
> using the program below.
> 
> ----------
> #define _GNU_SOURCE
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <unistd.h>
> #include <sched.h>
> #include <sys/mman.h>
> 
> #define NUMTHREADS 128
> #define MMAPSIZE 128 * 1048576
> #define STACKSIZE 4096
> static int pipe_fd[2] = { EOF, EOF };
> static int memory_eater(void *unused)
> {
> 	int fd = open("/dev/zero", O_RDONLY);
> 	char *buf = mmap(NULL, MMAPSIZE, PROT_WRITE | PROT_READ,
> 			 MAP_ANONYMOUS | MAP_SHARED, EOF, 0);
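> 	/* a shmem-backed shared anonymous mapping; neither CAP_IPC_LOCK nor the mlock rlimit is involved */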
> 	read(pipe_fd[0], buf, 1);
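> 	/* the read above blocks until main() closes the write end of the pipe;
> 	 * the read below then faults in all MMAPSIZE bytes from /dev/zero */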
> 	read(fd, buf, MMAPSIZE);
> 	pause();
> 	return 0;
> }
> int main(int argc, char *argv[])
> {
> 	int i;
> 	char *stack;
> 	if (fork() || fork() || setsid() == EOF || pipe(pipe_fd))
> 		_exit(0);
> 	stack = mmap(NULL, STACKSIZE * NUMTHREADS, PROT_WRITE | PROT_READ,
> 		     MAP_ANONYMOUS | MAP_SHARED, EOF, 0);
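> 	/* one shared region carved into per-thread stacks for the clone() calls below */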
> 	for (i = 0; i < NUMTHREADS; i++)
>                 if (clone(memory_eater, stack + (i + 1) * STACKSIZE,
> 			  CLONE_THREAD | CLONE_SIGHAND | CLONE_VM | CLONE_FS |
> 			  CLONE_FILES, NULL) == -1)
>                         break;
> 	sleep(1);
> 	close(pipe_fd[1]);
> 	pause();
> 	return 0;
> }

This is a clear DoS. There is sadly^Wsimply no implicit limit for the
amount of shared anonymous memory. This is very close to consuming shmem
via the fs interface, except the fs interface has an upper bound for the size.
I do not think this is anything new. If you are creative enough you can
DoS the system the same way regardless of the oom reaper by passing
shmem fds around AFAICS...
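
For comparison, a minimal user-space sketch of the two paths (the file name and the
1 GiB size are placeholders, and this is not taken from the reproducer above):
----------
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	size_t len = 1UL << 30;	/* 1 GiB, purely illustrative */

	/* fs interface: backed by the /dev/shm tmpfs, so populating the
	 * mapping is bounded by that mount's size= limit (half of RAM by
	 * default); faults beyond that limit fail */
	int fd = open("/dev/shm/demo", O_CREAT | O_RDWR, 0600);
	ftruncate(fd, len);
	char *bounded = mmap(NULL, len, PROT_READ | PROT_WRITE,
			     MAP_SHARED, fd, 0);

	/* shared anonymous memory: also shmem-backed, but with no fs-level
	 * bound at all; only physical memory plus swap limits it */
	char *unbounded = mmap(NULL, len, PROT_READ | PROT_WRITE,
			       MAP_ANONYMOUS | MAP_SHARED, -1, 0);

	(void)bounded;
	(void)unbounded;
	return 0;
}
----------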

[...]
> > > If a multi-threaded process which consumes little memory was
> > > selected as an OOM victim (and reaped by the OOM reaper and MMF_OOM_SKIP
> > > was set immediately), it might still be possible to needlessly select the
> > > next OOM victims.
> > 
> > This would be true if the address space itself only contained a small
> > amount of memory and a large part of the memory was in page tables or
> > other resources which oom_reaper cannot work with. This is not a usual
> > case though.
> 
> mlock()ing the whole memory needs CAP_IPC_LOCK, but consuming the whole memory as
> MAP_SHARED does not need CAP_IPC_LOCK. And I think we can relax the MMF_OOM_SKIP
> test in task_will_free_mem() to ignore MMF_OOM_SKIP once

As I've said it is not that simple. I will comment on your other email.

> for "mm, oom: do not grant oom victims full memory reserves access"
> might be too large change for older kernels which next version of LTS
> distributions would choose.

While this is annoying, I do not think this is something new. If you have
an untrusted user on the system you had better contain it (you can use memcg,
for example).
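
For reference, a minimal sketch of such memcg containment, assuming cgroup v2 is
mounted at /sys/fs/cgroup (the group name and the 1G limit are only placeholders):
----------
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

static void write_str(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (f) {
		fputs(val, f);
		fclose(f);
	}
}

int main(void)
{
	char pid[32];

	/* create a memory-limited group for the untrusted workload */
	mkdir("/sys/fs/cgroup/untrusted", 0755);
	write_str("/sys/fs/cgroup/untrusted/memory.max", "1G");

	/* move the current task into it; hitting the limit then triggers
	 * reclaim and, if need be, an OOM kill confined to this group */
	snprintf(pid, sizeof(pid), "%d", getpid());
	write_str("/sys/fs/cgroup/untrusted/cgroup.procs", pid);
	return 0;
}
----------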

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Possible race condition in oom-killer
  2017-07-29  4:31               ` Tetsuo Handa
@ 2017-08-01 12:14                 ` Michal Hocko
  2017-08-01 14:16                   ` Tetsuo Handa
  0 siblings, 1 reply; 16+ messages in thread
From: Michal Hocko @ 2017-08-01 12:14 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: mjaggi, linux-kernel, linux-mm

On Sat 29-07-17 13:31:44, Tetsuo Handa wrote:
[...]
> @@ -806,6 +799,20 @@ static bool task_will_free_mem(struct task_struct *task)
>  	}
>  	rcu_read_unlock();
>  
> +	/*
> +	 * It is possible that current thread fails to try allocation from
> +	 * memory reserves if the OOM reaper set MMF_OOM_SKIP on this mm before
> +	 * current thread calls out_of_memory() in order to get TIF_MEMDIE.
> +	 * In that case, allow current thread to try TIF_MEMDIE allocation
> +	 * before start selecting next OOM victims.
> +	 */
> +	if (ret && test_bit(MMF_OOM_SKIP, &mm->flags)) {
> +		if (task == current && !task->oom_kill_free_check_raced)
> +			task->oom_kill_free_check_raced = true;
> +		else
> +			ret = false;
> +	}
> +
>  	return ret;
>  }

I was going to argue that this will not work because we could mark a
former OOM victim again after it passed exit_oom_victim, but this seems
impossible because task_will_free_mem() checks task->mm and that will be
NULL by that time. This is still an ugly hack and it doesn't provide any
additional guarantee. Once we merge [1] then the oom victim wouldn't
need to get TIF_MEMDIE to access memory reserves.

[1] http://lkml.kernel.org/r/20170727090357.3205-2-mhocko@kernel.org
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Possible race condition in oom-killer
  2017-08-01 12:14                 ` Michal Hocko
@ 2017-08-01 14:16                   ` Tetsuo Handa
  2017-08-01 14:47                     ` Michal Hocko
  0 siblings, 1 reply; 16+ messages in thread
From: Tetsuo Handa @ 2017-08-01 14:16 UTC (permalink / raw)
  To: mhocko; +Cc: mjaggi, linux-kernel, linux-mm

Michal Hocko wrote:
>                       Once we merge [1] then the oom victim wouldn't
> need to get TIF_MEMDIE to access memory reserves.
> 
> [1] http://lkml.kernel.org/r/20170727090357.3205-2-mhocko@kernel.org

False. We are not setting oom_mm for all thread groups (!CLONE_THREAD) sharing
that mm (CLONE_VM). Thus, one thread from each thread group sharing that mm
will have to call out_of_memory() in order to set oom_mm, and they will find
task_will_free_mem() returning false because MMF_OOM_SKIP is already set, and
will therefore proceed to select the next OOM victim.
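
For reference, a minimal sketch of that topology (not the reproducer quoted earlier;
the flags are just the minimum needed to share one mm across several thread groups):
----------
#define _GNU_SOURCE
#include <sched.h>
#include <stdlib.h>
#include <unistd.h>

#define STACKSIZE 65536

static int sibling(void *unused)
{
	/* a separate thread group (its own tgid), yet the same mm as main() */
	pause();
	return 0;
}

int main(void)
{
	char *stack = malloc(STACKSIZE);

	/* CLONE_VM without CLONE_THREAD: a distinct, separately OOM-listed
	 * process that nevertheless shares the caller's address space */
	clone(sibling, stack + STACKSIZE,
	      CLONE_VM | CLONE_FS | CLONE_FILES, NULL);
	pause();
	return 0;
}
----------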

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Possible race condition in oom-killer
  2017-08-01 14:16                   ` Tetsuo Handa
@ 2017-08-01 14:47                     ` Michal Hocko
  0 siblings, 0 replies; 16+ messages in thread
From: Michal Hocko @ 2017-08-01 14:47 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: mjaggi, linux-kernel, linux-mm

On Tue 01-08-17 23:16:13, Tetsuo Handa wrote:
> Michal Hocko wrote:
> >                       Once we merge [1] then the oom victim wouldn't
> > need to get TIF_MEMDIE to access memory reserves.
> > 
> > [1] http://lkml.kernel.org/r/20170727090357.3205-2-mhocko@kernel.org
> 
> False. We are not setting oom_mm for all thread groups (!CLONE_THREAD) sharing
> that mm (CLONE_VM). Thus, one thread from each thread group sharing that mm
> will have to call out_of_memory() in order to set oom_mm, and they will find
> task_will_free_mem() returning false because MMF_OOM_SKIP is already set, and
> will therefore proceed to select the next OOM victim.

Once the patch is merged we can call mark_oom_victim() on all of them as well.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2017-08-01 14:47 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <e6c83a26-1d59-4afd-55cf-04e58bdde188@caviumnetworks.com>
2017-07-28 12:32 ` Possible race condition in oom-killer Michal Hocko
2017-07-28 12:59   ` Tetsuo Handa
2017-07-28 13:07     ` Michal Hocko
2017-07-28 13:15       ` Tetsuo Handa
2017-07-28 13:29         ` Michal Hocko
2017-07-28 13:55           ` Tetsuo Handa
2017-07-28 14:07             ` Michal Hocko
2017-07-29  4:31               ` Tetsuo Handa
2017-08-01 12:14                 ` Michal Hocko
2017-08-01 14:16                   ` Tetsuo Handa
2017-08-01 14:47                     ` Michal Hocko
2017-08-01 10:46               ` Tetsuo Handa
2017-08-01 11:30                 ` Michal Hocko
2017-07-28 13:15     ` Manish Jaggi
2017-07-28 13:50   ` Manish Jaggi
2017-07-28 14:12     ` Michal Hocko
