* [PATCH] memcg, oom: be careful about races when warning about no reclaimable task @ 2018-08-07 7:25 Michal Hocko 2018-08-07 10:15 ` Tetsuo Handa 2018-08-07 20:02 ` Johannes Weiner 0 siblings, 2 replies; 18+ messages in thread From: Michal Hocko @ 2018-08-07 7:25 UTC (permalink / raw) To: Andrew Morton Cc: Johannes Weiner, Vladimir Davydov, linux-mm, Greg Thelen, Tetsuo Handa, Dmitry Vyukov, LKML, Michal Hocko From: Michal Hocko <mhocko@suse.com> "memcg, oom: move out_of_memory back to the charge path" has added a warning triggered when the oom killer cannot find any eligible task and so there is no way to reclaim the oom memcg under its hard limit. Further charges for such a memcg are forced and therefore the hard limit isolation is weakened. The current warning is however too eager to trigger even when we are not really hitting the above condition. Syzbot[1] and Greg Thelen have noticed that we can hit this condition even when there is still oom victim pending. E.g. the following race is possible: memcg has two tasks taskA, taskB. CPU1 (taskA) CPU2 CPU3 (taskB) try_charge mem_cgroup_out_of_memory try_charge select_bad_process(taskB) oom_kill_process oom_reap_task # No real memory reaped mem_cgroup_out_of_memory # set taskB -> MMF_OOM_SKIP # retry charge mem_cgroup_out_of_memory oom_lock oom_lock select_bad_process(self) oom_kill_process(self) oom_unlock # no eligible task In fact syzbot test triggered this situation by placing multiple tasks into a memcg with hard limit set to 0. So no task really had any memory charged to the memcg : Memory cgroup stats for /ile0: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB : Tasks state (memory values in pages): : [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name : [ 6569] 0 6562 9427 1 53248 0 0 syz-executor0 : [ 6576] 0 6576 9426 0 61440 0 0 syz-executor6 : [ 6578] 0 6578 9426 534 61440 0 0 syz-executor4 : [ 6579] 0 6579 9426 0 57344 0 0 syz-executor5 : [ 6582] 0 6582 9426 0 61440 0 0 syz-executor7 : [ 6584] 0 6584 9426 0 57344 0 0 syz-executor1 so in principle there is indeed nothing reclaimable in this memcg and this looks like a misconfiguration. On the other hand we can clearly kill all those tasks so it is a bit early to warn and scare users. Do that by checking that the current is the oom victim and bypass the warning then. The victim is allowed to force charge and terminate to release its temporal charge along the way. [1] http://lkml.kernel.org/r/0000000000005e979605729c1564@google.com Fixes: "memcg, oom: move out_of_memory back to the charge path" Noticed-by: Greg Thelen <gthelen@google.com> Reported-and-tested-by: syzbot+bab151e82a4e973fa325@syzkaller.appspotmail.com Signed-off-by: Michal Hocko <mhocko@suse.com> --- mm/memcontrol.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 4603ad75c9a9..1b6eed1bc404 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1703,7 +1703,8 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int return OOM_ASYNC; } - if (mem_cgroup_out_of_memory(memcg, mask, order)) + if (mem_cgroup_out_of_memory(memcg, mask, order) || + tsk_is_oom_victim(current)) return OOM_SUCCESS; WARN(1,"Memory cgroup charge failed because of no reclaimable memory! " -- 2.18.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH] memcg, oom: be careful about races when warning about no reclaimable task 2018-08-07 7:25 [PATCH] memcg, oom: be careful about races when warning about no reclaimable task Michal Hocko @ 2018-08-07 10:15 ` Tetsuo Handa 2018-08-07 11:04 ` Michal Hocko 2018-08-07 20:19 ` Johannes Weiner 2018-08-07 20:02 ` Johannes Weiner 1 sibling, 2 replies; 18+ messages in thread From: Tetsuo Handa @ 2018-08-07 10:15 UTC (permalink / raw) To: Michal Hocko, Andrew Morton Cc: Johannes Weiner, Vladimir Davydov, linux-mm, Greg Thelen, Dmitry Vyukov, LKML, Michal Hocko, David Rientjes On 2018/08/07 16:25, Michal Hocko wrote: > @@ -1703,7 +1703,8 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int > return OOM_ASYNC; > } > > - if (mem_cgroup_out_of_memory(memcg, mask, order)) > + if (mem_cgroup_out_of_memory(memcg, mask, order) || > + tsk_is_oom_victim(current)) > return OOM_SUCCESS; > > WARN(1,"Memory cgroup charge failed because of no reclaimable memory! " > I don't think this patch is appropriate. This patch only avoids hitting WARN(1). This patch does not address the root cause: The task_will_free_mem(current) test in out_of_memory() is returning false because test_bit(MMF_OOM_SKIP, &mm->flags) test in task_will_free_mem() is returning false because MMF_OOM_SKIP was already set by the OOM reaper. The OOM killer does not need to start selecting next OOM victim until "current thread completes __mmput()" or "it fails to complete __mmput() within reasonable period". According to https://syzkaller.appspot.com/text?tag=CrashLog&x=15a1c770400000 , PID=23767 selected PID=23766 as an OOM victim and the OOM reaper set MMF_OOM_SKIP before PID=23766 unnecessarily selects PID=23767 as next OOM victim. At uptime = 366.550949, out_of_memory() should have returned true without selecting next OOM victim because tsk_is_oom_victim(current) == true. [ 365.869417] syz-executor2 invoked oom-killer: gfp_mask=0x6000c0(GFP_KERNEL), order=0, oom_score_adj=0 [ 365.878899] CPU: 0 PID: 23767 Comm: syz-executor2 Not tainted 4.18.0-rc6-next-20180725+ #18 (...snipped...) [ 366.487490] Tasks state (memory values in pages): [ 366.492349] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name [ 366.501237] [ 23766] 0 23766 17620 8221 126976 0 0 syz-executor3 [ 366.510367] [ 23767] 0 23767 17618 8218 126976 0 0 syz-executor2 [ 366.519409] Memory cgroup out of memory: Kill process 23766 (syz-executor3) score 8252000 or sacrifice child [ 366.529422] Killed process 23766 (syz-executor3) total-vm:70480kB, anon-rss:116kB, file-rss:32768kB, shmem-rss:0kB [ 366.540456] oom_reaper: reaped process 23766 (syz-executor3), now anon-rss:0kB, file-rss:32000kB, shmem-rss:0kB [ 366.550949] syz-executor3 invoked oom-killer: gfp_mask=0x6000c0(GFP_KERNEL), order=0, oom_score_adj=0 [ 366.560374] CPU: 1 PID: 23766 Comm: syz-executor3 Not tainted 4.18.0-rc6-next-20180725+ #18 (...snipped...) [ 367.138136] Tasks state (memory values in pages): [ 367.142986] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name [ 367.151889] [ 23766] 0 23766 17620 8002 126976 0 0 syz-executor3 [ 367.160946] [ 23767] 0 23767 17618 8218 126976 0 0 syz-executor2 [ 367.169994] Memory cgroup out of memory: Kill process 23767 (syz-executor2) score 8249000 or sacrifice child [ 367.180119] Killed process 23767 (syz-executor2) total-vm:70472kB, anon-rss:104kB, file-rss:32768kB, shmem-rss:0kB [ 367.192101] oom_reaper: reaped process 23767 (syz-executor2), now anon-rss:0kB, file-rss:32000kB, shmem-rss:0kB [ 367.202986] ------------[ cut here ]------------ [ 367.207845] Memory cgroup charge failed because of no reclaimable memory! This looks like a misconfiguration or a kernel bug. [ 367.207965] WARNING: CPU: 1 PID: 23767 at mm/memcontrol.c:1710 try_charge+0x734/0x1680 [ 367.227540] Kernel panic - not syncing: panic_on_warn set ... Of course, if the hard limit is 0, all processes will be killed after all. But Michal is ignoring the fact that if the hard limit were not 0, there is a chance of saving next process from needlessly killed if we waited until "mm of PID=23766 completed __mmput()" or "mm of PID=23766 failed to complete __mmput() within reasonable period". We can make efforts not to return false at /* * This task has already been drained by the oom reaper so there are * only small chances it will free some more */ if (test_bit(MMF_OOM_SKIP, &mm->flags)) return false; (I admit that ignoring MMF_OOM_SKIP for once might not be sufficient for memcg case), and we can use feedback based backoff like "[PATCH 4/4] mm, oom: Fix unnecessary killing of additional processes." *UNTIL* we come to the point where the OOM reaper can always reclaim all memory. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] memcg, oom: be careful about races when warning about no reclaimable task 2018-08-07 10:15 ` Tetsuo Handa @ 2018-08-07 11:04 ` Michal Hocko 2018-08-07 20:19 ` Johannes Weiner 1 sibling, 0 replies; 18+ messages in thread From: Michal Hocko @ 2018-08-07 11:04 UTC (permalink / raw) To: Tetsuo Handa Cc: Andrew Morton, Johannes Weiner, Vladimir Davydov, linux-mm, Greg Thelen, Dmitry Vyukov, LKML, David Rientjes On Tue 07-08-18 19:15:11, Tetsuo Handa wrote: [...] > Of course, if the hard limit is 0, all processes will be killed after all. But > Michal is ignoring the fact that if the hard limit were not 0, there is a chance > of saving next process from needlessly killed if we waited until "mm of PID=23766 > completed __mmput()" or "mm of PID=23766 failed to complete __mmput() within > reasonable period". This is a completely different issue IMHO. I haven't seen reports about overly eager memcg oom killing so far. > We can make efforts not to return false at > > /* > * This task has already been drained by the oom reaper so there are > * only small chances it will free some more > */ > if (test_bit(MMF_OOM_SKIP, &mm->flags)) > return false; > > (I admit that ignoring MMF_OOM_SKIP for once might not be sufficient for memcg > case), and we can use feedback based backoff like > "[PATCH 4/4] mm, oom: Fix unnecessary killing of additional processes." *UNTIL* > we come to the point where the OOM reaper can always reclaim all memory. The code is quite tricky and I am really reluctant to make it even more so without seeing this is really hurting real users with real workloads. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] memcg, oom: be careful about races when warning about no reclaimable task 2018-08-07 10:15 ` Tetsuo Handa 2018-08-07 11:04 ` Michal Hocko @ 2018-08-07 20:19 ` Johannes Weiner 2018-08-07 20:38 ` Tetsuo Handa 1 sibling, 1 reply; 18+ messages in thread From: Johannes Weiner @ 2018-08-07 20:19 UTC (permalink / raw) To: Tetsuo Handa Cc: Michal Hocko, Andrew Morton, Vladimir Davydov, linux-mm, Greg Thelen, Dmitry Vyukov, LKML, Michal Hocko, David Rientjes On Tue, Aug 07, 2018 at 07:15:11PM +0900, Tetsuo Handa wrote: > On 2018/08/07 16:25, Michal Hocko wrote: > > @@ -1703,7 +1703,8 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int > > return OOM_ASYNC; > > } > > > > - if (mem_cgroup_out_of_memory(memcg, mask, order)) > > + if (mem_cgroup_out_of_memory(memcg, mask, order) || > > + tsk_is_oom_victim(current)) > > return OOM_SUCCESS; > > > > WARN(1,"Memory cgroup charge failed because of no reclaimable memory! " > > > > I don't think this patch is appropriate. This patch only avoids hitting WARN(1). > This patch does not address the root cause: > > The task_will_free_mem(current) test in out_of_memory() is returning false > because test_bit(MMF_OOM_SKIP, &mm->flags) test in task_will_free_mem() is > returning false because MMF_OOM_SKIP was already set by the OOM reaper. The OOM > killer does not need to start selecting next OOM victim until "current thread > completes __mmput()" or "it fails to complete __mmput() within reasonable > period". I don't see why it matters whether the OOM victim exits or not, unless you count the memory consumed by struct task_struct. > According to https://syzkaller.appspot.com/text?tag=CrashLog&x=15a1c770400000 , > PID=23767 selected PID=23766 as an OOM victim and the OOM reaper set MMF_OOM_SKIP > before PID=23766 unnecessarily selects PID=23767 as next OOM victim. > At uptime = 366.550949, out_of_memory() should have returned true without selecting > next OOM victim because tsk_is_oom_victim(current) == true. The code works just fine. We have to kill tasks until we a) free enough memory or b) run out of tasks or c) kill current. When one of these outcomes is reached, we allow the charge and return. The only problem here is a warning in the wrong place. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] memcg, oom: be careful about races when warning about no reclaimable task 2018-08-07 20:19 ` Johannes Weiner @ 2018-08-07 20:38 ` Tetsuo Handa 2018-08-08 12:57 ` Tetsuo Handa 0 siblings, 1 reply; 18+ messages in thread From: Tetsuo Handa @ 2018-08-07 20:38 UTC (permalink / raw) To: Johannes Weiner Cc: Michal Hocko, Andrew Morton, Vladimir Davydov, linux-mm, Greg Thelen, Dmitry Vyukov, LKML, Michal Hocko, David Rientjes On 2018/08/08 5:19, Johannes Weiner wrote: > On Tue, Aug 07, 2018 at 07:15:11PM +0900, Tetsuo Handa wrote: >> On 2018/08/07 16:25, Michal Hocko wrote: >>> @@ -1703,7 +1703,8 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int >>> return OOM_ASYNC; >>> } >>> >>> - if (mem_cgroup_out_of_memory(memcg, mask, order)) >>> + if (mem_cgroup_out_of_memory(memcg, mask, order) || >>> + tsk_is_oom_victim(current)) >>> return OOM_SUCCESS; >>> >>> WARN(1,"Memory cgroup charge failed because of no reclaimable memory! " >>> >> >> I don't think this patch is appropriate. This patch only avoids hitting WARN(1). >> This patch does not address the root cause: >> >> The task_will_free_mem(current) test in out_of_memory() is returning false >> because test_bit(MMF_OOM_SKIP, &mm->flags) test in task_will_free_mem() is >> returning false because MMF_OOM_SKIP was already set by the OOM reaper. The OOM >> killer does not need to start selecting next OOM victim until "current thread >> completes __mmput()" or "it fails to complete __mmput() within reasonable >> period". > > I don't see why it matters whether the OOM victim exits or not, unless > you count the memory consumed by struct task_struct. We are not counting memory consumed by struct task_struct. But David is counting memory released between set_bit(MMF_OOM_SKIP, &mm->flags) and completion of exit_mmap(). > >> According to https://syzkaller.appspot.com/text?tag=CrashLog&x=15a1c770400000 , >> PID=23767 selected PID=23766 as an OOM victim and the OOM reaper set MMF_OOM_SKIP >> before PID=23766 unnecessarily selects PID=23767 as next OOM victim. >> At uptime = 366.550949, out_of_memory() should have returned true without selecting >> next OOM victim because tsk_is_oom_victim(current) == true. > > The code works just fine. We have to kill tasks until we a) free > enough memory or b) run out of tasks or c) kill current. When one of > these outcomes is reached, we allow the charge and return. > > The only problem here is a warning in the wrong place. > If forced charge contained a bug, removing this WARN(1) deprives users of chance to know that something is going wrong. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] memcg, oom: be careful about races when warning about no reclaimable task 2018-08-07 20:38 ` Tetsuo Handa @ 2018-08-08 12:57 ` Tetsuo Handa 2018-08-08 13:16 ` Michal Hocko 0 siblings, 1 reply; 18+ messages in thread From: Tetsuo Handa @ 2018-08-08 12:57 UTC (permalink / raw) To: Johannes Weiner Cc: Michal Hocko, Andrew Morton, Vladimir Davydov, linux-mm, Greg Thelen, Dmitry Vyukov, LKML, Michal Hocko, David Rientjes On 2018/08/08 5:38, Tetsuo Handa wrote: > On 2018/08/08 5:19, Johannes Weiner wrote: >> On Tue, Aug 07, 2018 at 07:15:11PM +0900, Tetsuo Handa wrote: >>> On 2018/08/07 16:25, Michal Hocko wrote: >>>> @@ -1703,7 +1703,8 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int >>>> return OOM_ASYNC; >>>> } >>>> >>>> - if (mem_cgroup_out_of_memory(memcg, mask, order)) >>>> + if (mem_cgroup_out_of_memory(memcg, mask, order) || >>>> + tsk_is_oom_victim(current)) >>>> return OOM_SUCCESS; >>>> >>>> WARN(1,"Memory cgroup charge failed because of no reclaimable memory! " >>>> >>> >>> I don't think this patch is appropriate. This patch only avoids hitting WARN(1). >>> This patch does not address the root cause: >>> >>> The task_will_free_mem(current) test in out_of_memory() is returning false >>> because test_bit(MMF_OOM_SKIP, &mm->flags) test in task_will_free_mem() is >>> returning false because MMF_OOM_SKIP was already set by the OOM reaper. The OOM >>> killer does not need to start selecting next OOM victim until "current thread >>> completes __mmput()" or "it fails to complete __mmput() within reasonable >>> period". >> >> I don't see why it matters whether the OOM victim exits or not, unless >> you count the memory consumed by struct task_struct. > > We are not counting memory consumed by struct task_struct. But David is > counting memory released between set_bit(MMF_OOM_SKIP, &mm->flags) and > completion of exit_mmap(). Also, before the OOM reaper was introduced, we waited until TIF_MEMDIE is cleared from the OOM victim thread. Compared to pre OOM reaper era, giving up so early is certainly a regression. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] memcg, oom: be careful about races when warning about no reclaimable task 2018-08-08 12:57 ` Tetsuo Handa @ 2018-08-08 13:16 ` Michal Hocko 0 siblings, 0 replies; 18+ messages in thread From: Michal Hocko @ 2018-08-08 13:16 UTC (permalink / raw) To: Tetsuo Handa Cc: Johannes Weiner, Andrew Morton, Vladimir Davydov, linux-mm, Greg Thelen, Dmitry Vyukov, LKML, David Rientjes On Wed 08-08-18 21:57:13, Tetsuo Handa wrote: [...] > Also, before the OOM reaper was introduced, we waited until TIF_MEMDIE is > cleared from the OOM victim thread. Compared to pre OOM reaper era, giving up > so early is certainly a regression. We did clear TIF_MEMDIE flag after mmput() in do_exit so this was not a silver bullet either. Any reference on the mm_struct would lead to a similar problem. So could you please stop making strong stamements and start being reasonable? Yeah, this is racy. Nobody is claiming otherwise. All we are trying to say is that this area is full of dragons and before we start making it more complicating by covering weird cornercases we really need to see that those corner cases happen in real workloads. Otherwise we end up with a unmaintainable and fragile mess. And more importantly this is _not_ what this patch is trying to address so please do not go tangent again. I really do not know how to send this simply message to you. I have tried so many times before. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] memcg, oom: be careful about races when warning about no reclaimable task 2018-08-07 7:25 [PATCH] memcg, oom: be careful about races when warning about no reclaimable task Michal Hocko 2018-08-07 10:15 ` Tetsuo Handa @ 2018-08-07 20:02 ` Johannes Weiner 2018-08-07 20:23 ` Michal Hocko 1 sibling, 1 reply; 18+ messages in thread From: Johannes Weiner @ 2018-08-07 20:02 UTC (permalink / raw) To: Michal Hocko Cc: Andrew Morton, Vladimir Davydov, linux-mm, Greg Thelen, Tetsuo Handa, Dmitry Vyukov, LKML, Michal Hocko On Tue, Aug 07, 2018 at 09:25:53AM +0200, Michal Hocko wrote: > From: Michal Hocko <mhocko@suse.com> > > "memcg, oom: move out_of_memory back to the charge path" has added a > warning triggered when the oom killer cannot find any eligible task > and so there is no way to reclaim the oom memcg under its hard limit. > Further charges for such a memcg are forced and therefore the hard limit > isolation is weakened. > > The current warning is however too eager to trigger even when we are not > really hitting the above condition. Syzbot[1] and Greg Thelen have noticed > that we can hit this condition even when there is still oom victim > pending. E.g. the following race is possible: > > memcg has two tasks taskA, taskB. > > CPU1 (taskA) CPU2 CPU3 (taskB) > try_charge > mem_cgroup_out_of_memory try_charge > select_bad_process(taskB) > oom_kill_process oom_reap_task > # No real memory reaped > mem_cgroup_out_of_memory > # set taskB -> MMF_OOM_SKIP > # retry charge > mem_cgroup_out_of_memory > oom_lock oom_lock > select_bad_process(self) > oom_kill_process(self) > oom_unlock > # no eligible task > > In fact syzbot test triggered this situation by placing multiple tasks > into a memcg with hard limit set to 0. So no task really had any memory > charged to the memcg > > : Memory cgroup stats for /ile0: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB > : Tasks state (memory values in pages): > : [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name > : [ 6569] 0 6562 9427 1 53248 0 0 syz-executor0 > : [ 6576] 0 6576 9426 0 61440 0 0 syz-executor6 > : [ 6578] 0 6578 9426 534 61440 0 0 syz-executor4 > : [ 6579] 0 6579 9426 0 57344 0 0 syz-executor5 > : [ 6582] 0 6582 9426 0 61440 0 0 syz-executor7 > : [ 6584] 0 6584 9426 0 57344 0 0 syz-executor1 > > so in principle there is indeed nothing reclaimable in this memcg and > this looks like a misconfiguration. On the other hand we can clearly > kill all those tasks so it is a bit early to warn and scare users. Do > that by checking that the current is the oom victim and bypass the > warning then. The victim is allowed to force charge and terminate to > release its temporal charge along the way. > > [1] http://lkml.kernel.org/r/0000000000005e979605729c1564@google.com > Fixes: "memcg, oom: move out_of_memory back to the charge path" > Noticed-by: Greg Thelen <gthelen@google.com> > Reported-and-tested-by: syzbot+bab151e82a4e973fa325@syzkaller.appspotmail.com > Signed-off-by: Michal Hocko <mhocko@suse.com> > --- > mm/memcontrol.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 4603ad75c9a9..1b6eed1bc404 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -1703,7 +1703,8 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int > return OOM_ASYNC; > } > > - if (mem_cgroup_out_of_memory(memcg, mask, order)) > + if (mem_cgroup_out_of_memory(memcg, mask, order) || > + tsk_is_oom_victim(current)) > return OOM_SUCCESS; > > WARN(1,"Memory cgroup charge failed because of no reclaimable memory! " This is really ugly. :( If that check is only there to suppress the warning when the limit is 0, this should really be a separate branch around the warning, with a fat comment that this is a ridiculous cornercase, and not look like it is an essential part of the memcg reclaim/oom process. Personally, I really don't get the point of this message. What is the user to do with this information? What are we to do with it if people report it? It conveys zero information on what the problem could be, because it asserts a really vague high-level thing. Shouldn't such debugging happen inside the OOM killer? What are the conceivable scenarios in which this triggers other than obvious misconfigs? What would we lose by just deleting it? ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] memcg, oom: be careful about races when warning about no reclaimable task 2018-08-07 20:02 ` Johannes Weiner @ 2018-08-07 20:23 ` Michal Hocko 2018-08-07 20:54 ` Johannes Weiner 0 siblings, 1 reply; 18+ messages in thread From: Michal Hocko @ 2018-08-07 20:23 UTC (permalink / raw) To: Johannes Weiner Cc: Andrew Morton, Vladimir Davydov, linux-mm, Greg Thelen, Tetsuo Handa, Dmitry Vyukov, LKML On Tue 07-08-18 16:02:47, Johannes Weiner wrote: > On Tue, Aug 07, 2018 at 09:25:53AM +0200, Michal Hocko wrote: > > From: Michal Hocko <mhocko@suse.com> > > > > "memcg, oom: move out_of_memory back to the charge path" has added a > > warning triggered when the oom killer cannot find any eligible task > > and so there is no way to reclaim the oom memcg under its hard limit. > > Further charges for such a memcg are forced and therefore the hard limit > > isolation is weakened. > > > > The current warning is however too eager to trigger even when we are not > > really hitting the above condition. Syzbot[1] and Greg Thelen have noticed > > that we can hit this condition even when there is still oom victim > > pending. E.g. the following race is possible: > > > > memcg has two tasks taskA, taskB. > > > > CPU1 (taskA) CPU2 CPU3 (taskB) > > try_charge > > mem_cgroup_out_of_memory try_charge > > select_bad_process(taskB) > > oom_kill_process oom_reap_task > > # No real memory reaped > > mem_cgroup_out_of_memory > > # set taskB -> MMF_OOM_SKIP > > # retry charge > > mem_cgroup_out_of_memory > > oom_lock oom_lock > > select_bad_process(self) > > oom_kill_process(self) > > oom_unlock > > # no eligible task > > > > In fact syzbot test triggered this situation by placing multiple tasks > > into a memcg with hard limit set to 0. So no task really had any memory > > charged to the memcg > > > > : Memory cgroup stats for /ile0: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB > > : Tasks state (memory values in pages): > > : [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name > > : [ 6569] 0 6562 9427 1 53248 0 0 syz-executor0 > > : [ 6576] 0 6576 9426 0 61440 0 0 syz-executor6 > > : [ 6578] 0 6578 9426 534 61440 0 0 syz-executor4 > > : [ 6579] 0 6579 9426 0 57344 0 0 syz-executor5 > > : [ 6582] 0 6582 9426 0 61440 0 0 syz-executor7 > > : [ 6584] 0 6584 9426 0 57344 0 0 syz-executor1 > > > > so in principle there is indeed nothing reclaimable in this memcg and > > this looks like a misconfiguration. On the other hand we can clearly > > kill all those tasks so it is a bit early to warn and scare users. Do > > that by checking that the current is the oom victim and bypass the > > warning then. The victim is allowed to force charge and terminate to > > release its temporal charge along the way. > > > > [1] http://lkml.kernel.org/r/0000000000005e979605729c1564@google.com > > Fixes: "memcg, oom: move out_of_memory back to the charge path" > > Noticed-by: Greg Thelen <gthelen@google.com> > > Reported-and-tested-by: syzbot+bab151e82a4e973fa325@syzkaller.appspotmail.com > > Signed-off-by: Michal Hocko <mhocko@suse.com> > > --- > > mm/memcontrol.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > index 4603ad75c9a9..1b6eed1bc404 100644 > > --- a/mm/memcontrol.c > > +++ b/mm/memcontrol.c > > @@ -1703,7 +1703,8 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int > > return OOM_ASYNC; > > } > > > > - if (mem_cgroup_out_of_memory(memcg, mask, order)) > > + if (mem_cgroup_out_of_memory(memcg, mask, order) || > > + tsk_is_oom_victim(current)) > > return OOM_SUCCESS; > > > > WARN(1,"Memory cgroup charge failed because of no reclaimable memory! " > > This is really ugly. :( > > If that check is only there to suppress the warning when the limit is > 0, this should really be a separate branch around the warning, with a > fat comment that this is a ridiculous cornercase, and not look like it > is an essential part of the memcg reclaim/oom process. I do not mind having it in a separate branch. Btw. this is not just about hard limit set to 0. Similar can happen anytime we are getting out of oom victims. The likelihood goes up with the remote memcg charging merged recently. > Personally, I really don't get the point of this message. What is the > user to do with this information? What are we to do with it if people > report it? It conveys zero information on what the problem could be, > because it asserts a really vague high-level thing. Shouldn't such > debugging happen inside the OOM killer? What are the conceivable > scenarios in which this triggers other than obvious misconfigs? > > What would we lose by just deleting it? We know that _something_ bad is going on because we have no way to reclaim down to the hard limit. And that is the primary tool for the workload isolation. I am all for a better information to tell us more. I do not know what that would be right now, though. My primary motivation for this warning was to catch potential issues after we have moved oom handling back to the charge path. If we just remove it then we have no information at all. I wouldn't mind removing it if it generated more false possitives but that hasn't happened so far. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] memcg, oom: be careful about races when warning about no reclaimable task 2018-08-07 20:23 ` Michal Hocko @ 2018-08-07 20:54 ` Johannes Weiner 2018-08-08 6:44 ` Michal Hocko 0 siblings, 1 reply; 18+ messages in thread From: Johannes Weiner @ 2018-08-07 20:54 UTC (permalink / raw) To: Michal Hocko Cc: Andrew Morton, Vladimir Davydov, linux-mm, Greg Thelen, Tetsuo Handa, Dmitry Vyukov, LKML On Tue, Aug 07, 2018 at 10:23:32PM +0200, Michal Hocko wrote: > On Tue 07-08-18 16:02:47, Johannes Weiner wrote: > > On Tue, Aug 07, 2018 at 09:25:53AM +0200, Michal Hocko wrote: > > > From: Michal Hocko <mhocko@suse.com> > > > > > > "memcg, oom: move out_of_memory back to the charge path" has added a > > > warning triggered when the oom killer cannot find any eligible task > > > and so there is no way to reclaim the oom memcg under its hard limit. > > > Further charges for such a memcg are forced and therefore the hard limit > > > isolation is weakened. > > > > > > The current warning is however too eager to trigger even when we are not > > > really hitting the above condition. Syzbot[1] and Greg Thelen have noticed > > > that we can hit this condition even when there is still oom victim > > > pending. E.g. the following race is possible: > > > > > > memcg has two tasks taskA, taskB. > > > > > > CPU1 (taskA) CPU2 CPU3 (taskB) > > > try_charge > > > mem_cgroup_out_of_memory try_charge > > > select_bad_process(taskB) > > > oom_kill_process oom_reap_task > > > # No real memory reaped > > > mem_cgroup_out_of_memory > > > # set taskB -> MMF_OOM_SKIP > > > # retry charge > > > mem_cgroup_out_of_memory > > > oom_lock oom_lock > > > select_bad_process(self) > > > oom_kill_process(self) > > > oom_unlock > > > # no eligible task > > > > > > In fact syzbot test triggered this situation by placing multiple tasks > > > into a memcg with hard limit set to 0. So no task really had any memory > > > charged to the memcg > > > > > > : Memory cgroup stats for /ile0: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB > > > : Tasks state (memory values in pages): > > > : [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name > > > : [ 6569] 0 6562 9427 1 53248 0 0 syz-executor0 > > > : [ 6576] 0 6576 9426 0 61440 0 0 syz-executor6 > > > : [ 6578] 0 6578 9426 534 61440 0 0 syz-executor4 > > > : [ 6579] 0 6579 9426 0 57344 0 0 syz-executor5 > > > : [ 6582] 0 6582 9426 0 61440 0 0 syz-executor7 > > > : [ 6584] 0 6584 9426 0 57344 0 0 syz-executor1 > > > > > > so in principle there is indeed nothing reclaimable in this memcg and > > > this looks like a misconfiguration. On the other hand we can clearly > > > kill all those tasks so it is a bit early to warn and scare users. Do > > > that by checking that the current is the oom victim and bypass the > > > warning then. The victim is allowed to force charge and terminate to > > > release its temporal charge along the way. > > > > > > [1] http://lkml.kernel.org/r/0000000000005e979605729c1564@google.com > > > Fixes: "memcg, oom: move out_of_memory back to the charge path" > > > Noticed-by: Greg Thelen <gthelen@google.com> > > > Reported-and-tested-by: syzbot+bab151e82a4e973fa325@syzkaller.appspotmail.com > > > Signed-off-by: Michal Hocko <mhocko@suse.com> > > > --- > > > mm/memcontrol.c | 3 ++- > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > > index 4603ad75c9a9..1b6eed1bc404 100644 > > > --- a/mm/memcontrol.c > > > +++ b/mm/memcontrol.c > > > @@ -1703,7 +1703,8 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int > > > return OOM_ASYNC; > > > } > > > > > > - if (mem_cgroup_out_of_memory(memcg, mask, order)) > > > + if (mem_cgroup_out_of_memory(memcg, mask, order) || > > > + tsk_is_oom_victim(current)) > > > return OOM_SUCCESS; > > > > > > WARN(1,"Memory cgroup charge failed because of no reclaimable memory! " > > > > This is really ugly. :( > > > > If that check is only there to suppress the warning when the limit is > > 0, this should really be a separate branch around the warning, with a > > fat comment that this is a ridiculous cornercase, and not look like it > > is an essential part of the memcg reclaim/oom process. > > I do not mind having it in a separate branch. Btw. this is not just about > hard limit set to 0. Similar can happen anytime we are getting out of > oom victims. The likelihood goes up with the remote memcg charging > merged recently. What the global OOM killer does in that situation is dump the header anyway: /* Found nothing?!?! Either we hang forever, or we panic. */ if (!oc->chosen && !is_sysrq_oom(oc) && !is_memcg_oom(oc)) { dump_header(oc, NULL); panic("Out of memory and no killable processes...\n"); } I think that would make sense here as well - without the panic, obviously, but we can add our own pr_err() line following the header. That gives us the exact memory situation of the cgroup and who is trying to allocate and from what context, but in a format that is known to users without claiming right away that it's a kernel issue. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] memcg, oom: be careful about races when warning about no reclaimable task 2018-08-07 20:54 ` Johannes Weiner @ 2018-08-08 6:44 ` Michal Hocko 2018-08-08 7:12 ` Michal Hocko 0 siblings, 1 reply; 18+ messages in thread From: Michal Hocko @ 2018-08-08 6:44 UTC (permalink / raw) To: Johannes Weiner Cc: Andrew Morton, Vladimir Davydov, linux-mm, Greg Thelen, Tetsuo Handa, Dmitry Vyukov, LKML On Tue 07-08-18 16:54:25, Johannes Weiner wrote: > On Tue, Aug 07, 2018 at 10:23:32PM +0200, Michal Hocko wrote: > > On Tue 07-08-18 16:02:47, Johannes Weiner wrote: > > > On Tue, Aug 07, 2018 at 09:25:53AM +0200, Michal Hocko wrote: > > > > From: Michal Hocko <mhocko@suse.com> > > > > > > > > "memcg, oom: move out_of_memory back to the charge path" has added a > > > > warning triggered when the oom killer cannot find any eligible task > > > > and so there is no way to reclaim the oom memcg under its hard limit. > > > > Further charges for such a memcg are forced and therefore the hard limit > > > > isolation is weakened. > > > > > > > > The current warning is however too eager to trigger even when we are not > > > > really hitting the above condition. Syzbot[1] and Greg Thelen have noticed > > > > that we can hit this condition even when there is still oom victim > > > > pending. E.g. the following race is possible: > > > > > > > > memcg has two tasks taskA, taskB. > > > > > > > > CPU1 (taskA) CPU2 CPU3 (taskB) > > > > try_charge > > > > mem_cgroup_out_of_memory try_charge > > > > select_bad_process(taskB) > > > > oom_kill_process oom_reap_task > > > > # No real memory reaped > > > > mem_cgroup_out_of_memory > > > > # set taskB -> MMF_OOM_SKIP > > > > # retry charge > > > > mem_cgroup_out_of_memory > > > > oom_lock oom_lock > > > > select_bad_process(self) > > > > oom_kill_process(self) > > > > oom_unlock > > > > # no eligible task > > > > > > > > In fact syzbot test triggered this situation by placing multiple tasks > > > > into a memcg with hard limit set to 0. So no task really had any memory > > > > charged to the memcg > > > > > > > > : Memory cgroup stats for /ile0: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB > > > > : Tasks state (memory values in pages): > > > > : [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name > > > > : [ 6569] 0 6562 9427 1 53248 0 0 syz-executor0 > > > > : [ 6576] 0 6576 9426 0 61440 0 0 syz-executor6 > > > > : [ 6578] 0 6578 9426 534 61440 0 0 syz-executor4 > > > > : [ 6579] 0 6579 9426 0 57344 0 0 syz-executor5 > > > > : [ 6582] 0 6582 9426 0 61440 0 0 syz-executor7 > > > > : [ 6584] 0 6584 9426 0 57344 0 0 syz-executor1 > > > > > > > > so in principle there is indeed nothing reclaimable in this memcg and > > > > this looks like a misconfiguration. On the other hand we can clearly > > > > kill all those tasks so it is a bit early to warn and scare users. Do > > > > that by checking that the current is the oom victim and bypass the > > > > warning then. The victim is allowed to force charge and terminate to > > > > release its temporal charge along the way. > > > > > > > > [1] http://lkml.kernel.org/r/0000000000005e979605729c1564@google.com > > > > Fixes: "memcg, oom: move out_of_memory back to the charge path" > > > > Noticed-by: Greg Thelen <gthelen@google.com> > > > > Reported-and-tested-by: syzbot+bab151e82a4e973fa325@syzkaller.appspotmail.com > > > > Signed-off-by: Michal Hocko <mhocko@suse.com> > > > > --- > > > > mm/memcontrol.c | 3 ++- > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > > > index 4603ad75c9a9..1b6eed1bc404 100644 > > > > --- a/mm/memcontrol.c > > > > +++ b/mm/memcontrol.c > > > > @@ -1703,7 +1703,8 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int > > > > return OOM_ASYNC; > > > > } > > > > > > > > - if (mem_cgroup_out_of_memory(memcg, mask, order)) > > > > + if (mem_cgroup_out_of_memory(memcg, mask, order) || > > > > + tsk_is_oom_victim(current)) > > > > return OOM_SUCCESS; > > > > > > > > WARN(1,"Memory cgroup charge failed because of no reclaimable memory! " > > > > > > This is really ugly. :( > > > > > > If that check is only there to suppress the warning when the limit is > > > 0, this should really be a separate branch around the warning, with a > > > fat comment that this is a ridiculous cornercase, and not look like it > > > is an essential part of the memcg reclaim/oom process. > > > > I do not mind having it in a separate branch. Btw. this is not just about > > hard limit set to 0. Similar can happen anytime we are getting out of > > oom victims. The likelihood goes up with the remote memcg charging > > merged recently. > > What the global OOM killer does in that situation is dump the header > anyway: > > /* Found nothing?!?! Either we hang forever, or we panic. */ > if (!oc->chosen && !is_sysrq_oom(oc) && !is_memcg_oom(oc)) { > dump_header(oc, NULL); > panic("Out of memory and no killable processes...\n"); > } > > I think that would make sense here as well - without the panic, > obviously, but we can add our own pr_err() line following the header. > > That gives us the exact memory situation of the cgroup and who is > trying to allocate and from what context, but in a format that is > known to users without claiming right away that it's a kernel issue. I was considering doing that initially but then decided that warning is less noisy and still a good "let us know" trigger. It doesn't give us the whole picture which is obviously a downside but we would at least know that something is going south one have the trace to who that might be should this be a bug rather than a misconfiguration. But I do not mind doing dump_header as well. Care to send a patch? -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] memcg, oom: be careful about races when warning about no reclaimable task 2018-08-08 6:44 ` Michal Hocko @ 2018-08-08 7:12 ` Michal Hocko 2018-08-08 7:13 ` [PATCH 1/2] " Michal Hocko 2018-08-08 7:13 ` [PATCH 2/2] memcg, oom: emit oom report when there is no eligible task Michal Hocko 0 siblings, 2 replies; 18+ messages in thread From: Michal Hocko @ 2018-08-08 7:12 UTC (permalink / raw) To: Johannes Weiner Cc: Andrew Morton, Vladimir Davydov, Greg Thelen, Tetsuo Handa, Dmitry Vyukov, linux-mm, LKML On Wed 08-08-18 08:44:14, Michal Hocko wrote: > On Tue 07-08-18 16:54:25, Johannes Weiner wrote: [...] > > What the global OOM killer does in that situation is dump the header > > anyway: > > > > /* Found nothing?!?! Either we hang forever, or we panic. */ > > if (!oc->chosen && !is_sysrq_oom(oc) && !is_memcg_oom(oc)) { > > dump_header(oc, NULL); > > panic("Out of memory and no killable processes...\n"); > > } > > > > I think that would make sense here as well - without the panic, > > obviously, but we can add our own pr_err() line following the header. > > > > That gives us the exact memory situation of the cgroup and who is > > trying to allocate and from what context, but in a format that is > > known to users without claiming right away that it's a kernel issue. > > I was considering doing that initially but then decided that warning is > less noisy and still a good "let us know" trigger. It doesn't give us > the whole picture which is obviously a downside but we would at least > know that something is going south one have the trace to who that might > be should this be a bug rather than a misconfiguration. > > But I do not mind doing dump_header as well. Care to send a patch? OK, so I found few spare cycles and here is what I came up with. The first patch fixes the spurious warning and I have separated the check and added a comment as you asked. The second patch replaces warning with oom report. Does that look better? ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 1/2] memcg, oom: be careful about races when warning about no reclaimable task 2018-08-08 7:12 ` Michal Hocko @ 2018-08-08 7:13 ` Michal Hocko 2018-08-08 7:13 ` [PATCH 2/2] memcg, oom: emit oom report when there is no eligible task Michal Hocko 1 sibling, 0 replies; 18+ messages in thread From: Michal Hocko @ 2018-08-08 7:13 UTC (permalink / raw) To: Johannes Weiner Cc: Andrew Morton, Vladimir Davydov, Greg Thelen, Tetsuo Handa, Dmitry Vyukov, linux-mm, LKML, Michal Hocko From: Michal Hocko <mhocko@suse.com> "memcg, oom: move out_of_memory back to the charge path" has added a warning triggered when the oom killer cannot find any eligible task and so there is no way to reclaim the oom memcg under its hard limit. Further charges for such a memcg are forced and therefore the hard limit isolation is weakened. The current warning is however too eager to trigger even when we are not really hitting the above condition. Syzbot[1] and Greg Thelen have noticed that we can hit this condition even when there is still oom victim pending. E.g. the following race is possible: memcg has two tasks taskA, taskB. CPU1 (taskA) CPU2 CPU3 (taskB) try_charge mem_cgroup_out_of_memory try_charge select_bad_process(taskB) oom_kill_process oom_reap_task # No real memory reaped mem_cgroup_out_of_memory # set taskB -> MMF_OOM_SKIP # retry charge mem_cgroup_out_of_memory oom_lock oom_lock select_bad_process(self) oom_kill_process(self) oom_unlock # no eligible task In fact syzbot test triggered this situation by placing multiple tasks into a memcg with hard limit set to 0. So no task really had any memory charged to the memcg : Memory cgroup stats for /ile0: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB : Tasks state (memory values in pages): : [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name : [ 6569] 0 6562 9427 1 53248 0 0 syz-executor0 : [ 6576] 0 6576 9426 0 61440 0 0 syz-executor6 : [ 6578] 0 6578 9426 534 61440 0 0 syz-executor4 : [ 6579] 0 6579 9426 0 57344 0 0 syz-executor5 : [ 6582] 0 6582 9426 0 61440 0 0 syz-executor7 : [ 6584] 0 6584 9426 0 57344 0 0 syz-executor1 so in principle there is indeed nothing reclaimable in this memcg and this looks like a misconfiguration. On the other hand we can clearly kill all those tasks so it is a bit early to warn and scare users. Do that by checking that the current is the oom victim and bypass the warning then. The victim is allowed to force charge and terminate to release its temporal charge along the way. [1] http://lkml.kernel.org/r/0000000000005e979605729c1564@google.com Fixes: "memcg, oom: move out_of_memory back to the charge path" Noticed-by: Greg Thelen <gthelen@google.com> Reported-and-tested-by: syzbot+bab151e82a4e973fa325@syzkaller.appspotmail.com Signed-off-by: Michal Hocko <mhocko@suse.com> --- mm/memcontrol.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 4603ad75c9a9..c80e5b6a8e9f 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1705,6 +1705,15 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int if (mem_cgroup_out_of_memory(memcg, mask, order)) return OOM_SUCCESS; + + /* + * under rare race the current task might have been selected while + * reaching mem_cgroup_out_of_memory and there is no other oom victim + * left. There is still no reason to warn because this task will + * die and release its bypassed charge eventually. + */ + if (tsk_is_oom_victim(current)) + return OOM_SUCCESS; WARN(1,"Memory cgroup charge failed because of no reclaimable memory! " "This looks like a misconfiguration or a kernel bug."); -- 2.18.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 2/2] memcg, oom: emit oom report when there is no eligible task 2018-08-08 7:12 ` Michal Hocko 2018-08-08 7:13 ` [PATCH 1/2] " Michal Hocko @ 2018-08-08 7:13 ` Michal Hocko 2018-08-08 14:45 ` Johannes Weiner 1 sibling, 1 reply; 18+ messages in thread From: Michal Hocko @ 2018-08-08 7:13 UTC (permalink / raw) To: Johannes Weiner Cc: Andrew Morton, Vladimir Davydov, Greg Thelen, Tetsuo Handa, Dmitry Vyukov, linux-mm, LKML, Michal Hocko From: Michal Hocko <mhocko@suse.com> Johannes had doubts that the current WARN in the memcg oom path when there is no eligible task is not all that useful because it doesn't really give any useful insight into the memcg state. My original intention was to make this lightweight but it is true that seeing a stack trace will likely be not sufficient when somebody gets back to us and report this warning. Therefore replace the current warning by the full oom report which will give us not only the back trace of the offending path but also the full memcg state - memory counters and existing tasks. Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Michal Hocko <mhocko@suse.com> --- include/linux/oom.h | 2 ++ mm/memcontrol.c | 24 +++++++++++++----------- mm/oom_kill.c | 8 ++++---- 3 files changed, 19 insertions(+), 15 deletions(-) diff --git a/include/linux/oom.h b/include/linux/oom.h index a16a155a0d19..7424f9673cd1 100644 --- a/include/linux/oom.h +++ b/include/linux/oom.h @@ -133,6 +133,8 @@ extern struct task_struct *find_lock_task_mm(struct task_struct *p); extern int oom_evaluate_task(struct task_struct *task, void *arg); +extern void dump_oom_header(struct oom_control *oc, struct task_struct *victim); + /* sysctls */ extern int sysctl_oom_dump_tasks; extern int sysctl_oom_kill_allocating_task; diff --git a/mm/memcontrol.c b/mm/memcontrol.c index c80e5b6a8e9f..3d7c90e6c235 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1390,6 +1390,19 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask, mutex_lock(&oom_lock); ret = out_of_memory(&oc); mutex_unlock(&oom_lock); + + /* + * under rare race the current task might have been selected while + * reaching mem_cgroup_out_of_memory and there is no other oom victim + * left. There is still no reason to warn because this task will + * die and release its bypassed charge eventually. + */ + if (tsk_is_oom_victim(current)) + return ret; + + pr_warn("Memory cgroup charge failed because of no reclaimable memory! " + "This looks like a misconfiguration or a kernel bug."); + dump_oom_header(&oc, NULL); return ret; } @@ -1706,17 +1719,6 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int if (mem_cgroup_out_of_memory(memcg, mask, order)) return OOM_SUCCESS; - /* - * under rare race the current task might have been selected while - * reaching mem_cgroup_out_of_memory and there is no other oom victim - * left. There is still no reason to warn because this task will - * die and release its bypassed charge eventually. - */ - if (tsk_is_oom_victim(current)) - return OOM_SUCCESS; - - WARN(1,"Memory cgroup charge failed because of no reclaimable memory! " - "This looks like a misconfiguration or a kernel bug."); return OOM_FAILED; } diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 104ef4a01a55..8918640fcb85 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -428,7 +428,7 @@ static void dump_tasks(struct mem_cgroup *memcg, const nodemask_t *nodemask) rcu_read_unlock(); } -static void dump_header(struct oom_control *oc, struct task_struct *p) +void dump_oom_header(struct oom_control *oc, struct task_struct *p) { pr_warn("%s invoked oom-killer: gfp_mask=%#x(%pGg), order=%d, oom_score_adj=%hd\n", current->comm, oc->gfp_mask, &oc->gfp_mask, oc->order, @@ -945,7 +945,7 @@ static void oom_kill_process(struct oom_control *oc, const char *message) task_unlock(p); if (__ratelimit(&oom_rs)) - dump_header(oc, p); + dump_oom_header(oc, p); pr_err("%s: Kill process %d (%s) score %u or sacrifice child\n", message, task_pid_nr(p), p->comm, points); @@ -1039,7 +1039,7 @@ static void check_panic_on_oom(struct oom_control *oc) /* Do not panic for oom kills triggered by sysrq */ if (is_sysrq_oom(oc)) return; - dump_header(oc, NULL); + dump_oom_header(oc, NULL); panic("Out of memory: %s panic_on_oom is enabled\n", sysctl_panic_on_oom == 2 ? "compulsory" : "system-wide"); } @@ -1129,7 +1129,7 @@ bool out_of_memory(struct oom_control *oc) select_bad_process(oc); /* Found nothing?!?! Either we hang forever, or we panic. */ if (!oc->chosen_task && !is_sysrq_oom(oc) && !is_memcg_oom(oc)) { - dump_header(oc, NULL); + dump_oom_header(oc, NULL); panic("Out of memory and no killable processes...\n"); } if (oc->chosen_task && oc->chosen_task != INFLIGHT_VICTIM) -- 2.18.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] memcg, oom: emit oom report when there is no eligible task 2018-08-08 7:13 ` [PATCH 2/2] memcg, oom: emit oom report when there is no eligible task Michal Hocko @ 2018-08-08 14:45 ` Johannes Weiner 2018-08-08 16:17 ` Michal Hocko 0 siblings, 1 reply; 18+ messages in thread From: Johannes Weiner @ 2018-08-08 14:45 UTC (permalink / raw) To: Michal Hocko Cc: Andrew Morton, Vladimir Davydov, Greg Thelen, Tetsuo Handa, Dmitry Vyukov, linux-mm, LKML, Michal Hocko On Wed, Aug 08, 2018 at 09:13:01AM +0200, Michal Hocko wrote: > From: Michal Hocko <mhocko@suse.com> > > Johannes had doubts that the current WARN in the memcg oom path > when there is no eligible task is not all that useful because it doesn't > really give any useful insight into the memcg state. My original > intention was to make this lightweight but it is true that seeing > a stack trace will likely be not sufficient when somebody gets back to > us and report this warning. > > Therefore replace the current warning by the full oom report which will > give us not only the back trace of the offending path but also the full > memcg state - memory counters and existing tasks. > > Suggested-by: Johannes Weiner <hannes@cmpxchg.org> > Signed-off-by: Michal Hocko <mhocko@suse.com> > --- > include/linux/oom.h | 2 ++ > mm/memcontrol.c | 24 +++++++++++++----------- > mm/oom_kill.c | 8 ++++---- > 3 files changed, 19 insertions(+), 15 deletions(-) > > diff --git a/include/linux/oom.h b/include/linux/oom.h > index a16a155a0d19..7424f9673cd1 100644 > --- a/include/linux/oom.h > +++ b/include/linux/oom.h > @@ -133,6 +133,8 @@ extern struct task_struct *find_lock_task_mm(struct task_struct *p); > > extern int oom_evaluate_task(struct task_struct *task, void *arg); > > +extern void dump_oom_header(struct oom_control *oc, struct task_struct *victim); > + > /* sysctls */ > extern int sysctl_oom_dump_tasks; > extern int sysctl_oom_kill_allocating_task; > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index c80e5b6a8e9f..3d7c90e6c235 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -1390,6 +1390,19 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask, > mutex_lock(&oom_lock); > ret = out_of_memory(&oc); > mutex_unlock(&oom_lock); > + > + /* > + * under rare race the current task might have been selected while > + * reaching mem_cgroup_out_of_memory and there is no other oom victim > + * left. There is still no reason to warn because this task will > + * die and release its bypassed charge eventually. "rare race" is a bit vague. Can we describe the situation? /* * We killed and reaped every task in the group, and still no * luck with the charge. This is likely the result of a crazy * configuration, let the user know. * * With one exception: current is the last task, it's already * been killed and reaped, but that wasn't enough to satisfy * the charge request under the configured limit. In that case * let it bypass quietly and current exit. */ And after spelling that out, I no longer think we want to skip the OOM header in that situation. The first paragraph still applies: this is probably a funny configuration, we're going to bypass the charge, let the user know that we failed containment - to help THEM identify by themselves what is likely an easy to fix problem. > + */ > + if (tsk_is_oom_victim(current)) > + return ret; > + > + pr_warn("Memory cgroup charge failed because of no reclaimable memory! " > + "This looks like a misconfiguration or a kernel bug."); > + dump_oom_header(&oc, NULL); All other sites print the context first before printing the conclusion, we should probably do the same here. I'd also prefer keeping the message in line with the global case when no eligible tasks are left. There is no need to speculate whose fault this could be, that's apparent from the OOM header. If the user can't figure it out from the OOM header, they'll still report it to us. How about this? --- ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] memcg, oom: emit oom report when there is no eligible task 2018-08-08 14:45 ` Johannes Weiner @ 2018-08-08 16:17 ` Michal Hocko 2018-08-21 14:06 ` Michal Hocko 0 siblings, 1 reply; 18+ messages in thread From: Michal Hocko @ 2018-08-08 16:17 UTC (permalink / raw) To: Johannes Weiner Cc: Andrew Morton, Vladimir Davydov, Greg Thelen, Tetsuo Handa, Dmitry Vyukov, linux-mm, LKML On Wed 08-08-18 10:45:15, Johannes Weiner wrote: > On Wed, Aug 08, 2018 at 09:13:01AM +0200, Michal Hocko wrote: > > From: Michal Hocko <mhocko@suse.com> > > > > Johannes had doubts that the current WARN in the memcg oom path > > when there is no eligible task is not all that useful because it doesn't > > really give any useful insight into the memcg state. My original > > intention was to make this lightweight but it is true that seeing > > a stack trace will likely be not sufficient when somebody gets back to > > us and report this warning. > > > > Therefore replace the current warning by the full oom report which will > > give us not only the back trace of the offending path but also the full > > memcg state - memory counters and existing tasks. > > > > Suggested-by: Johannes Weiner <hannes@cmpxchg.org> > > Signed-off-by: Michal Hocko <mhocko@suse.com> > > --- > > include/linux/oom.h | 2 ++ > > mm/memcontrol.c | 24 +++++++++++++----------- > > mm/oom_kill.c | 8 ++++---- > > 3 files changed, 19 insertions(+), 15 deletions(-) > > > > diff --git a/include/linux/oom.h b/include/linux/oom.h > > index a16a155a0d19..7424f9673cd1 100644 > > --- a/include/linux/oom.h > > +++ b/include/linux/oom.h > > @@ -133,6 +133,8 @@ extern struct task_struct *find_lock_task_mm(struct task_struct *p); > > > > extern int oom_evaluate_task(struct task_struct *task, void *arg); > > > > +extern void dump_oom_header(struct oom_control *oc, struct task_struct *victim); > > + > > /* sysctls */ > > extern int sysctl_oom_dump_tasks; > > extern int sysctl_oom_kill_allocating_task; > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > index c80e5b6a8e9f..3d7c90e6c235 100644 > > --- a/mm/memcontrol.c > > +++ b/mm/memcontrol.c > > @@ -1390,6 +1390,19 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask, > > mutex_lock(&oom_lock); > > ret = out_of_memory(&oc); > > mutex_unlock(&oom_lock); > > + > > + /* > > + * under rare race the current task might have been selected while > > + * reaching mem_cgroup_out_of_memory and there is no other oom victim > > + * left. There is still no reason to warn because this task will > > + * die and release its bypassed charge eventually. > > "rare race" is a bit vague. Can we describe the situation? > > /* > * We killed and reaped every task in the group, and still no > * luck with the charge. This is likely the result of a crazy > * configuration, let the user know. > * > * With one exception: current is the last task, it's already > * been killed and reaped, but that wasn't enough to satisfy > * the charge request under the configured limit. In that case > * let it bypass quietly and current exit. > */ Sounds good. > And after spelling that out, I no longer think we want to skip the OOM > header in that situation. The first paragraph still applies: this is > probably a funny configuration, we're going to bypass the charge, let > the user know that we failed containment - to help THEM identify by > themselves what is likely an easy to fix problem. > > > + */ > > + if (tsk_is_oom_victim(current)) > > + return ret; > > + > > + pr_warn("Memory cgroup charge failed because of no reclaimable memory! " > > + "This looks like a misconfiguration or a kernel bug."); > > + dump_oom_header(&oc, NULL); > > All other sites print the context first before printing the > conclusion, we should probably do the same here. > > I'd also prefer keeping the message in line with the global case when > no eligible tasks are left. There is no need to speculate whose fault > this could be, that's apparent from the OOM header. If the user can't > figure it out from the OOM header, they'll still report it to us. > > How about this? > > --- > > >From bba01122f739b05a689dbf1eeeb4f0e07affd4e7 Mon Sep 17 00:00:00 2001 > From: Johannes Weiner <hannes@cmpxchg.org> > Date: Wed, 8 Aug 2018 09:59:40 -0400 > Subject: [PATCH] mm: memcontrol: print proper OOM header when no eligible > victim left > > When the memcg OOM killer runs out of killable tasks, it currently > prints a WARN with no further OOM context. This has caused some user > confusion. > > Warnings indicate a kernel problem. In a reported case, however, the > situation was triggered by a non-sensical memcg configuration (hard > limit set to 0). But without any VM context this wasn't obvious from > the report, and it took some back and forth on the mailing list to > identify what is actually a trivial issue. > > Handle this OOM condition like we handle it in the global OOM killer: > dump the full OOM context and tell the user we ran out of tasks. > > This way the user can identify misconfigurations easily by themselves > and rectify the problem - without having to go through the hassle of > running into an obscure but unsettling warning, finding the > appropriate kernel mailing list and waiting for a kernel developer to > remote-analyze that the memcg configuration caused this. > > If users cannot make sense of why the OOM killer was triggered or why > it failed, they will still report it to the mailing list, we know that > from experience. So in case there is an actual kernel bug causing > this, kernel developers will very likely hear about it. > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Yes this works as well. We would get a dump even for the race we have seen but I do not think this is something to lose sleep over. And if it triggers too often to be disturbing we can add tsk_is_oom_victim(current) check there. Acked-by: Michal Hocko <mhocko@suse.com> > --- > mm/memcontrol.c | 2 -- > mm/oom_kill.c | 13 ++++++++++--- > 2 files changed, 10 insertions(+), 5 deletions(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 4e3c1315b1de..29d9d1a69b36 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -1701,8 +1701,6 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int > if (mem_cgroup_out_of_memory(memcg, mask, order)) > return OOM_SUCCESS; > > - WARN(1,"Memory cgroup charge failed because of no reclaimable memory! " > - "This looks like a misconfiguration or a kernel bug."); > return OOM_FAILED; > } > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > index 0e10b864e074..07ae222d7830 100644 > --- a/mm/oom_kill.c > +++ b/mm/oom_kill.c > @@ -1103,10 +1103,17 @@ bool out_of_memory(struct oom_control *oc) > } > > select_bad_process(oc); > - /* Found nothing?!?! Either we hang forever, or we panic. */ > - if (!oc->chosen && !is_sysrq_oom(oc) && !is_memcg_oom(oc)) { > + /* Found nothing?!?! */ > + if (!oc->chosen) { > dump_header(oc, NULL); > - panic("Out of memory and no killable processes...\n"); > + pr_warn("Out of memory and no killable processes...\n"); > + /* > + * If we got here due to an actual allocation at the > + * system level, we cannot survive this and will enter > + * an endless loop in the allocator. Bail out now. > + */ > + if (!is_sysrq_oom(oc) && !is_memcg_oom(oc)) > + panic("System is deadlocked on memory\n"); > } > if (oc->chosen && oc->chosen != (void *)-1UL) > oom_kill_process(oc, !is_memcg_oom(oc) ? "Out of memory" : > -- > 2.18.0 > -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] memcg, oom: emit oom report when there is no eligible task 2018-08-08 16:17 ` Michal Hocko @ 2018-08-21 14:06 ` Michal Hocko 2018-08-21 17:20 ` Johannes Weiner 0 siblings, 1 reply; 18+ messages in thread From: Michal Hocko @ 2018-08-21 14:06 UTC (permalink / raw) To: Johannes Weiner Cc: Andrew Morton, Vladimir Davydov, Greg Thelen, Tetsuo Handa, Dmitry Vyukov, linux-mm, LKML Do you plan to repost these two? They are quite deep in the email thread so they can easily fall through cracks. On Wed 08-08-18 18:17:37, Michal Hocko wrote: > On Wed 08-08-18 10:45:15, Johannes Weiner wrote: [...] > > >From bba01122f739b05a689dbf1eeeb4f0e07affd4e7 Mon Sep 17 00:00:00 2001 > > From: Johannes Weiner <hannes@cmpxchg.org> > > Date: Wed, 8 Aug 2018 09:59:40 -0400 > > Subject: [PATCH] mm: memcontrol: print proper OOM header when no eligible > > victim left > > > > When the memcg OOM killer runs out of killable tasks, it currently > > prints a WARN with no further OOM context. This has caused some user > > confusion. > > > > Warnings indicate a kernel problem. In a reported case, however, the > > situation was triggered by a non-sensical memcg configuration (hard > > limit set to 0). But without any VM context this wasn't obvious from > > the report, and it took some back and forth on the mailing list to > > identify what is actually a trivial issue. > > > > Handle this OOM condition like we handle it in the global OOM killer: > > dump the full OOM context and tell the user we ran out of tasks. > > > > This way the user can identify misconfigurations easily by themselves > > and rectify the problem - without having to go through the hassle of > > running into an obscure but unsettling warning, finding the > > appropriate kernel mailing list and waiting for a kernel developer to > > remote-analyze that the memcg configuration caused this. > > > > If users cannot make sense of why the OOM killer was triggered or why > > it failed, they will still report it to the mailing list, we know that > > from experience. So in case there is an actual kernel bug causing > > this, kernel developers will very likely hear about it. > > > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> > > Yes this works as well. We would get a dump even for the race we have > seen but I do not think this is something to lose sleep over. And if it > triggers too often to be disturbing we can add > tsk_is_oom_victim(current) check there. > > Acked-by: Michal Hocko <mhocko@suse.com> > > > --- > > mm/memcontrol.c | 2 -- > > mm/oom_kill.c | 13 ++++++++++--- > > 2 files changed, 10 insertions(+), 5 deletions(-) > > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > index 4e3c1315b1de..29d9d1a69b36 100644 > > --- a/mm/memcontrol.c > > +++ b/mm/memcontrol.c > > @@ -1701,8 +1701,6 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int > > if (mem_cgroup_out_of_memory(memcg, mask, order)) > > return OOM_SUCCESS; > > > > - WARN(1,"Memory cgroup charge failed because of no reclaimable memory! " > > - "This looks like a misconfiguration or a kernel bug."); > > return OOM_FAILED; > > } > > > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > > index 0e10b864e074..07ae222d7830 100644 > > --- a/mm/oom_kill.c > > +++ b/mm/oom_kill.c > > @@ -1103,10 +1103,17 @@ bool out_of_memory(struct oom_control *oc) > > } > > > > select_bad_process(oc); > > - /* Found nothing?!?! Either we hang forever, or we panic. */ > > - if (!oc->chosen && !is_sysrq_oom(oc) && !is_memcg_oom(oc)) { > > + /* Found nothing?!?! */ > > + if (!oc->chosen) { > > dump_header(oc, NULL); > > - panic("Out of memory and no killable processes...\n"); > > + pr_warn("Out of memory and no killable processes...\n"); > > + /* > > + * If we got here due to an actual allocation at the > > + * system level, we cannot survive this and will enter > > + * an endless loop in the allocator. Bail out now. > > + */ > > + if (!is_sysrq_oom(oc) && !is_memcg_oom(oc)) > > + panic("System is deadlocked on memory\n"); > > } > > if (oc->chosen && oc->chosen != (void *)-1UL) > > oom_kill_process(oc, !is_memcg_oom(oc) ? "Out of memory" : > > -- > > 2.18.0 > > > > -- > Michal Hocko > SUSE Labs -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] memcg, oom: emit oom report when there is no eligible task 2018-08-21 14:06 ` Michal Hocko @ 2018-08-21 17:20 ` Johannes Weiner 0 siblings, 0 replies; 18+ messages in thread From: Johannes Weiner @ 2018-08-21 17:20 UTC (permalink / raw) To: Michal Hocko Cc: Andrew Morton, Vladimir Davydov, Greg Thelen, Tetsuo Handa, Dmitry Vyukov, linux-mm, LKML I sent them in a separate thread. Thanks. On Tue, Aug 21, 2018 at 04:06:12PM +0200, Michal Hocko wrote: > Do you plan to repost these two? They are quite deep in the email thread > so they can easily fall through cracks. > > On Wed 08-08-18 18:17:37, Michal Hocko wrote: > > On Wed 08-08-18 10:45:15, Johannes Weiner wrote: > [...] > > > >From bba01122f739b05a689dbf1eeeb4f0e07affd4e7 Mon Sep 17 00:00:00 2001 > > > From: Johannes Weiner <hannes@cmpxchg.org> > > > Date: Wed, 8 Aug 2018 09:59:40 -0400 > > > Subject: [PATCH] mm: memcontrol: print proper OOM header when no eligible > > > victim left > > > > > > When the memcg OOM killer runs out of killable tasks, it currently > > > prints a WARN with no further OOM context. This has caused some user > > > confusion. > > > > > > Warnings indicate a kernel problem. In a reported case, however, the > > > situation was triggered by a non-sensical memcg configuration (hard > > > limit set to 0). But without any VM context this wasn't obvious from > > > the report, and it took some back and forth on the mailing list to > > > identify what is actually a trivial issue. > > > > > > Handle this OOM condition like we handle it in the global OOM killer: > > > dump the full OOM context and tell the user we ran out of tasks. > > > > > > This way the user can identify misconfigurations easily by themselves > > > and rectify the problem - without having to go through the hassle of > > > running into an obscure but unsettling warning, finding the > > > appropriate kernel mailing list and waiting for a kernel developer to > > > remote-analyze that the memcg configuration caused this. > > > > > > If users cannot make sense of why the OOM killer was triggered or why > > > it failed, they will still report it to the mailing list, we know that > > > from experience. So in case there is an actual kernel bug causing > > > this, kernel developers will very likely hear about it. > > > > > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> > > > > Yes this works as well. We would get a dump even for the race we have > > seen but I do not think this is something to lose sleep over. And if it > > triggers too often to be disturbing we can add > > tsk_is_oom_victim(current) check there. > > > > Acked-by: Michal Hocko <mhocko@suse.com> > > > > > --- > > > mm/memcontrol.c | 2 -- > > > mm/oom_kill.c | 13 ++++++++++--- > > > 2 files changed, 10 insertions(+), 5 deletions(-) > > > > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > > index 4e3c1315b1de..29d9d1a69b36 100644 > > > --- a/mm/memcontrol.c > > > +++ b/mm/memcontrol.c > > > @@ -1701,8 +1701,6 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int > > > if (mem_cgroup_out_of_memory(memcg, mask, order)) > > > return OOM_SUCCESS; > > > > > > - WARN(1,"Memory cgroup charge failed because of no reclaimable memory! " > > > - "This looks like a misconfiguration or a kernel bug."); > > > return OOM_FAILED; > > > } > > > > > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > > > index 0e10b864e074..07ae222d7830 100644 > > > --- a/mm/oom_kill.c > > > +++ b/mm/oom_kill.c > > > @@ -1103,10 +1103,17 @@ bool out_of_memory(struct oom_control *oc) > > > } > > > > > > select_bad_process(oc); > > > - /* Found nothing?!?! Either we hang forever, or we panic. */ > > > - if (!oc->chosen && !is_sysrq_oom(oc) && !is_memcg_oom(oc)) { > > > + /* Found nothing?!?! */ > > > + if (!oc->chosen) { > > > dump_header(oc, NULL); > > > - panic("Out of memory and no killable processes...\n"); > > > + pr_warn("Out of memory and no killable processes...\n"); > > > + /* > > > + * If we got here due to an actual allocation at the > > > + * system level, we cannot survive this and will enter > > > + * an endless loop in the allocator. Bail out now. > > > + */ > > > + if (!is_sysrq_oom(oc) && !is_memcg_oom(oc)) > > > + panic("System is deadlocked on memory\n"); > > > } > > > if (oc->chosen && oc->chosen != (void *)-1UL) > > > oom_kill_process(oc, !is_memcg_oom(oc) ? "Out of memory" : > > > -- > > > 2.18.0 > > > > > > > -- > > Michal Hocko > > SUSE Labs > > -- > Michal Hocko > SUSE Labs ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2018-08-21 17:20 UTC | newest] Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-08-07 7:25 [PATCH] memcg, oom: be careful about races when warning about no reclaimable task Michal Hocko 2018-08-07 10:15 ` Tetsuo Handa 2018-08-07 11:04 ` Michal Hocko 2018-08-07 20:19 ` Johannes Weiner 2018-08-07 20:38 ` Tetsuo Handa 2018-08-08 12:57 ` Tetsuo Handa 2018-08-08 13:16 ` Michal Hocko 2018-08-07 20:02 ` Johannes Weiner 2018-08-07 20:23 ` Michal Hocko 2018-08-07 20:54 ` Johannes Weiner 2018-08-08 6:44 ` Michal Hocko 2018-08-08 7:12 ` Michal Hocko 2018-08-08 7:13 ` [PATCH 1/2] " Michal Hocko 2018-08-08 7:13 ` [PATCH 2/2] memcg, oom: emit oom report when there is no eligible task Michal Hocko 2018-08-08 14:45 ` Johannes Weiner 2018-08-08 16:17 ` Michal Hocko 2018-08-21 14:06 ` Michal Hocko 2018-08-21 17:20 ` Johannes Weiner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).