From: Roman Gushchin <guro@fb.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
<hannes@cmpxchg.org>, <vdavydov.dev@gmail.com>,
<kernel-team@fb.com>, <linux-mm@kvack.org>,
<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm,oom: fix oom invocation issues
Date: Thu, 18 May 2017 14:20:33 +0100
Message-ID: <20170518132033.GA12219@castle>
In-Reply-To: <20170518090039.GC25462@dhcp22.suse.cz>
On Thu, May 18, 2017 at 11:00:39AM +0200, Michal Hocko wrote:
> On Thu 18-05-17 10:47:29, Michal Hocko wrote:
> >
> > Hmm, I guess you are right. I haven't realized that pagefault_out_of_memory
> > can race and pick up another victim. For some reason I thought that the
> > page fault would break out on fatal signal pending but we don't do that (we
> > used to in the past). Now that I think about that more we should
> > probably remove out_of_memory out of pagefault_out_of_memory completely.
> > It is racy and it basically doesn't have any allocation context so we
> > might kill a task from a different domain. So can we do this instead?
> > There is a slight risk that somebody might have returned VM_FAULT_OOM
> > without doing an allocation but from my quick look nobody does that
> > currently.
>
> If this is considered too risky then we can do what Roman was proposing
> and check tsk_is_oom_victim in pagefault_out_of_memory and bail out.
Hi, Michal!
If we consider this approach, I've prepared a separate patch for this problem
(stripped all oom reaper list stuff).
Thanks!
From 317fad44a0fe79fb76e8e4fd6bd81c52ae1712e9 Mon Sep 17 00:00:00 2001
From: Roman Gushchin <guro@fb.com>
Date: Tue, 16 May 2017 21:19:56 +0100
Subject: [PATCH] mm,oom: prevent OOM double kill from a pagefault handling
path
During the debugging of some OOM-related stuff, I've noticed
that sometimes OOM kills two processes instead of one.
The problem can be easily reproduced on a vanilla kernel:
[ 25.721494] allocate invoked oom-killer: gfp_mask=0x14280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null), order=0, oom_score_adj=0
[ 25.725658] allocate cpuset=/ mems_allowed=0
[ 25.727033] CPU: 1 PID: 492 Comm: allocate Not tainted 4.12.0-rc1-mm1+ #181
[ 25.729215] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 25.729598] Call Trace:
[ 25.729598] dump_stack+0x63/0x82
[ 25.729598] dump_header+0x97/0x21a
[ 25.729598] ? do_try_to_free_pages+0x2d7/0x360
[ 25.729598] ? security_capable_noaudit+0x45/0x60
[ 25.729598] oom_kill_process+0x219/0x3e0
[ 25.729598] out_of_memory+0x11d/0x480
[ 25.729598] __alloc_pages_slowpath+0xc84/0xd40
[ 25.729598] __alloc_pages_nodemask+0x245/0x260
[ 25.729598] alloc_pages_vma+0xa2/0x270
[ 25.729598] __handle_mm_fault+0xca9/0x10c0
[ 25.729598] handle_mm_fault+0xf3/0x210
[ 25.729598] __do_page_fault+0x240/0x4e0
[ 25.729598] trace_do_page_fault+0x37/0xe0
[ 25.729598] do_async_page_fault+0x19/0x70
[ 25.729598] async_page_fault+0x28/0x30
< cut >
[ 25.810868] oom_reaper: reaped process 492 (allocate), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
< cut >
[ 25.817589] allocate invoked oom-killer: gfp_mask=0x0(), nodemask=(null), order=0, oom_score_adj=0
[ 25.818821] allocate cpuset=/ mems_allowed=0
[ 25.819259] CPU: 1 PID: 492 Comm: allocate Not tainted 4.12.0-rc1-mm1+ #181
[ 25.819847] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 25.820549] Call Trace:
[ 25.820733] dump_stack+0x63/0x82
[ 25.820961] dump_header+0x97/0x21a
[ 25.820961] ? security_capable_noaudit+0x45/0x60
[ 25.820961] oom_kill_process+0x219/0x3e0
[ 25.820961] out_of_memory+0x11d/0x480
[ 25.820961] pagefault_out_of_memory+0x68/0x80
[ 25.820961] mm_fault_error+0x8f/0x190
[ 25.820961] ? handle_mm_fault+0xf3/0x210
[ 25.820961] __do_page_fault+0x4b2/0x4e0
[ 25.820961] trace_do_page_fault+0x37/0xe0
[ 25.820961] do_async_page_fault+0x19/0x70
[ 25.820961] async_page_fault+0x28/0x30
< cut >
[ 25.863078] Out of memory: Kill process 233 (firewalld) score 10 or sacrifice child
[ 25.863634] Killed process 233 (firewalld) total-vm:246076kB, anon-rss:20956kB, file-rss:0kB, shmem-rss:0kB
This actually happens if pagefault_out_of_memory() is called
after the calling process has already been selected as an OOM victim
and killed. There is a race with the oom reaper: if the process
is reaped before it enters out_of_memory(), the MMF_OOM_SKIP
flag is set, and out_of_memory() will not consider the process
as an eligible victim. That means that another victim will be selected
and killed.
Tetsuo Handa has noticed that this is a side effect of
commit 9a67f6488eca926f ("mm: consolidate GFP_NOFAIL checks
in the allocator slowpath").
To avoid this, out_of_memory() shouldn't be called from
pagefault_out_of_memory() if the current task has already
been chosen as an OOM victim.
v2: dropped changes related to the oom_reaper synchronization,
as it looks like a separate and minor issue;
rebased on new mm;
renamed, updated commit message.
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: kernel-team@fb.com
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
mm/oom_kill.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 04c9143..9c643a3 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -1068,6 +1068,9 @@ void pagefault_out_of_memory(void)
 	if (mem_cgroup_oom_synchronize(true))
 		return;
 
+	if (tsk_is_oom_victim(current))
+		return;
+
 	if (!mutex_trylock(&oom_lock))
 		return;
 	out_of_memory(&oc);
--
2.7.4
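
The ordering described in the commit message can be replayed in a small
userspace model. This is only a sketch, not kernel code: struct task, the
model_*() helpers, replay_double_kill() and the score of 900 for "allocate"
are all hypothetical names and values made up for illustration. Only the
order of events (kill, reap, then the page fault path calling into the OOM
killer again) and the firewalld score of 10 are taken from the log above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these types or helpers are the kernel's. */
struct task {
	const char *comm;
	long badness;      /* stand-in for the OOM score */
	bool oom_victim;   /* models what tsk_is_oom_victim() reports */
	bool mm_oom_skip;  /* models MMF_OOM_SKIP being set by the reaper */
	bool killed;
};

/* Pick the highest-scoring task that has not been marked as skipped. */
static struct task *select_victim(struct task *tasks, size_t n)
{
	struct task *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (tasks[i].mm_oom_skip)
			continue;
		if (!best || tasks[i].badness > best->badness)
			best = &tasks[i];
	}
	return best;
}

/* Models out_of_memory(): select a victim and kill it. */
static void model_out_of_memory(struct task *tasks, size_t n)
{
	struct task *victim = select_victim(tasks, n);

	if (victim) {
		victim->oom_victim = true;
		victim->killed = true;
	}
}

/* Models the oom reaper finishing with an already chosen victim. */
static void model_oom_reaper(struct task *t)
{
	assert(t->oom_victim);
	t->mm_oom_skip = true;
}

/* Models pagefault_out_of_memory(), with or without the proposed check. */
static void model_pagefault_out_of_memory(struct task *current,
					  struct task *tasks, size_t n,
					  bool with_fix)
{
	if (with_fix && current->oom_victim)
		return;
	model_out_of_memory(tasks, n);
}

/* Replays the sequence from the log; returns how many tasks were killed. */
int replay_double_kill(bool with_fix)
{
	struct task tasks[] = {
		{ "allocate", 900, false, false, false }, /* score is made up */
		{ "firewalld", 10, false, false, false },
	};
	size_t n = sizeof(tasks) / sizeof(tasks[0]);
	int killed = 0;

	model_out_of_memory(tasks, n);  /* allocation slowpath kills "allocate" */
	model_oom_reaper(&tasks[0]);    /* reaper wins the race */
	model_pagefault_out_of_memory(&tasks[0], tasks, n, with_fix);

	for (size_t i = 0; i < n; i++)
		killed += tasks[i].killed;
	return killed;
}
```

In this model, replay_double_kill(false) kills both tasks, while
replay_double_kill(true) kills only "allocate", which is the behaviour the
tsk_is_oom_victim() check is meant to restore.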