From: Michal Hocko <mhocko@kernel.org>
To: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch] mm, oom: fix unnecessary killing of additional processes
Date: Fri, 15 Jun 2018 08:55:41 +0200
Message-ID: <20180615065541.GA24039@dhcp22.suse.cz>
In-Reply-To: <alpine.DEB.2.21.1806141339580.4543@chino.kir.corp.google.com>
On Thu 14-06-18 13:42:59, David Rientjes wrote:
> The oom reaper ensures forward progress by setting MMF_OOM_SKIP itself if
> it cannot reap an mm. This can happen for a variety of reasons,
> including:
>
> - the inability to grab mm->mmap_sem in a sufficient amount of time,
>
> - when the mm has blockable mmu notifiers that could cause the oom reaper
> to stall indefinitely,
>
> but we can also add a third when the oom reaper can "reap" an mm but doing
> so is unlikely to free any amount of memory:
>
> - when the mm's memory is fully mlocked.
>
> When all memory is mlocked, the oom reaper will not be able to free any
> substantial amount of memory. It sets MMF_OOM_SKIP before the victim can
> unmap and free its memory in exit_mmap() and subsequent oom victims are
> chosen unnecessarily. This is trivial to reproduce if all eligible
> processes on the system have mlocked their memory: the oom killer calls
> panic() even though forward progress can be made.
>
> This is the same issue where the exit path sets MMF_OOM_SKIP before
> unmapping memory and additional processes can be chosen unnecessarily
> because the oom killer is racing with exit_mmap().
>
> We can't simply defer setting MMF_OOM_SKIP, however, because if there is
> a true oom livelock in progress, it never gets set and no additional
> killing is possible.
>
> To fix this, this patch introduces a per-mm reaping timeout, initially set
> at 10s. It requires that the oom reaper's list becomes a properly linked
> list so that other mm's may be reaped while waiting for an mm's timeout to
> expire.
>
> This replaces the current timeouts in the oom reaper: (1) when trying to
> grab mm->mmap_sem 10 times in a row with HZ/10 sleeps in between and (2)
> a HZ sleep if there are blockable mmu notifiers. It extends it with
> timeout to allow an oom victim to reach exit_mmap() before choosing
> additional processes unnecessarily.
>
> The exit path will now set MMF_OOM_SKIP only after all memory has been
> freed, so additional oom killing is justified, and rely on MMF_UNSTABLE to
> determine when it can race with the oom reaper.
>
> The oom reaper will now set MMF_OOM_SKIP only after the reap timeout has
> lapsed because it can no longer guarantee forward progress.
>
> The reaping timeout is intentionally set for a substantial amount of time
> since oom livelock is a very rare occurrence and it's better to optimize
> for preventing additional (unnecessary) oom killing than a scenario that
> is much more unlikely.
>
> Signed-off-by: David Rientjes <rientjes@google.com>
Nacked-by: Michal Hocko <mhocko@suse.com>
as already explained elsewhere in this email thread.
> ---
> Note: I understand there is an objection based on timeout based delays.
> This is currently the only possible way to avoid oom killing important
> processes completely unnecessarily. If the oom reaper can someday free
> all memory, including mlocked memory and those mm's with blockable mmu
> notifiers, and is guaranteed to always be able to grab mm->mmap_sem,
> this can be removed. I do not believe any such guarantee is possible
> and consider the massive killing of additional processes unnecessarily
> to be a regression introduced by the oom reaper and its very quick
> setting of MMF_OOM_SKIP to allow additional processes to be oom killed.
If you find the oom reaper more harmful than useful, I would be willing
to ack a command line option to disable it. Especially when you keep
claiming that the lockups are not really happening in your environment.
Other than that, I've already pointed to a more robust solution. If you
are reluctant to try it out, I will do so, but introducing a timeout is
just papering over the real problem. Maybe we will not reach the state
where _all_ the memory is reapable, but we definitely should try to make
as much as possible reapable, and I do not see any fundamental problems
in that direction.
--
Michal Hocko
SUSE Labs
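
For readers following the quoted changelog, here is a minimal userspace
sketch of the per-mm reaping timeout it describes. It is not the patch
itself: struct mm_model, queue_mm(), victim_exit() and reaper_pass() are
hypothetical names, the oom_skip field merely stands in for MMF_OOM_SKIP,
and time is passed in as a plain millisecond counter instead of being read
from jiffies. Only the 10s value is taken from the changelog.

```c
#include <stdbool.h>
#include <stddef.h>

/* 10s, the initial per-mm timeout named in the quoted changelog. */
#define REAP_TIMEOUT_MS 10000UL

/* Hypothetical userspace model of an mm queued for the oom reaper. */
struct mm_model {
	unsigned long reap_deadline_ms;	/* when the reaper gives up on this mm */
	bool oom_skip;			/* stands in for MMF_OOM_SKIP */
	struct mm_model *next;		/* linked list, so other mms can be visited */
};

/* Queue an mm for reaping and arm its own deadline. */
static void queue_mm(struct mm_model **head, struct mm_model *mm,
		     unsigned long now_ms)
{
	mm->reap_deadline_ms = now_ms + REAP_TIMEOUT_MS;
	mm->oom_skip = false;
	mm->next = *head;
	*head = mm;
}

/* Models the exit path: the flag is set only once all memory is freed. */
static void victim_exit(struct mm_model *mm)
{
	mm->oom_skip = true;
}

/*
 * One pass over the list. An mm whose victim already exited is dropped.
 * An mm whose deadline has lapsed gets oom_skip set by the reaper, since
 * forward progress can no longer be assumed, and is dropped as well.
 * Returns the number of mms still waiting for their timeout.
 */
static int reaper_pass(struct mm_model **head, unsigned long now_ms)
{
	struct mm_model **pp = head;
	int pending = 0;

	while (*pp) {
		struct mm_model *mm = *pp;

		if (mm->oom_skip || now_ms >= mm->reap_deadline_ms) {
			mm->oom_skip = true;
			*pp = mm->next;		/* unlink from the list */
			mm->next = NULL;
		} else {
			pending++;
			pp = &mm->next;
		}
	}
	return pending;
}
```

In this model an mm that never reaches victim_exit() keeps blocking further
oom kills until its deadline passes, which is the delay the objection above
is aimed at.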