From: Michal Hocko <mhocko@kernel.org>
To: David Rientjes <rientjes@google.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [rfc patch] mm, oom: fix unnecessary killing of additional processes
Date: Tue, 5 Jun 2018 10:57:07 +0200
Message-ID: <20180605085707.GV19202@dhcp22.suse.cz>
In-Reply-To: <alpine.DEB.2.21.1806042100200.71129@chino.kir.corp.google.com>

On Mon 04-06-18 21:25:39, David Rientjes wrote:
> On Fri, 1 Jun 2018, Michal Hocko wrote:
> 
> > > We've discussed the mm 
> > > having a single blockable mmu notifier.  Regardless of how we arrive at 
> > > the point where the oom reaper can't free memory, which could be any of 
> > > those three cases, if (1) the original victim is sufficiently large that 
> > > follow-up oom kills would become unnecessary and (2) other threads 
> > > allocate/charge before the oom victim reaches exit_mmap(), this occurs.
> > > 
> > > We have examples of cases where oom reaping was successful, but the rss 
> > > numbers in the kernel log are very similar to when it was oom killed and 
> > > the process is known not to mlock, the reason is because the oom reaper 
> > > could free very little memory due to blockable mmu notifiers.
> > 
> > Please be more specific. Which notifiers were these? Blockable notifiers
> > are a PITA and we should be addressing them. That requires identifying
> > them first.
> > 
> 
> The most common offender seems to be ib_umem_notifier, but I have also 
> heard of possible occurrences for mn_invl_range_start() for xen, but that 
> would need more investigation.  The rather new invalidate_range callback 
> for hmm mirroring could also be problematic.  Any mmu_notifier without 
> MMU_INVALIDATE_DOES_NOT_BLOCK causes the mm to immediately be disregarded.  

Yes, this is unfortunate. It was meant as a stop-gap quick fix, with the
long-term plan of fixing it properly. I am pretty sure that we can do
much better here: teach mmu_notifier_invalidate_range_start to take a
non-blocking flag and back out on ranges that would block. I am pretty
sure that notifiers can be targeted quite narrowly, so we can still
process at least some vmas.
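
A minimal userspace sketch of the idea above, not kernel code. It assumes
each notifier covers one address range and knows whether invalidating it
could sleep. The struct layouts, the `blockable` parameter, and the names
`invalidate_range_start_model()` and `reap_vmas_model()` are all
hypothetical, chosen only for illustration. The point is that a blocking
notifier makes the reaper skip just the overlapping vmas, not the whole mm.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one registered notifier covering [start, end). */
struct notifier_model {
	unsigned long start, end;
	bool may_block;		/* invalidating this range could sleep */
};

/* Hypothetical model of a vma covering [start, end). */
struct vma_model {
	unsigned long start, end;
};

/*
 * Return 0 if [start, end) can be invalidated under the given constraint,
 * or -EAGAIN if the caller must not block and an overlapping notifier
 * might sleep.
 */
static int invalidate_range_start_model(const struct notifier_model *n,
					size_t nr, unsigned long start,
					unsigned long end, bool blockable)
{
	for (size_t i = 0; i < nr; i++) {
		bool overlaps = n[i].start < end && start < n[i].end;

		if (overlaps && n[i].may_block && !blockable)
			return -EAGAIN;
	}
	return 0;
}

/*
 * Walk all vmas in non-blocking mode. A vma whose range would block is
 * skipped and counted in *skipped; every other vma counts as reaped.
 */
static size_t reap_vmas_model(const struct vma_model *v, size_t nr_vmas,
			      const struct notifier_model *n, size_t nr_notifiers,
			      size_t *skipped)
{
	size_t reaped = 0;

	*skipped = 0;
	for (size_t i = 0; i < nr_vmas; i++) {
		if (invalidate_range_start_model(n, nr_notifiers, v[i].start,
						 v[i].end, false)) {
			(*skipped)++;	/* back out on this range only */
			continue;
		}
		reaped++;
	}
	return reaped;
}
```

With three vmas and one blocking notifier that overlaps only the second
vma, this model reaps two vmas and skips one.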

> For this reason, we see testing harnesses often oom killed immediately 
> after running a unittest that stresses reclaim or compaction by inducing a 
> system-wide oom condition.  The harness spawns the unittest which spawns 
> an antagonist memory hog that is intended to be oom killed.  When memory 
> is mlocked or there are a large number of threads faulting memory for the 
> antagonist, the unittest and the harness itself get oom killed because the 
> oom reaper sets MMF_OOM_SKIP; this ends up happening a lot on powerpc.  
> The memory hog has mm->mmap_sem readers queued ahead of a writer that is 
> doing mmap() so the oom reaper can't grab the sem quickly enough.

How come the writer doesn't back off? The mmap paths should be taking
the exclusive mmap_sem in a killable sleep, so the writer should back
off. Or is the holder of the lock deep inside the mmap path doing
something else, and not backing out while holding the exclusive lock?
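
To illustrate what backing off in a killable sleep means, here is a
small deterministic userspace model, not kernel code. It assumes the
convention that a killable write-lock acquisition returns -EINTR when a
fatal signal arrives while waiting. The `sem_model` struct, the tick-based
waiting, and the `kill_at_tick` parameter are hypothetical simplifications.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a reader/writer semaphore. */
struct sem_model {
	int readers;	/* readers currently holding the lock, >= 0 */
	bool writer;	/* true once the writer has taken the lock */
};

/*
 * Model of a killable exclusive acquisition. Each loop iteration is one
 * "tick" during which exactly one reader drops the lock. kill_at_tick is
 * the tick at which the waiting task receives a fatal signal, or -1 if
 * it never does.
 *
 * Returns 0 once the lock is taken, or -EINTR if the task was killed
 * while still waiting, in which case the lock is left untouched.
 */
static int down_write_killable_model(struct sem_model *sem, int kill_at_tick)
{
	for (int tick = 0; ; tick++) {
		if (sem->readers == 0 && !sem->writer) {
			sem->writer = true;
			return 0;
		}
		if (kill_at_tick >= 0 && tick >= kill_at_tick)
			return -EINTR;	/* back off instead of waiting on */
		sem->readers--;		/* one reader releases per tick */
	}
}
```

A writer that is killed while readers still hold the lock leaves the
queue with -EINTR, so it no longer stands between later readers (such as
the oom reaper) and the lock. A holder that already owns the lock
exclusively is not covered by this model, which is the second case asked
about above.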
 
[...]

> > As I've already said, I will nack any timeout based solution until we
> > address all particular problems and still see more to come. Here we have
> > a clear goal. Address mlocked pages and identify mmu notifier offenders.
> 
> I cannot fix all mmu notifiers to not block, I can't fix the configuration 
> to allow direct compaction for thp allocations and a large number of 
> concurrent faulters, and I cannot fix userspace mlocking a lot of memory.  
> It's worthwhile to work in that direction, but it will never be 100% 
> possible to avoid.  We must have a solution that prevents innocent 
> processes from consistently being oom killed completely unnecessarily.

None of the above has been attempted and shown to be not worth doing.
An oom event should be rare, so I see absolutely no reason to rush in a
misdesigned fix right now.

-- 
Michal Hocko
SUSE Labs
