All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Rientjes <rientjes@google.com>
To: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Michal Hocko <mhocko@kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	linux-mm@kvack.org, syzkaller-bugs@googlegroups.com, guro@fb.com,
	kirill.shutemov@linux.intel.com, linux-kernel@vger.kernel.org,
	yang.s@alibaba-inc.com, Andrew Morton <akpm@linux-foundation.org>,
	Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
	Petr Mladek <pmladek@suse.com>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Steven Rostedt <rostedt@goodmis.org>
Subject: Re: [PATCH] mm,oom: Use timeout based back off.
Date: Mon, 22 Oct 2018 14:11:10 -0700 (PDT)	[thread overview]
Message-ID: <alpine.DEB.2.21.1810221406400.120157@chino.kir.corp.google.com> (raw)
In-Reply-To: <1540033021-3258-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp>

On Sat, 20 Oct 2018, Tetsuo Handa wrote:

> This patch changes the OOM killer to wait for either
> 
>   (A) __mmput() of the OOM victim's mm completes
> 
> or
> 
>   (B) the OOM reaper gives up waiting for (A) because memory pages
>       used by the OOM victim's mm did not decrease for one second
> 
> in order to mitigate at least three problems
> 
>   (1) an OOM victim needlessly selects next OOM victim if the OOM-killed
>       processes are using clone(CLONE_VM) without CLONE_THREAD because
>       task_will_free_mem(current) in out_of_memory() returns false when
>       MMF_OOM_SKIP was set before remaining OOM-killed processes reach
>       out_of_memory().
> 
>   (2) an memcg OOM event needlessly selects next OOM victim because we
>       are assuming that the OOM reaper can reclaim majority of the OOM
>       victim's mm, but sometimes we need to wait for completion of
>       free_pgtables() in exit_mmap() in order to reclaim enough memory.
> 
>   (3) an memcg OOM event from a multithreaded process by an unprivileged
>       user can needlessly trigger flooding of "Out of memory and no
>       killable processes..." and dump_header() messages because
>       task_will_free_mem(current) in out_of_memory() returns false when
>       MMF_OOM_SKIP was set before remaining OOM-killed threads reach
>       out_of_memory().
> 
> all caused by setting MMF_OOM_SKIP too early.
> 
> Michal has proposed an attempt to handover setting of MMF_OOM_SKIP to
> the OOM victim's exit path [1] in order to handle (2), but there was no
> feedback (except me) and nobody knows whether it is really safe and is
> worth constrain future changes. Not only that attempt can mitigate only
> portion of exit_mmap() (rather than until the OOM victim thread becomes
> invisible from the OOM killer), that attempt does not help at all for (1)
> and (3) because __mmput() cannot be called.
> 
> I have proposed many patches which mitigate (1) and (3) without using
> timeout based approach, but Michal is rejecting them and wants to address
> the root cause that MMF_OOM_SKIP is set too early. And nobody (including
> Michal) has time to make the OOM reaper reclaim more memory (including
> mlock()ed and shared memory, and mmap_sem contention) before setting
> MMF_OOM_SKIP. We are deadlocked there.
> 
> Michal has been refusing timeout based approach, but I don't think this
> is something we have to be frayed around the edge about possibility of
> overlooking races/bugs just because Michal does not want to use timeout.
> I believe that timeout based back off is the only approach we can use
> for now.
> 

I've proposed patches that have been running for months in a production 
environment that make the oom killer useful without serially killing many 
processes unnecessarily.  At this point, it is *much* easier to just fork 
the oom killer logic rather than continue to invest time into fixing it in 
Linux.  That's unfortunate because I'm sure you realize how problematic 
the current implementation is, how abusive it is, and have seen its 
effects yourself.  I admire your persistance in trying to fix the issues 
surrounding the oom killer, but have come to the conclusion that forking 
it is a much better use of time.

  reply	other threads:[~2018-10-22 21:11 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-20 10:57 Tetsuo Handa
2018-10-22 21:11 ` David Rientjes [this message]
2018-10-23  5:56   ` Michal Hocko
2018-10-24 22:54   ` Andrew Morton
2018-10-25  7:11     ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.21.1810221406400.120157@chino.kir.corp.google.com \
    --to=rientjes@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=penguin-kernel@i-love.sakura.ne.jp \
    --cc=pmladek@suse.com \
    --cc=rostedt@goodmis.org \
    --cc=sergey.senozhatsky.work@gmail.com \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    --cc=yang.s@alibaba-inc.com \
    --subject='Re: [PATCH] mm,oom: Use timeout based back off.' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.