From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: hannes@cmpxchg.org, mhocko@kernel.org
Cc: akpm@linux-foundation.org, mgorman@suse.de, rientjes@google.com,
	torvalds@linux-foundation.org, oleg@redhat.com, hughd@google.com,
	andrea@kernel.org, riel@redhat.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, mhocko@suse.com
Subject: Re: [PATCH 3/2] oom: clear TIF_MEMDIE after oom_reaper managed to unmap the address space
Date: Mon, 15 Feb 2016 19:58:50 +0900
Message-ID: <201602151958.HCJ48972.FFOFOLMHSQVJtO@I-love.SAKURA.ne.jp>
In-Reply-To: <20160111165214.GA32132@cmpxchg.org>

Andrew Morton wrote:
> 
> The patch titled
>      Subject: mm/oom_kill.c: don't ignore oom score on exiting tasks
> has been removed from the -mm tree.  Its filename was
>      mm-oom_killc-dont-skip-pf_exiting-tasks-when-searching-for-a-victim.patch
> 
> This patch was dropped because an updated version will be merged
> 
> ------------------------------------------------------
> From: Johannes Weiner <hannes@cmpxchg.org>
> Subject: mm/oom_kill.c: don't ignore oom score on exiting tasks
> 
> When the OOM killer scans tasks and encounters a PF_EXITING one, it
> force-selects that one regardless of the score.  Is there a possibility
> that the task might hang after it has set PF_EXITING?  In that case the
> OOM killer should be able to move on to the next task.
> 
> Frankly, I don't even know why we check for exiting tasks in the OOM
> killer.  We've tried direct reclaim at least 15 times by the time we
> decide the system is OOM, there was plenty of time to exit and free
> memory; and a task might exit voluntarily right after we issue a kill. 
> This is testing pure noise.
> 

I can't find an updated version of this patch in linux-next. Why don't you
submit it? I think the patch description should be updated, because this patch
solves yet another silent OOM livelock bug.

Suppose there is a process with two threads, Thread1 and Thread2.
Since the OOM killer sets TIF_MEMDIE only on the first task it finds with a
non-NULL mm, it is possible that Thread2 invokes the OOM killer and Thread1
gets TIF_MEMDIE (without SIGKILL being sent to processes using Thread1's mm).

----------
Thread1                       Thread2
                              Calls mmap()
Calls _exit(0)
                              Arrives at vm_mmap_pgoff()
Arrives at do_exit()
Gets PF_EXITING via exit_signals()
                              Calls down_write(&mm->mmap_sem)
                              Calls do_mmap_pgoff()
Calls down_read(&mm->mmap_sem) from exit_mm()
                              Does a GFP_KERNEL allocation
                              Calls out_of_memory()
                              oom_scan_process_thread(Thread1) returns OOM_SCAN_ABORT

down_read(&mm->mmap_sem) is waiting for Thread2 to call up_write(&mm->mmap_sem)
                              but Thread2 is waiting for Thread1 to set Thread1->mm = NULL ... silent OOM livelock!
----------
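
As a toy illustration of the circular wait in the timeline above, here is a
minimal user-space sketch. It is not kernel code: the names waits_forever(),
timeline_livelocks(), NOBODY and the waits_on[] table are all hypothetical,
and the "waits for" edges are simply read off the diagram (Thread1 waits for
Thread2 to call up_write(), Thread2 waits for Thread1 to clear its mm).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task indices for the two threads in the timeline. */
enum { THREAD1, THREAD2, NR_TASKS };

/* waits_on[i] == NOBODY means task i can make progress on its own. */
#define NOBODY (-1)

/*
 * Follow the wait-for chain starting at 'start'. If the chain reaches a
 * task that waits for nobody, everything behind it can eventually run.
 * If n steps pass without that happening, the chain must have revisited
 * a task, so it has entered a cycle and 'start' waits forever.
 */
bool waits_forever(const int *waits_on, size_t n, int start)
{
	int cur = start;

	assert(start >= 0 && (size_t)start < n);
	for (size_t step = 0; step < n; step++) {
		cur = waits_on[cur];
		if (cur == NOBODY)
			return false;
	}
	return true;
}

/* Encodes the timeline: each thread is waiting for the other one. */
bool timeline_livelocks(void)
{
	const int waits_on[NR_TASKS] = {
		[THREAD1] = THREAD2,	/* blocked in down_read(&mm->mmap_sem) */
		[THREAD2] = THREAD1,	/* waiting for Thread1->mm = NULL */
	};

	return waits_forever(waits_on, NR_TASKS, THREAD1);
}
```

In this model the cycle disappears as soon as either edge is removed, which
is what letting the OOM killer move on to another victim would amount to for
the Thread2 edge.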

The OOM reaper tries to avoid this livelock by using down_read_trylock()
instead of down_read(), but the core_state check in exit_mm() cannot avoid it
unless we use non-blocking allocations (i.e. GFP_ATOMIC or GFP_NOWAIT) for
everything between down_write(&mm->mmap_sem) and up_write(&mm->mmap_sem).

I think that the same problem exists for any task_will_free_mem()-based
optimizations such as

        if (current->mm &&
            (fatal_signal_pending(current) || task_will_free_mem(current))) {
                mark_oom_victim(current);
                return true;
        }

in out_of_memory() and

        task_lock(p);
        if (p->mm && task_will_free_mem(p)) {
                mark_oom_victim(p);
                task_unlock(p);
                put_task_struct(p);
                return;
        }
        task_unlock(p);

in oom_kill_process() and

        if (fatal_signal_pending(current) || task_will_free_mem(current)) {
                mark_oom_victim(current);
                goto unlock;
        }

in mem_cgroup_out_of_memory().

Well, which code paths can reach task_will_free_mem(current) between getting
PF_EXITING and setting current->mm = NULL? tty_audit_exit() seems to be one
example: it does a GFP_KERNEL allocation via tty_audit_log(), so TIF_MEMDIE can
be set there, and the task can later be blocked at down_read() in exit_mm().

Is task_will_free_mem(current) possible in the mem_cgroup_out_of_memory() case?

