linux-kernel.vger.kernel.org archive mirror
From: Michal Hocko <mhocko@kernel.org>
To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: linux-mm@kvack.org, rientjes@google.com, oleg@redhat.com,
	vdavydov@parallels.com, akpm@linux-foundation.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 08/10] mm, oom: task_will_free_mem should skip oom_reaped tasks
Date: Fri, 17 Jun 2016 14:56:53 +0200
Message-ID: <20160617125653.GG21670@dhcp22.suse.cz>
In-Reply-To: <201606172035.BCG92033.HtSOFOOMVLJFFQ@I-love.SAKURA.ne.jp>

On Fri 17-06-16 20:35:38, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > 0-day robot has encountered the following:
> > [   82.694232] Out of memory: Kill process 3914 (trinity-c0) score 167 or sacrifice child
> > [   82.695110] Killed process 3914 (trinity-c0) total-vm:55864kB, anon-rss:1512kB, file-rss:1088kB, shmem-rss:25616kB
> > [   82.706724] oom_reaper: reaped process 3914 (trinity-c0), now anon-rss:0kB, file-rss:0kB, shmem-rss:26488kB
> > [   82.715540] oom_reaper: reaped process 3914 (trinity-c0), now anon-rss:0kB, file-rss:0kB, shmem-rss:26900kB
> > [   82.717662] oom_reaper: reaped process 3914 (trinity-c0), now anon-rss:0kB, file-rss:0kB, shmem-rss:26900kB
> > [   82.725804] oom_reaper: reaped process 3914 (trinity-c0), now anon-rss:0kB, file-rss:0kB, shmem-rss:27296kB
> > [   82.739091] oom_reaper: reaped process 3914 (trinity-c0), now anon-rss:0kB, file-rss:0kB, shmem-rss:28148kB
> > 
> > oom_reaper is trying to reap the same task again and again. This
> > is possible only when the oom killer is bypassed because of
> > task_will_free_mem because we skip over tasks with MMF_OOM_REAPED
> > already set during select_bad_process. Teach task_will_free_mem to skip
> > over MMF_OOM_REAPED tasks as well because they will be unlikely to free
> > anything more.
> 
> I agree that we need to prevent same mm from being selected forever. But I
> feel worried about this patch. We are reaching a stage what purpose we set
> TIF_MEMDIE for. mark_oom_victim() sets TIF_MEMDIE on a thread with oom_lock
> held. Thus, if a mm which the TIF_MEMDIE thread is using is reapable (likely
> yes), __oom_reap_task() will likely be the next thread which will get that lock
> because __oom_reap_task() uses mutex_lock(&oom_lock) whereas other threads
> using that mm use mutex_trylock(&oom_lock). As a result, regarding CONFIG_MMU=y
> kernels, I guess that
> 
> 	if (task_will_free_mem(current)) {
> 
> shortcut in out_of_memory() likely becomes an useless condition. Since the OOM
> reaper will quickly reap mm and set MMF_OOM_REAPED on that mm and clear
> TIF_MEMDIE, other threads using that mm will fail to get TIF_MEMDIE (because
> task_will_free_mem() will start returning false due to this patch) and proceed
> to next OOM victim selection.

I suspect you are overthinking this. Just try to imagine what would have
to happen in order to get another victim:

CPU1                                    CPU2
__alloc_pages_slowpath
  __alloc_pages_may_oom
    mutex_lock(oom_lock)
    out_of_memory
      task_will_free_mem
        mark_oom_victim
        wake_oom_reaper
                                        __oom_reap_task
    mutex_unlock(oom_lock)
                                          mutex_lock(oom_lock)
                                          unmap_page_range # For all VMAs
                                          tlb_finish_mmu
                                          set_bit(MMF_OOM_REAPED)
                                          mutex_unlock(oom_lock)

  <back in allocator with access to memory reserves>

  __alloc_pages_may_oom
    mutex_lock()
    out_of_memory
                                        exit_oom_victim
      task_will_free_mem # False

There will be a large window during which current has TIF_MEMDIE and
the oom reaper is freeing memory to get us out of the mess. Even if
that weren't the case and the address space is not really reapable,
the victim has had quite some time to use memory reserves and move on.
And if even that didn't help, then it is really hard to judge whether
the victim would benefit from more time.

That being said, even if TIF_MEMDIE weren't used (which is unlikely,
because tearing down the address space is likely to take some time),
the reaper would still be freeing memory in the background to help
us get out of the OOM situation.
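
To make the sequence in the diagram concrete, here is a minimal
userspace sketch of it. This is only a model under stated assumptions,
not the kernel code: every model_* name, both structs and the
MODEL_MMF_OOM_REAPED bit are hypothetical stand-ins, and the reap step
folds set_bit(MMF_OOM_REAPED) and exit_oom_victim into one call even
though the diagram shows them at different points in time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are the kernel's types. */
#define MODEL_MMF_OOM_REAPED (1u << 0)

struct model_mm {
	unsigned int flags;        /* models mm->flags */
	unsigned long anon_rss_kb; /* memory the reaper is able to free */
};

struct model_task {
	struct model_mm *mm;
	bool exiting;    /* models "killed or already exiting" */
	bool tif_memdie; /* models TIF_MEMDIE (access to memory reserves) */
};

/* Models the check with this patch applied: a reaped mm no longer qualifies. */
static bool model_task_will_free_mem(const struct model_task *t)
{
	if (t->mm == NULL)
		return false;
	if (t->mm->flags & MODEL_MMF_OOM_REAPED)
		return false;
	return t->exiting;
}

/* Models the shortcut in out_of_memory(); true means no new victim is picked. */
static bool model_oom_shortcut(struct model_task *cur)
{
	if (!model_task_will_free_mem(cur))
		return false;
	cur->tif_memdie = true; /* mark_oom_victim; wake_oom_reaper would follow */
	return true;
}

/* Models the CPU2 column: unmap, set the flag, drop TIF_MEMDIE. Returns kB freed. */
static unsigned long model_oom_reap(struct model_task *victim)
{
	unsigned long freed;

	assert(victim->mm != NULL);
	freed = victim->mm->anon_rss_kb;
	victim->mm->anon_rss_kb = 0;
	victim->mm->flags |= MODEL_MMF_OOM_REAPED;
	victim->tif_memdie = false; /* exit_oom_victim */
	return freed;
}
```

Calling model_oom_shortcut() on an exiting task first returns true and
sets tif_memdie; after model_oom_reap() has run on that task, a second
call returns false, which is the "task_will_free_mem # False" step in
the diagram. By then the model has already handed back the anon memory.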

Or did you have any other scenario in mind?

> The comment
>
>          * That thread will now get access to memory reserves since it has a
>          * pending fatal signal.
> 
> in oom_kill_process() became almost dead. Since we need a short delay in order
> to allow get_page_from_freelist() to allocate from memory reclaimed by
> __oom_reap_task(), this patch might increase possibility of excessively
> preventing OOM-killed threads from using ALLOC_NO_WATERMARKS via TIF_MEMDIE
> and increase possibility of needlessly selecting next OOM victim.

It seems that you are assuming that the oom reaper will not reclaim much
memory. Even if that were the case (e.g. a large amount of memory that is
not directly bound to the mm, like socket and other kernel buffers), this
would hardly be a new problem introduced by this patch, because many of
those resources are deallocated past exit_mm.

> So, maybe we shouldn't let this shortcut to return false as soon as
> MMF_OOM_REAPED is set.

What would be an alternative?

-- 
Michal Hocko
SUSE Labs
