linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: David Rientjes <rientjes@google.com>
To: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: mhocko@kernel.org, oleg@redhat.com,
	torvalds@linux-foundation.org, kwalker@redhat.com, cl@linux.com,
	akpm@linux-foundation.org, hannes@cmpxchg.org,
	vdavydov@parallels.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, skozina@redhat.com
Subject: Re: can't oom-kill zap the victim's memory?
Date: Tue, 29 Sep 2015 15:56:30 -0700 (PDT)	[thread overview]
Message-ID: <alpine.DEB.2.10.1509291547560.3375@chino.kir.corp.google.com> (raw)
In-Reply-To: <201509291657.HHD73972.MOFVSHQtOJFOLF@I-love.SAKURA.ne.jp>

On Tue, 29 Sep 2015, Tetsuo Handa wrote:

> Is the story such simple? I think there are factors which disturb memory
> allocation with mmap_sem held for writing.
> 
>   down_write(&mm->mmap_sem);
>   kmalloc(GFP_KERNEL);
>   up_write(&mm->mmap_sem);
> 
> can involve locks inside __alloc_pages_slowpath().
> 
> Say, there are three userspace tasks named P1, P2T1, P2T2 and
> one kernel thread named KT1. Only P2T1 and P2T2 shares the same mm.
> KT1 is a kernel thread for fs writeback (maybe kswapd?).
> I think sequence shown below is possible.
> 
> (1) P1 enters into kernel mode via write() syscall.
> 
> (2) P1 allocates memory for buffered write.
> 
> (3) P2T1 enters into kernel mode and calls kmalloc().
> 
> (4) P2T1 arrives at __alloc_pages_may_oom() because there was no
>     reclaimable memory. (Memory allocated by P1 is not reclaimable
>     as of this moment.)
> 
> (5) P1 dirties memory allocated for buffered write.
> 
> (6) P2T2 enters into kernel mode and calls kmalloc() with
>     mmap_sem held for writing.
> 
> (7) KT1 finds dirtied memory.
> 
> (8) KT1 holds fs's unkillable lock for fs writeback.
> 
> (9) P2T2 is blocked at unkillable lock for fs writeback held by KT1.
> 
> (10) P2T1 calls out_of_memory() and the OOM killer chooses P2T1 and sets
>      TIF_MEMDIE on both P2T1 and P2T2.
> 
> (11) P2T2 got TIF_MEMDIE but is blocked at unkillable lock for fs writeback
>      held by KT1.
> 
> (12) KT1 is trying to allocate memory for fs writeback. But since P2T1 and
>      P2T2 cannot release memory because memory unmapping code cannot hold
>      mmap_sem for reading, KT1 waits forever.... OOM livelock completed!
> 
> I think sequence shown below is also possible. Say, there are three
> userspace tasks named P1, P2, P3 and one kernel thread named KT1.
> 
> (1) P1 enters into kernel mode via write() syscall.
> 
> (2) P1 allocates memory for buffered write.
> 
> (3) P2 enters into kernel mode and holds mmap_sem for writing.
> 
> (4) P3 enters into kernel mode and calls kmalloc().
> 
> (5) P3 arrives at __alloc_pages_may_oom() because there was no
>     reclaimable memory. (Memory allocated by P1 is not reclaimable
>     as of this moment.)
> 
> (6) P1 dirties memory allocated for buffered write.
> 
> (7) KT1 finds dirtied memory.
> 
> (8) KT1 holds fs's unkillable lock for fs writeback.
> 
> (9) P2 calls kmalloc() and is blocked at unkillable lock for fs writeback
>     held by KT1.
> 
> (10) P3 calls out_of_memory() and the OOM killer chooses P2 and sets
>      TIF_MEMDIE on P2.
> 
> (11) P2 got TIF_MEMDIE but is blocked at unkillable lock for fs writeback
>      held by KT1.
> 
> (12) KT1 is trying to allocate memory for fs writeback. But since P2 cannot
>      release memory because memory unmapping code cannot hold mmap_sem for
>      reading, KT1 waits forever.... OOM livelock completed!
> 
> So, allowing all OOM victim threads to use memory reserves does not guarantee
> that a thread which held mmap_sem for writing to make forward progress.
> 

Thank you for writing this all out, it definitely helps to understand the 
concerns.

This, in my understanding, is the same scenario that requires not only oom 
victims to be able to access memory reserves, but also any thread after an 
oom victim has failed to make a timely exit.

I point out mm->mmap_sem as a special case because we have had fixes in 
the past, such as the special fatal_signal_pending() handling in 
__get_user_pages(), that try to ensure forward progress since we know that 
we need exclusive mm->mmap_sem for the victim to make an exit.

I think both of your illustrations show why it is not helpful to kill 
additional processes after a time period has elapsed and a victim has 
failed to exit.  In both of your scenarios, it would require that KT1 be 
killed to allow forward progress and we know that's not possible.

Perhaps this is an argument that we need to provide access to memory 
reserves for threads even for !__GFP_WAIT and !__GFP_FS in such scenarios, 
but I would wait to make that extension until we see it in practice.

Killing all mm->mmap_sem threads certainly isn't meant to solve all oom 
killer livelocks, as you show.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2015-09-29 22:56 UTC|newest]

Thread overview: 109+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-17 17:59 [PATCH] mm/oom_kill.c: don't kill TASK_UNINTERRUPTIBLE tasks Kyle Walker
2015-09-17 19:22 ` Oleg Nesterov
2015-09-18 15:41   ` Christoph Lameter
2015-09-18 16:24     ` Oleg Nesterov
2015-09-18 16:39       ` Tetsuo Handa
2015-09-18 16:54         ` Oleg Nesterov
2015-09-18 17:00       ` Christoph Lameter
2015-09-18 19:07         ` Oleg Nesterov
2015-09-18 19:19           ` Christoph Lameter
2015-09-18 21:28             ` Kyle Walker
2015-09-18 22:07               ` Christoph Lameter
2015-09-19  8:32         ` Michal Hocko
2015-09-19 14:33           ` Tetsuo Handa
2015-09-19 15:51             ` Michal Hocko
2015-09-21 23:33             ` David Rientjes
2015-09-22  5:33               ` Tetsuo Handa
2015-09-22 23:32                 ` David Rientjes
2015-09-23 12:03                   ` Kyle Walker
2015-09-24 11:50                     ` Tetsuo Handa
2015-09-19 14:44           ` Oleg Nesterov
2015-09-21 23:27         ` David Rientjes
2015-09-19  8:25     ` Michal Hocko
2015-09-19  8:22 ` Michal Hocko
2015-09-21 23:08   ` David Rientjes
2015-09-19 15:03 ` can't oom-kill zap the victim's memory? Oleg Nesterov
2015-09-19 15:10   ` Oleg Nesterov
2015-09-19 15:58   ` Michal Hocko
2015-09-20 13:16     ` Oleg Nesterov
2015-09-19 22:24   ` Linus Torvalds
2015-09-19 22:54     ` Raymond Jennings
2015-09-19 23:00     ` Raymond Jennings
2015-09-19 23:13       ` Linus Torvalds
2015-09-20  9:33     ` Michal Hocko
2015-09-20 13:06       ` Oleg Nesterov
2015-09-20 12:56     ` Oleg Nesterov
2015-09-20 18:05       ` Linus Torvalds
2015-09-20 18:21         ` Raymond Jennings
2015-09-20 18:23         ` Raymond Jennings
2015-09-20 19:07         ` Raymond Jennings
2015-09-21 13:57           ` Oleg Nesterov
2015-09-21 13:44         ` Oleg Nesterov
2015-09-21 14:24           ` Michal Hocko
2015-09-21 15:32             ` Oleg Nesterov
2015-09-21 16:12               ` Michal Hocko
2015-09-22 16:06                 ` Oleg Nesterov
2015-09-22 23:04                   ` David Rientjes
2015-09-23 20:59                   ` Michal Hocko
2015-09-24 21:15                     ` David Rientjes
2015-09-25  9:35                       ` Michal Hocko
2015-09-25 16:14                         ` Tetsuo Handa
2015-09-28 16:18                           ` Tetsuo Handa
2015-09-28 22:28                             ` David Rientjes
2015-10-02 12:36                             ` Michal Hocko
2015-10-02 19:01                               ` Linus Torvalds
2015-10-05 14:44                                 ` Michal Hocko
2015-10-07  5:16                                   ` Vlastimil Babka
2015-10-07 10:43                                     ` Tetsuo Handa
2015-10-08  9:40                                       ` Vlastimil Babka
2015-10-06  7:55                                 ` Eric W. Biederman
2015-10-06  8:49                                   ` Linus Torvalds
2015-10-06  8:55                                     ` Linus Torvalds
2015-10-06 14:52                                       ` Eric W. Biederman
2015-10-03  6:02                               ` Can't we use timeout based OOM warning/killing? Tetsuo Handa
2015-10-06 14:51                                 ` Tetsuo Handa
2015-10-12  6:43                                   ` Tetsuo Handa
2015-10-12 15:25                                     ` Silent hang up caused by pages being not scanned? Tetsuo Handa
2015-10-12 21:23                                       ` Linus Torvalds
2015-10-13 12:21                                         ` Tetsuo Handa
2015-10-13 16:37                                           ` Linus Torvalds
2015-10-14 12:21                                             ` Tetsuo Handa
2015-10-15 13:14                                             ` Michal Hocko
2015-10-16 15:57                                               ` Michal Hocko
2015-10-16 18:34                                                 ` Linus Torvalds
2015-10-16 18:49                                                   ` Tetsuo Handa
2015-10-19 12:57                                                     ` Michal Hocko
2015-10-19 12:53                                                   ` Michal Hocko
2015-10-13 13:32                                       ` Michal Hocko
2015-10-13 16:19                                         ` Tetsuo Handa
2015-10-14 13:22                                           ` Michal Hocko
2015-10-14 14:38                                             ` Tetsuo Handa
2015-10-14 14:59                                               ` Michal Hocko
2015-10-14 15:06                                                 ` Tetsuo Handa
2015-10-26 11:44                                     ` Newbie's question: memory allocation when reclaiming memory Tetsuo Handa
2015-11-05  8:46                                       ` Vlastimil Babka
2015-10-06 15:25                                 ` Can't we use timeout based OOM warning/killing? Linus Torvalds
2015-10-08 15:33                                   ` Tetsuo Handa
2015-10-10 12:50                                 ` Tetsuo Handa
2015-09-28 22:24                         ` can't oom-kill zap the victim's memory? David Rientjes
2015-09-29  7:57                           ` Tetsuo Handa
2015-09-29 22:56                             ` David Rientjes [this message]
2015-09-30  4:25                               ` Tetsuo Handa
2015-09-30 10:21                                 ` Tetsuo Handa
2015-09-30 21:11                                 ` David Rientjes
2015-10-01 12:13                                   ` Tetsuo Handa
2015-10-01 14:48                           ` Michal Hocko
2015-10-02 13:06                             ` Tetsuo Handa
2015-10-06 18:45                     ` Oleg Nesterov
2015-10-07 11:03                       ` Tetsuo Handa
2015-10-07 12:00                         ` Oleg Nesterov
2015-10-08 14:04                           ` Michal Hocko
2015-10-08 14:01                       ` Michal Hocko
2015-09-21 16:51               ` Tetsuo Handa
2015-09-22 12:43                 ` Oleg Nesterov
2015-09-22 14:30                   ` Tetsuo Handa
2015-09-22 14:45                     ` Oleg Nesterov
2015-09-21 23:42               ` David Rientjes
2015-09-21 16:55           ` Linus Torvalds
2015-09-20 14:50   ` Tetsuo Handa
2015-09-20 14:55     ` Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.10.1509291547560.3375@chino.kir.corp.google.com \
    --to=rientjes@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux.com \
    --cc=hannes@cmpxchg.org \
    --cc=kwalker@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=oleg@redhat.com \
    --cc=penguin-kernel@i-love.sakura.ne.jp \
    --cc=skozina@redhat.com \
    --cc=torvalds@linux-foundation.org \
    --cc=vdavydov@parallels.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).