linux-mm.kvack.org archive mirror
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: mhocko@kernel.org, rientjes@google.com
Cc: oleg@redhat.com, torvalds@linux-foundation.org,
	kwalker@redhat.com, cl@linux.com, akpm@linux-foundation.org,
	hannes@cmpxchg.org, vdavydov@parallels.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, skozina@redhat.com
Subject: Re: can't oom-kill zap the victim's memory?
Date: Tue, 29 Sep 2015 01:18:00 +0900
Message-ID: <201509290118.BCJ43256.tSFFFMOLHVOJOQ@I-love.SAKURA.ne.jp>
In-Reply-To: <201509260114.ADI35946.OtHOVFOMJQFLFS@I-love.SAKURA.ne.jp>

Michal Hocko wrote:
> The point I've tried to made is that oom unmapper running in a detached
> context (e.g. kernel thread) vs. directly in the oom context doesn't
> make any difference wrt. lock because the holders of the lock would loop
> inside the allocator anyway because we do not fail small allocations.

We tried to allow small allocations to fail. It resulted in an unstable
system with obscure bugs.

We tried to allow small !__GFP_FS allocations to fail. That did not work
either, because of allocations that are effectively __GFP_NOFAIL.

We are now trying to allow zapping the OOM victim's mm. Michal is already
skeptical about this approach because of lock dependencies.

We have already spent 9 months on this OOM livelock. No silver bullet yet.
The proposed approaches are too drastic to backport for existing users.
I think we are out of bullets.

Until we finish adding and testing __GFP_NORETRY (or __GFP_KILLABLE) at
most call sites, a timeout-based workaround will be the only bullet we can
use.

Michal's panic_on_oom_timeout and David's "global access to memory reserves"
will be acceptable for some users if these approaches are offered as opt-in.
Likewise, my memdie_task_skip_secs / memdie_task_panic_secs will be
acceptable for those who would rather retry a bit longer than panic on an
accidental livelock, provided this approach is offered as opt-in.
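
To make the opt-in timeout idea concrete, here is a minimal userspace
sketch in C of the decision such a workaround has to make. It is not the
actual patch: the semantics of the two knobs are inferred from their names
only, and the type and function names (memdie_policy, memdie_action,
memdie_check) are hypothetical. A value of 0 is assumed to mean "knob
disabled", which keeps today's wait-forever behavior as the default.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; a userspace model, not kernel code. */
enum memdie_action {
    MEMDIE_WAIT,   /* keep waiting for the TIF_MEMDIE task to exit */
    MEMDIE_SKIP,   /* stop waiting and let another victim be selected */
    MEMDIE_PANIC   /* give up and panic instead of hanging silently */
};

struct memdie_policy {
    unsigned long skip_secs;   /* assumed: 0 means "never skip" */
    unsigned long panic_secs;  /* assumed: 0 means "never panic" */
};

/*
 * Decide what to do about a task that has had TIF_MEMDIE set since
 * memdie_start (in seconds), given the current time now (in seconds).
 */
static enum memdie_action memdie_check(const struct memdie_policy *p,
                                       unsigned long memdie_start,
                                       unsigned long now)
{
    unsigned long waited;

    assert(p != NULL);
    assert(now >= memdie_start);
    waited = now - memdie_start;

    /* The panic threshold is checked first so it wins when both expire. */
    if (p->panic_secs && waited >= p->panic_secs)
        return MEMDIE_PANIC;
    if (p->skip_secs && waited >= p->skip_secs)
        return MEMDIE_SKIP;
    return MEMDIE_WAIT;
}
```

With both knobs left at 0 the function always returns MEMDIE_WAIT, so
nothing changes for users who do not opt in.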

Tetsuo Handa wrote:
> Excuse me, but thinking about CLONE_VM without CLONE_THREAD case...
> Isn't there possibility of hitting livelocks at
> 
>         /*
>          * If current has a pending SIGKILL or is exiting, then automatically
>          * select it.  The goal is to allow it to allocate so that it may
>          * quickly exit and free its memory.
>          *
>          * But don't select if current has already released its mm and cleared
>          * TIF_MEMDIE flag at exit_mm(), otherwise an OOM livelock may occur.
>          */
>         if (current->mm &&
>             (fatal_signal_pending(current) || task_will_free_mem(current))) {
>                 mark_oom_victim(current);
>                 return true;
>         }
> 
> if current thread receives SIGKILL just before reaching here, for we don't
> send SIGKILL to all threads sharing the mm?

It seems that CLONE_VM without CLONE_THREAD is irrelevant here.
We have sequences like

  Do a GFP_KERNEL allocation.
  Hold a lock.
  Do a GFP_NOFS allocation.
  Release the lock.

An example is VFS operations that receive a pathname from user space via
getname() and then call VFS functions, after which filesystem code takes
locks that other threads can contend for.

------------------------------------------------------------
diff --git a/fs/namei.c b/fs/namei.c
index d68c21f..d51c333 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4005,6 +4005,8 @@ int vfs_symlink(struct inode *dir, struct dentry *dentry, const char *oldname)
        if (error)
                return error;

+       if (fatal_signal_pending(current))
+               printk(KERN_INFO "Calling symlink with SIGKILL pending\n");
        error = dir->i_op->symlink(dir, dentry, oldname);
        if (!error)
                fsnotify_create(dir, dentry);
@@ -4021,6 +4023,10 @@ SYSCALL_DEFINE3(symlinkat, const char __user *, oldname,
        struct path path;
        unsigned int lookup_flags = 0;

+       if (!strcmp(current->comm, "a.out")) {
+               printk(KERN_INFO "Sending SIGKILL to current thread\n");
+               do_send_sig_info(SIGKILL, SEND_SIG_FORCED, current, true);
+       }
        from = getname(oldname);
        if (IS_ERR(from))
                return PTR_ERR(from);
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index 996481e..2b6faa5 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -240,6 +240,8 @@ xfs_symlink(
        if (error)
                goto out_trans_cancel;

+       if (fatal_signal_pending(current))
+               printk(KERN_INFO "Calling xfs_ilock() with SIGKILL pending\n");
        xfs_ilock(dp, XFS_IOLOCK_EXCL | XFS_ILOCK_EXCL |
                      XFS_IOLOCK_PARENT | XFS_ILOCK_PARENT);
        unlock_dp_on_error = true;
------------------------------------------------------------

[  119.534976] Sending SIGKILL to current thread
[  119.535898] Calling symlink with SIGKILL pending
[  119.536870] Calling xfs_ilock() with SIGKILL pending

Any program can potentially hit this silent livelock. We can't predict
what locks the OOM victim threads will depend on after TIF_MEMDIE is
set by the OOM killer. Therefore, I think that TIF_MEMDIE disabling the
OOM killer indefinitely is one of the possible causes of silent hangups.
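
The dependency described above can be written down as a toy model. The
following userspace C sketch is an illustration under stated assumptions,
not kernel code: struct task_model and both functions are hypothetical, a
single lock is modeled, a waiter is assumed to always have a holder, and
the OOM killer is assumed to select no new victim while any task has the
memdie flag set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model; field names are illustrative, not kernel structures. */
struct task_model {
    bool memdie;         /* stands in for TIF_MEMDIE */
    bool waits_for_lock; /* blocked on the lock held by another task */
    bool holds_lock;     /* owns that lock */
    bool in_allocator;   /* looping in the allocator for free memory */
};

/* Assumed rule: no new victim is chosen while a memdie task exists. */
static bool oom_killer_can_select(const struct task_model *t, size_t n)
{
    assert(t != NULL);
    for (size_t i = 0; i < n; i++)
        if (t[i].memdie)
            return false;
    return true;
}

/* Returns true if at least one task in the model can make progress. */
static bool system_can_progress(const struct task_model *t, size_t n)
{
    bool oom_ok = oom_killer_can_select(t, n);
    bool holder_runs = false;

    /* The lock holder runs unless it is stuck in the allocator. */
    for (size_t i = 0; i < n; i++)
        if (t[i].holds_lock && (!t[i].in_allocator || oom_ok))
            holder_runs = true;

    for (size_t i = 0; i < n; i++) {
        if (t[i].waits_for_lock) {
            if (holder_runs)
                return true;  /* the lock will eventually be released */
            continue;
        }
        if (t[i].in_allocator && !oom_ok)
            continue;         /* retries forever, nothing frees memory */
        return true;
    }
    return false;
}
```

In this model, a victim with memdie set that waits for the lock, plus a
lock holder looping in the allocator, leaves no task able to progress.
Clearing memdie on the victim (which is what a timeout would amount to)
lets the OOM killer act again and breaks the cycle.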

Michal Hocko wrote:
> I really hate to do "easy" things now just to feel better about
> particular case which will kick us back little bit later. And from my
> own experience I can tell you that a more non-deterministic OOM behavior
> is thing people complain about.

I believe that not waiting for a TIF_MEMDIE thread indefinitely is the
first option we can propose for people to try. From my own experience I can
tell you that some customers are really sensitive about bugs which halt
their systems (e.g. https://access.redhat.com/solutions/68466 ).
An opt-in version of the TIF_MEMDIE timeout should be acceptable for people
who prefer avoiding silent hangups over non-deterministic OOM behavior,
once the actual behavior of the current memory allocator has been explained
to them.
