linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Kyle Walker <kwalker@redhat.com>,
	Christoph Lameter <cl@linux.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Rientjes <rientjes@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Vladimir Davydov <vdavydov@parallels.com>,
	linux-mm <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Stanislav Kozina <skozina@redhat.com>,
	Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Subject: Re: can't oom-kill zap the victim's memory?
Date: Tue, 22 Sep 2015 18:06:08 +0200	[thread overview]
Message-ID: <20150922160608.GA2716@redhat.com> (raw)
In-Reply-To: <20150921161203.GD19811@dhcp22.suse.cz>

On 09/21, Michal Hocko wrote:
>
> On Mon 21-09-15 17:32:52, Oleg Nesterov wrote:
> > On 09/21, Michal Hocko wrote:
> > >
> > > On Mon 21-09-15 15:44:14, Oleg Nesterov wrote:
> > > [...]
> > > > So yes, in general oom_kill_process() can't call oom_unmap_func() directly.
> > > > That is why the patch uses queue_work(oom_unmap_func). The workqueue thread
> > > > takes mmap_sem and frees the memory allocated by user space.
> > >
> > > OK, this might have been a bit confusing. I didn't mean you cannot use
> > > mmap_sem directly from the workqueue context. You _can_ AFAICS. But I've
> > > mentioned that you _shouldn't_ use workqueue context in the first place
> > > because all the workers might be blocked on locks and new workers cannot
> > > be created due to memory pressure.
> >
> > Yes, yes, and I already tried to comment this part.
>
> OK then we are on the same page, good.

Yes, yes.

> > We probably need a
> > dedicated kernel thread, but I still think (although I am not sure) that
> > initial change can use workueue. In the likely case system_unbound_wq pool
> > should have an idle thread, if not - OK, this change won't help in this
> > case. This is minor.
>
> The point is that the implementation should be robust from the very
> beginning.

OK, let it be a kthread from the very beginning, I won't argue. This
is really minor compared to other problems.

> > > So I think we probably need to do this in the OOM killer context (with
> > > try_lock)
> >
> > Yes we should try to do this in the OOM killer context, and in this case
> > (of course) we need trylock. Let me quote my previous email:
> >
> > 	And we want to avoid using workqueues when the caller can do this
> > 	directly. And in this case we certainly need trylock. But this needs
> > 	some refactoring: we do not want to do this under oom_lock,
>
> Why do you think oom_lock would be a big deal?

I don't really know... This doesn't look sane to me, but perhaps this
is just because I don't understand this code enough.

And note that the caller can held other locks we do not even know about.
Most probably we should not deadlock, at least if we only unmap the anon
pages, but still this doesn't look safe.

But I agree, this probably needs more discussion.

> Address space of the
> victim might be really large but we can back off after a batch of
> unmapped pages.

Hmm. If we already have mmap_sem and started zap_page_range() then
I do not think it makes sense to stop until we free everything we can.

> I definitely agree with the simplicity for the first iteration. That
> means only unmap private exclusive pages and release at most few megs of
> them.

See above, I am not sure this makes sense. And in any case this will
complicate the initial changes, not simplify.

> I am still not sure about some details, e.g. futex sitting in such
> a memory. Wouldn't threads blow up when they see an unmapped futex page,
> try to page it in and it would be in an uninitialized state? Maybe this
> is safe

But this must be safe.

We do not care about userspace (assuming that all mm users have a
pending SIGKILL).

If this can (say) crash the kernel somehow, then we have a bug which
should be fixed. Simply because userspace can exploit this bug doing
MADV_DONTEED from another thread or CLONE_VM process.



Finally. Whatever we do, we need to change oom_kill_process() first,
and I think we should do this regardless. The "Kill all user processes
sharing victim->mm" logic looks wrong and suboptimal/overcomplicated.
I'll try to make some patches tomorrow if I have time...

But. Can't we just remove another ->oom_score_adj check when we try
to kill all mm users (the last for_each_process loop). If yes, this
all can be simplified.

I guess we can't and its a pity. Because it looks simply pointless
to not kill all mm users. This just means the select_bad_process()
picked the wrong task.


Say, vfork(). OK, it is possible that parent is OOM_SCORE_ADJ_MIN and
the child has already updated its oom_score_adj before exec. Now if
we to kill the child we will only upset the parent for no reason, this
won't help to free the memory.



And while this completely offtopic... why does it take task_lock()
to protect ->comm? Sure, without task_lock() we can print garbage.
Is it really that important? I am asking because sometime people
think that it is not safe to use ->comm lockless, but this is not
true.

Oleg.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2015-09-22 16:09 UTC|newest]

Thread overview: 109+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-17 17:59 [PATCH] mm/oom_kill.c: don't kill TASK_UNINTERRUPTIBLE tasks Kyle Walker
2015-09-17 19:22 ` Oleg Nesterov
2015-09-18 15:41   ` Christoph Lameter
2015-09-18 16:24     ` Oleg Nesterov
2015-09-18 16:39       ` Tetsuo Handa
2015-09-18 16:54         ` Oleg Nesterov
2015-09-18 17:00       ` Christoph Lameter
2015-09-18 19:07         ` Oleg Nesterov
2015-09-18 19:19           ` Christoph Lameter
2015-09-18 21:28             ` Kyle Walker
2015-09-18 22:07               ` Christoph Lameter
2015-09-19  8:32         ` Michal Hocko
2015-09-19 14:33           ` Tetsuo Handa
2015-09-19 15:51             ` Michal Hocko
2015-09-21 23:33             ` David Rientjes
2015-09-22  5:33               ` Tetsuo Handa
2015-09-22 23:32                 ` David Rientjes
2015-09-23 12:03                   ` Kyle Walker
2015-09-24 11:50                     ` Tetsuo Handa
2015-09-19 14:44           ` Oleg Nesterov
2015-09-21 23:27         ` David Rientjes
2015-09-19  8:25     ` Michal Hocko
2015-09-19  8:22 ` Michal Hocko
2015-09-21 23:08   ` David Rientjes
2015-09-19 15:03 ` can't oom-kill zap the victim's memory? Oleg Nesterov
2015-09-19 15:10   ` Oleg Nesterov
2015-09-19 15:58   ` Michal Hocko
2015-09-20 13:16     ` Oleg Nesterov
2015-09-19 22:24   ` Linus Torvalds
2015-09-19 22:54     ` Raymond Jennings
2015-09-19 23:00     ` Raymond Jennings
2015-09-19 23:13       ` Linus Torvalds
2015-09-20  9:33     ` Michal Hocko
2015-09-20 13:06       ` Oleg Nesterov
2015-09-20 12:56     ` Oleg Nesterov
2015-09-20 18:05       ` Linus Torvalds
2015-09-20 18:21         ` Raymond Jennings
2015-09-20 18:23         ` Raymond Jennings
2015-09-20 19:07         ` Raymond Jennings
2015-09-21 13:57           ` Oleg Nesterov
2015-09-21 13:44         ` Oleg Nesterov
2015-09-21 14:24           ` Michal Hocko
2015-09-21 15:32             ` Oleg Nesterov
2015-09-21 16:12               ` Michal Hocko
2015-09-22 16:06                 ` Oleg Nesterov [this message]
2015-09-22 23:04                   ` David Rientjes
2015-09-23 20:59                   ` Michal Hocko
2015-09-24 21:15                     ` David Rientjes
2015-09-25  9:35                       ` Michal Hocko
2015-09-25 16:14                         ` Tetsuo Handa
2015-09-28 16:18                           ` Tetsuo Handa
2015-09-28 22:28                             ` David Rientjes
2015-10-02 12:36                             ` Michal Hocko
2015-10-02 19:01                               ` Linus Torvalds
2015-10-05 14:44                                 ` Michal Hocko
2015-10-07  5:16                                   ` Vlastimil Babka
2015-10-07 10:43                                     ` Tetsuo Handa
2015-10-08  9:40                                       ` Vlastimil Babka
2015-10-06  7:55                                 ` Eric W. Biederman
2015-10-06  8:49                                   ` Linus Torvalds
2015-10-06  8:55                                     ` Linus Torvalds
2015-10-06 14:52                                       ` Eric W. Biederman
2015-10-03  6:02                               ` Can't we use timeout based OOM warning/killing? Tetsuo Handa
2015-10-06 14:51                                 ` Tetsuo Handa
2015-10-12  6:43                                   ` Tetsuo Handa
2015-10-12 15:25                                     ` Silent hang up caused by pages being not scanned? Tetsuo Handa
2015-10-12 21:23                                       ` Linus Torvalds
2015-10-13 12:21                                         ` Tetsuo Handa
2015-10-13 16:37                                           ` Linus Torvalds
2015-10-14 12:21                                             ` Tetsuo Handa
2015-10-15 13:14                                             ` Michal Hocko
2015-10-16 15:57                                               ` Michal Hocko
2015-10-16 18:34                                                 ` Linus Torvalds
2015-10-16 18:49                                                   ` Tetsuo Handa
2015-10-19 12:57                                                     ` Michal Hocko
2015-10-19 12:53                                                   ` Michal Hocko
2015-10-13 13:32                                       ` Michal Hocko
2015-10-13 16:19                                         ` Tetsuo Handa
2015-10-14 13:22                                           ` Michal Hocko
2015-10-14 14:38                                             ` Tetsuo Handa
2015-10-14 14:59                                               ` Michal Hocko
2015-10-14 15:06                                                 ` Tetsuo Handa
2015-10-26 11:44                                     ` Newbie's question: memory allocation when reclaiming memory Tetsuo Handa
2015-11-05  8:46                                       ` Vlastimil Babka
2015-10-06 15:25                                 ` Can't we use timeout based OOM warning/killing? Linus Torvalds
2015-10-08 15:33                                   ` Tetsuo Handa
2015-10-10 12:50                                 ` Tetsuo Handa
2015-09-28 22:24                         ` can't oom-kill zap the victim's memory? David Rientjes
2015-09-29  7:57                           ` Tetsuo Handa
2015-09-29 22:56                             ` David Rientjes
2015-09-30  4:25                               ` Tetsuo Handa
2015-09-30 10:21                                 ` Tetsuo Handa
2015-09-30 21:11                                 ` David Rientjes
2015-10-01 12:13                                   ` Tetsuo Handa
2015-10-01 14:48                           ` Michal Hocko
2015-10-02 13:06                             ` Tetsuo Handa
2015-10-06 18:45                     ` Oleg Nesterov
2015-10-07 11:03                       ` Tetsuo Handa
2015-10-07 12:00                         ` Oleg Nesterov
2015-10-08 14:04                           ` Michal Hocko
2015-10-08 14:01                       ` Michal Hocko
2015-09-21 16:51               ` Tetsuo Handa
2015-09-22 12:43                 ` Oleg Nesterov
2015-09-22 14:30                   ` Tetsuo Handa
2015-09-22 14:45                     ` Oleg Nesterov
2015-09-21 23:42               ` David Rientjes
2015-09-21 16:55           ` Linus Torvalds
2015-09-20 14:50   ` Tetsuo Handa
2015-09-20 14:55     ` Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150922160608.GA2716@redhat.com \
    --to=oleg@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux.com \
    --cc=hannes@cmpxchg.org \
    --cc=kwalker@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=penguin-kernel@i-love.sakura.ne.jp \
    --cc=rientjes@google.com \
    --cc=skozina@redhat.com \
    --cc=torvalds@linux-foundation.org \
    --cc=vdavydov@parallels.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).