linux-kernel.vger.kernel.org archive mirror
From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
To: Michal Hocko <mhocko@kernel.org>
Cc: linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Petr Mladek <pmladek@suse.com>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH (resend)] mm,oom: Defer dump_tasks() output.
Date: Tue, 10 Sep 2019 20:00:22 +0900
Message-ID: <5bbcd93f-aa42-6c62-897a-d7b94aacdb87@i-love.sakura.ne.jp>
In-Reply-To: <20190909130435.GO27159@dhcp22.suse.cz>

On 2019/09/09 22:04, Michal Hocko wrote:
> On Mon 09-09-19 21:40:24, Tetsuo Handa wrote:
>> On 2019/09/09 20:36, Michal Hocko wrote:
>>> This is not an improvement. It detaches the oom report and tasks_dump
>>> for an arbitrary amount of time because the worker context might be
>>> stalled for an arbitrary time. Even long after the oom is resolved.
>>
>> A new worker thread is created if all existing worker threads are busy
>> because this patch solves OOM situation quickly when a new worker thread
>> cannot be created due to OOM situation.
>>
>> Also, if a worker thread cannot run due to CPU starvation, the same thing
>> applies to dump_tasks(). In other words, dump_tasks() cannot complete due
>> to CPU starvation, which results in more costly and serious consequences.
>> Being able to send SIGKILL and reclaim memory as soon as possible is
>> an improvement.
> 
> There might be zillion workers waiting to make a forward progress and
> you cannot expect any timing here. Just remember your own experiments
> with xfs and low memory conditions.

Even if there were a zillion workers waiting to make forward progress, the
worker that processes the dump_tasks() output can still make forward progress.
That's how workqueues work. (If you still don't trust the workqueue, I can update
my patch to use a kernel thread.) And if there were a zillion workers waiting to
make forward progress, completing the OOM killer quickly would be more important
than keeping a zillion workers blocked while they wait for the OOM killer to
resolve the OOM situation. Having a thread that is calling out_of_memory()
preempted by a zillion workers is a nightmare. ;-)
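
To illustrate the ordering argued for above, here is a minimal userspace
sketch. It is not the patch and not kernel code: struct deferred_dump,
struct task_model, select_victim_deferred() and flush_deferred_dump() are
hypothetical names, and picking the largest rss_pages is only a stand-in for
the real victim selection. The point is that the fast path only records the
candidate lines, so the victim can be chosen before anything is printed, and
the slow printing happens later in another context.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_DEFERRED 64
#define LINE_LEN 80

/* Hypothetical model: lines recorded by the OOM path, printed later. */
struct deferred_dump {
	char lines[MAX_DEFERRED][LINE_LEN];
	int count;
};

/* Hypothetical, heavily simplified view of an OOM victim candidate. */
struct task_model {
	int pid;
	long rss_pages;
	const char *comm;
};

/*
 * Fast path: record one line per candidate instead of printing it, and
 * return the index of the candidate with the largest rss_pages (-1 if
 * there are no candidates). Nothing is written to the console here.
 */
static int select_victim_deferred(const struct task_model *tasks, int ntasks,
				  struct deferred_dump *dd)
{
	int victim = -1;

	assert(dd != NULL);
	for (int i = 0; i < ntasks; i++) {
		if (dd->count < MAX_DEFERRED) {
			snprintf(dd->lines[dd->count], LINE_LEN, "[%7d] %8ld %s",
				 tasks[i].pid, tasks[i].rss_pages, tasks[i].comm);
			dd->count++;
		}
		if (victim < 0 || tasks[i].rss_pages > tasks[victim].rss_pages)
			victim = i;
	}
	return victim;
}

/*
 * Slow path: what a worker would do later. Prints every recorded line,
 * empties the buffer, and returns the number of lines printed.
 */
static int flush_deferred_dump(struct deferred_dump *dd)
{
	int n = dd->count;

	for (int i = 0; i < n; i++)
		puts(dd->lines[i]);
	dd->count = 0;
	return n;
}
```

A caller would run select_victim_deferred() first, act on the returned index
(the equivalent of sending SIGKILL), and only afterwards let some other
context call flush_deferred_dump().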

> 
>>> Not to mention that 1:1 (oom to tasks) information dumping is
>>> fundamentally broken. Any task might be on an oom list of different
>>> OOM contexts in different oom scopes (think of OOM happening in disjunct
>>> NUMA sets).
>>
>> I can't understand what you are talking about. This patch just defers
>> printk() from /proc/sys/vm/oom_dump_tasks != 0. Please look at the patch
>> carefully. If you are saying that it is bad that OOM victim candidates for
>> OOM domain B, C, D ... cannot be printed if printing of OOM victim candidates
>> for OOM domain A has not finished, I can update this patch to print them.
> 
> You would have to track each ongoing oom context separately.

I can update my patch to track each OOM context separately. But

>                                                              And not
> only those from different oom scopes because as a matter of fact a new
> OOM might trigger before the previous dump_tasks managed to be handled.

please be aware that we are already dropping OOM messages from different scopes
due to __ratelimit(&oom_rs). Given that __ratelimit(&oom_rs) can already drop
messages, the only difference is which OOM messages are dropped when a cluster
of OOM events from multiple different scopes happens.
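
As a rough illustration of that point, the following userspace sketch models
a window/burst rate limiter shared by several OOM scopes. It is a simplified
model, not the kernel's ratelimit code: struct ratelimit_model,
ratelimit_model_check() and simulate_cluster() are hypothetical names, and
the interval and burst values a caller passes in are arbitrary. It only shows
that, once the burst is used up, reports from whichever scope arrives later
are dropped.

```c
#include <assert.h>

/* Simplified model of a shared window/burst rate limiter. */
struct ratelimit_model {
	long interval;	/* window length, in abstract ticks */
	int burst;	/* reports allowed per window */
	long begin;	/* tick at which the current window started */
	int printed;	/* reports allowed in the current window */
	int missed;	/* reports dropped in the current window */
};

/* Returns 1 if the caller may print its report, 0 if it is dropped. */
static int ratelimit_model_check(struct ratelimit_model *rs, long now)
{
	assert(rs->burst > 0);
	if (now - rs->begin >= rs->interval) {
		/* A new window starts: reset the counters. */
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return 1;
	}
	rs->missed++;
	return 0;
}

/*
 * Feed nevents reports, all arriving at tick "now", from nscopes scopes in
 * round-robin order through one shared limiter. printed[scope] is
 * incremented for every report that was allowed through.
 */
static void simulate_cluster(struct ratelimit_model *rs, int nscopes,
			     int nevents, long now, int printed[])
{
	for (int i = 0; i < nevents; i++) {
		int scope = i % nscopes;

		if (ratelimit_model_check(rs, now))
			printed[scope]++;
	}
}
```

With burst = 2 and six reports from three scopes arriving in the same window,
only the first two scopes get a report printed; the third scope gets none and
four reports are counted as missed.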

And "OOM events from multiple different scopes can trivially happen"
contradicts commit dc56401fc9f25e8f ("mm: oom_kill: simplify OOM killer
locking"), which says

    However, the OOM killer is a fairly cold error path, there is really no
    reason to optimize for highly performant and concurrent OOM kills.

If that were the case, we would need a per-OOM-scope locking mechanism in order
to avoid deferring the OOM killer event in the current thread's OOM scope while
OOM killer events in other threads' OOM scopes are being processed.



Thread overview: 6+ messages
     [not found] <1567159493-5232-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp>
2019-09-07 10:54 ` [PATCH (resend)] mm,oom: Defer dump_tasks() output Tetsuo Handa
2019-09-09 11:36   ` Michal Hocko
2019-09-09 12:40     ` Tetsuo Handa
2019-09-09 13:04       ` Michal Hocko
2019-09-10 11:00         ` Tetsuo Handa [this message]
2019-09-14  6:15           ` Tetsuo Handa
