From: Michal Hocko <mhocko@kernel.org>
To: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Petr Mladek <pmladek@suse.com>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH (resend)] mm,oom: Defer dump_tasks() output.
Date: Mon, 9 Sep 2019 15:04:35 +0200	[thread overview]
Message-ID: <20190909130435.GO27159@dhcp22.suse.cz> (raw)
In-Reply-To: <579a27d2-52fb-207e-9278-fc20a2154394@i-love.sakura.ne.jp>

On Mon 09-09-19 21:40:24, Tetsuo Handa wrote:
> On 2019/09/09 20:36, Michal Hocko wrote:
> > On Sat 07-09-19 19:54:32, Tetsuo Handa wrote:
> >> (Resending to LKML as linux-mm ML dropped my posts.)
> >>
> >> If /proc/sys/vm/oom_dump_tasks != 0, dump_header() can become very slow
> >> because dump_tasks() synchronously reports all OOM victim candidates, and
> >> as a result the ratelimit test for dump_header() cannot work as expected.
> >>
> >> This patch defers dump_tasks() output until oom_lock is released. As a
> >> result of this patch, the latency between when out_of_memory() is called
> >> and when SIGKILL is sent (and the OOM reaper starts reclaiming memory)
> >> will be significantly reduced.
> >>
> >> Since CONFIG_PRINTK_CALLER was introduced, concurrent printk() has become
> >> less problematic. But we still need to correlate synchronously printed
> >> messages with asynchronously printed messages if we defer dump_tasks()
> >> messages. Thus, this patch also prefixes OOM killer messages using the
> >> "OOM[$serial]:" format. As a result, OOM killer messages will look like
> >> the following.
> >>
> >>   [   31.935015][   T71] OOM[1]: kworker/4:1 invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=-1, oom_score_adj=0
> >>   (...snipped....)
> >>   [   32.052635][   T71] OOM[1]: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),global_oom,task_memcg=/,task=firewalld,pid=737,uid=0
> >>   [   32.056886][   T71] OOM[1]: Out of memory: Killed process 737 (firewalld) total-vm:358672kB, anon-rss:22640kB, file-rss:12328kB, shmem-rss:0kB, UID:0 pgtables:421888kB oom_score_adj:0
> >>   [   32.064291][   T71] OOM[1]: Tasks state (memory values in pages):
> >>   [   32.067807][   T71] OOM[1]: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
> >>   [   32.070057][   T54] oom_reaper: reaped process 737 (firewalld), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> >>   [   32.072417][   T71] OOM[1]: [    548]     0   548     9772     1172   110592        0             0 systemd-journal
> >>   (...snipped....)
> >>   [   32.139566][   T71] OOM[1]: [    737]     0   737    89668     8742   421888        0             0 firewalld
> >>   (...snipped....)
> >>   [   32.221990][   T71] OOM[1]: [   1300]    48  1300    63025     1788   532480        0             0 httpd
> >>
> >> This patch might affect panic behavior triggered by panic_on_oom or by
> >> having no OOM-killable tasks, for dump_header(oc, NULL) will not report
> >> OOM victim candidates if there are not-yet-reported OOM victim candidates
> >> from past rounds of OOM killer invocations. I don't know if that matters.
> >>
> >> For now this patch embeds "struct oom_task_info" into each
> >> "struct task_struct". In order to avoid bloating "struct task_struct",
> >> a future patch might detach it from "struct task_struct", because one
> >> "struct oom_task_info" per "struct signal_struct" will be enough.
> > 
> > This is not an improvement. It detaches the oom report and the
> > dump_tasks output for an arbitrary amount of time, because the worker
> > context might be stalled for an arbitrary time, even long after the oom
> > is resolved.
> 
> A new worker thread is created if all existing worker threads are busy,
> and this patch resolves the OOM situation quickly even when a new worker
> thread cannot be created due to the OOM situation.
> 
> Also, if a worker thread cannot run due to CPU starvation, the same thing
> applies to dump_tasks(). In other words, dump_tasks() cannot complete due
> to CPU starvation, which results in more costly and serious consequences.
> Being able to send SIGKILL and reclaim memory as soon as possible is
> an improvement.

There might be a zillion workers waiting to make forward progress and
you cannot expect any timing guarantees here. Just remember your own
experiments with xfs under low memory conditions.

> > Not to mention that 1:1 (oom to tasks) information dumping is
> > fundamentally broken. Any task might be on an oom list of different
> > OOM contexts in different oom scopes (think of OOMs happening in disjoint
> > NUMA sets).
> 
> I can't understand what you are talking about. This patch just defers
> printk() from /proc/sys/vm/oom_dump_tasks != 0. Please look at the patch
> carefully. If you are saying that it is bad that OOM victim candidates for
> OOM domain B, C, D ... cannot be printed if printing of OOM victim candidates
> for OOM domain A has not finished, I can update this patch to print them.

You would have to track each ongoing oom context separately. And not
only those from different oom scopes, because as a matter of fact a new
OOM might trigger before the previous dump_tasks output has been handled.

> > This is just adding more kludges and making the code more complex
> > without trying to address an underlying problems. So
> > Nacked-by: Michal Hocko <mhocko@suse.com>
> 
> Since I'm sure that you are misunderstanding, this Nacked-by is invalid.

Thank you very much for your consideration and evaluation of my review.
It seems that I am only burning my time responding to your emails. As
you seem to know best, right?
-- 
Michal Hocko
SUSE Labs

Thread overview: 10+ messages
     [not found] <1567159493-5232-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp>
2019-08-30 10:35 ` [PATCH] mm,oom: Defer dump_tasks() output Michal Hocko
     [not found]   ` <f69d1b83-aee4-8b00-81f6-adbe6121eb99@i-love.sakura.ne.jp>
2019-09-02  6:06     ` Michal Hocko
     [not found]       ` <cba675c7-88a2-0c5b-c97b-8d5c77eaa8ef@i-love.sakura.ne.jp>
2019-09-03 14:29         ` Michal Hocko
2019-09-04  8:13         ` Petr Mladek
2019-09-07 10:54 ` [PATCH (resend)] " Tetsuo Handa
2019-09-09 11:36   ` Michal Hocko
2019-09-09 12:40     ` Tetsuo Handa
2019-09-09 13:04       ` Michal Hocko [this message]
2019-09-10 11:00         ` Tetsuo Handa
2019-09-14  6:15           ` Tetsuo Handa
