From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
To: Dmitry Vyukov <dvyukov@google.com>, Michal Hocko <mhocko@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	syzbot <syzbot+77e6b28a7a7106ad0def@syzkaller.appspotmail.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	guro@fb.com,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	David Rientjes <rientjes@google.com>,
	syzkaller-bugs <syzkaller-bugs@googlegroups.com>,
	Yang Shi <yang.s@alibaba-inc.com>,
	Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
	Petr Mladek <pmladek@suse.com>
Subject: Re: INFO: rcu detected stall in shmem_fault
Date: Wed, 10 Oct 2018 22:10:31 +0900
Message-ID: <b7727ff0-b34f-25a2-b9e7-56e70d9349c4@i-love.sakura.ne.jp>
In-Reply-To: <CACT4Y+bqJeKum7jessccWQF+4BmabnVy48aqHEOypioKwQAMTQ@mail.gmail.com>

On 2018/10/10 21:36, Dmitry Vyukov wrote:
> On Wed, Oct 10, 2018 at 2:29 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
>> On Wed, Oct 10, 2018 at 2:25 PM, Michal Hocko <mhocko@kernel.org> wrote:
>>> On Wed 10-10-18 20:48:33, Sergey Senozhatsky wrote:
>>>> On (10/10/18 13:35), Michal Hocko wrote:
>>>>>> Just flooding out of memory messages can trigger RCU stall problems.
>>>>>> For example, a severe skbuff_head_cache or kmalloc-512 leak bug is causing
>>>>>
>>>>> [...]
>>>>>
>>>>> Quite some of them, indeed! I guess we want to rate limit the output.
>>>>> What about the following?
>>>>
>>>> A bit unrelated, but while we are at it:
>>>>
>>>>   I like it when we rate-limit printk-s that lock up the system.
>>>> But it seems that default rate-limit values are not always good enough,
>>>> DEFAULT_RATELIMIT_INTERVAL / DEFAULT_RATELIMIT_BURST can still be too
>>>> verbose. For instance, when we have a very slow IPMI emulated serial
>>>> console -- e.g. baud rate at 57600. DEFAULT_RATELIMIT_INTERVAL and
>>>> DEFAULT_RATELIMIT_BURST can add new OOM headers and backtraces faster
>>>> than we evict them.
>>>>
>>>> Does it sound reasonable enough to use larger than default rate-limits
>>>> for printk-s in OOM print-outs? OOM reports tend to be somewhat large
>>>> and the reported numbers are not always *very* unique.
>>>>
>>>> What do you think?
>>>
>>> I do not really care about the current interval/burst values. This change
>>> should be done separately and ideally with some numbers.
>>
>> I think Sergey meant that this place may need to use
>> larger-than-default values because it prints lots of output per
>> instance (whereas the default limit is more tuned for cases that print
>> just 1 line).

Yes. The OOM killer tends to print a lot of messages (and I suspect that
mutex_trylock(&oom_lock) makes it worse, because preemption leads to even
more wasted CPU time).

>>
>> I've found at least 1 place that uses DEFAULT_RATELIMIT_INTERVAL*10:
>> https://elixir.bootlin.com/linux/latest/source/fs/btrfs/extent-tree.c#L8365
>> Probably we need something similar here.

Since printk() consumes a significant amount of CPU time, I think what we need
to guarantee is that the interval between the end of one set of OOM killer
messages and the beginning of the next set is large enough. For example, set up
a timer with a 5 second timeout at the end of a set of OOM killer messages, and
check whether the timer has already fired at the beginning of the next set.
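
The timer idea above can be sketched in userspace C as follows. This is only an
illustration under stated assumptions: struct oom_report_gate,
oom_report_may_begin(), oom_report_end() and OOM_QUIET_SECS are hypothetical
names, not existing kernel API, and time is passed in as plain seconds instead
of using jiffies or a real kernel timer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; none of this is existing kernel API. */
#define OOM_QUIET_SECS 5	/* the 5 second timeout from the example */

struct oom_report_gate {
	bool armed;		/* true once at least one report has finished */
	long quiet_until;	/* time (in seconds) at which the timer fires */
};

/*
 * Call at the beginning of a set of OOM killer messages.
 * Returns true if no report has been printed yet, or if the timer armed
 * at the end of the previous report has already fired.
 */
static bool oom_report_may_begin(const struct oom_report_gate *gate, long now)
{
	assert(gate != NULL);
	return !gate->armed || now >= gate->quiet_until;
}

/*
 * Call at the end of a set of OOM killer messages.
 * Arms the timer, so the quiet period is measured from the end of the
 * report rather than from its start.
 */
static void oom_report_end(struct oom_report_gate *gate, long now)
{
	assert(gate != NULL);
	gate->armed = true;
	gate->quiet_until = now + OOM_QUIET_SECS;
}
```

The difference from a plain ratelimit is that the quiet period starts when
printing finishes, so a report that itself takes several seconds on a slow
console still leaves a full 5 seconds before the next one.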

> 
> 
> In parallel with the kernel changes I've also made a change to
> syzkaller that (1) makes it not use oom_score_adj=-1000, this hard
> killing limit looks like quite risky thing, (2) increase memcg size
> beyond expected KASAN quarantine size:
> https://github.com/google/syzkaller/commit/adedaf77a18f3d03d695723c86fc083c3551ff5b
> If this will stop the flow of hang/stall reports, then we can just
> close all old reports as invalid.

I don't think so. This report differs from the others: printk() in this report
came from memcg OOM events without eligible tasks, whereas printk() in the
others came from global OOM events triggered by a severe slab memory leak.

