All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
To: Michal Hocko <mhocko@kernel.org>
Cc: syzbot <syzbot+77e6b28a7a7106ad0def@syzkaller.appspotmail.com>,
	hannes@cmpxchg.org, akpm@linux-foundation.org, guro@fb.com,
	kirill.shutemov@linux.intel.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, rientjes@google.com,
	syzkaller-bugs@googlegroups.com, yang.s@alibaba-inc.com,
	Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Petr Mladek <pmladek@suse.com>
Subject: Re: INFO: rcu detected stall in shmem_fault
Date: Wed, 10 Oct 2018 19:43:38 +0900	[thread overview]
Message-ID: <e72f799e-0634-f958-1af0-291f8577f4e8@i-love.sakura.ne.jp> (raw)
In-Reply-To: <20181010085945.GC5873@dhcp22.suse.cz>

On 2018/10/10 17:59, Michal Hocko wrote:
> On Wed 10-10-18 09:12:45, Tetsuo Handa wrote:
>> syzbot is hitting RCU stall due to memcg-OOM event.
>> https://syzkaller.appspot.com/bug?id=4ae3fff7fcf4c33a47c1192d2d62d2e03efffa64
> 
> This is really interesting. If we do not have any eligible oom victim we
> simply force the charge (allow to proceed and go over the hard limit)
> and break the isolation. That means that the caller gets back to running
> and realease all locks take on the way.

What happens if the caller continued trying to allocate more memory
because the caller cannot be noticed by SIGKILL from the OOM killer?

>                                         I am wondering how come we are
> seeing the RCU stall. Whole is holding the rcu lock? Certainly not the
> charge patch and neither should the caller because you have to be in a
> sleepable context to trigger the OOM killer. So there must be something
> more going on.

Just flooding out of memory messages can trigger RCU stall problems.
For example, a severe skbuff_head_cache or kmalloc-512 leak bug is causing

  INFO: rcu detected stall in filemap_fault
  https://syzkaller.appspot.com/bug?id=8e7f5412a78197a2e0f848fa513c2e7f0071ffa2

  INFO: rcu detected stall in show_free_areas
  https://syzkaller.appspot.com/bug?id=b2cc06dd0a76e7ca92aa8d13ef4227cb7fd0d217

  INFO: rcu detected stall in proc_reg_read
  https://syzkaller.appspot.com/bug?id=0d6a21d39c8ef7072c695dea11095df6c07c79af

  INFO: rcu detected stall in call_timer_fn
  https://syzkaller.appspot.com/bug?id=88a07e525266567efe221f7a4a05511c032e5822

  INFO: rcu detected stall in br_multicast_port_group_expired (2)
  https://syzkaller.appspot.com/bug?id=15c7ad8cf35a07059e8a697a22527e11d294bc94

  INFO: rcu detected stall in br_multicast_port_group_expired (2)
  https://syzkaller.appspot.com/bug?id=15c7ad8cf35a07059e8a697a22527e11d294bc94

  INFO: rcu detected stall in tun_chr_close
  https://syzkaller.appspot.com/bug?id=6c50618bde03e5a2eefdd0269cf9739c5ebb8270

  INFO: rcu detected stall in discover_timer
  https://syzkaller.appspot.com/bug?id=55da031ddb910e58ab9c6853a5784efd94f03b54

  INFO: rcu detected stall in ret_from_fork (2)
  https://syzkaller.appspot.com/bug?id=c83129a6683b44b39f5b8864a1325893c9218363

  INFO: rcu detected stall in addrconf_rs_timer
  https://syzkaller.appspot.com/bug?id=21c029af65f81488edbc07a10ed20792444711b6

  INFO: rcu detected stall in kthread (2)
  https://syzkaller.appspot.com/bug?id=6accd1ed11c31110fed1982f6ad38cc9676477d2

  INFO: rcu detected stall in ext4_filemap_fault
  https://syzkaller.appspot.com/bug?id=817e38d20e9ee53390ac361bf0fd2007eaf188af

  INFO: rcu detected stall in run_timer_softirq (2)
  https://syzkaller.appspot.com/bug?id=f5a230a3ff7822f8d39fddf8485931bd06ae47fe

  INFO: rcu detected stall in bpf_prog_ADDR
  https://syzkaller.appspot.com/bug?id=fb4911fd0e861171cc55124e209f810a0dd68744

  INFO: rcu detected stall in __run_timers (2)
  https://syzkaller.appspot.com/bug?id=65416569ddc8d2feb8f19066aa761f5a47f7451a

reports.

> 
>> What should we do if memcg-OOM found no killable task because the allocating task
>> was oom_score_adj == -1000 ? Flooding printk() until RCU stall watchdog fires 
>> (which seems to be caused by commit 3100dab2aa09dc6e ("mm: memcontrol: print proper
>> OOM header when no eligible victim left") because syzbot was terminating the test
>> upon WARN(1) removed by that commit) is not a good behavior.
> 
> We definitely want to inform about ineligible oom victim. We might
> consider some rate limiting for the memcg state but that is a valuable
> information to see under normal situation (when you do not have floods
> of these situations).
> 

But if the caller cannot be noticed by SIGKILL from the OOM killer,
allowing the caller to trigger the OOM killer again and again (until
global OOM killer triggers) is bad.


  reply	other threads:[~2018-10-10 10:43 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-10  0:08 syzbot
2018-10-10  0:12 ` Tetsuo Handa
2018-10-10  4:11   ` David Rientjes
2018-10-10  7:55     ` Dmitry Vyukov
2018-10-10  9:13       ` Michal Hocko
2018-10-10  9:33         ` Dmitry Vyukov
2018-10-10  9:02     ` Michal Hocko
2018-10-10  8:59   ` Michal Hocko
2018-10-10 10:43     ` Tetsuo Handa [this message]
2018-10-10 11:35       ` Michal Hocko
2018-10-10 11:48         ` Sergey Senozhatsky
2018-10-10 12:25           ` Michal Hocko
2018-10-10 12:29             ` Dmitry Vyukov
2018-10-10 12:36               ` Dmitry Vyukov
2018-10-10 13:10                 ` Tetsuo Handa
2018-10-10 13:17                   ` Dmitry Vyukov
2018-10-11  1:17                   ` Sergey Senozhatsky
2018-10-10 15:17               ` Sergey Senozhatsky
2018-10-10 14:19         ` Tetsuo Handa
2018-10-10 15:11 ` [RFC PATCH] memcg, oom: throttle dump_header for memcg ooms without eligible tasks Michal Hocko
2018-10-10 15:11   ` Michal Hocko
2018-10-11  6:37   ` Tetsuo Handa
2018-10-11  6:37     ` Tetsuo Handa
2018-10-12 10:47     ` Tetsuo Handa
2018-10-12 10:47       ` Tetsuo Handa
2018-10-12 11:20   ` Johannes Weiner
2018-10-12 12:08     ` Michal Hocko
2018-10-12 12:10       ` Tetsuo Handa
2018-10-12 12:41         ` Johannes Weiner
2018-10-12 12:58           ` Tetsuo Handa
2018-10-13 11:09             ` Tetsuo Handa
2018-10-13 11:22               ` Johannes Weiner
2018-10-13 11:28                 ` Tetsuo Handa
2018-10-15  8:19                   ` Michal Hocko
2018-10-15 10:57                     ` Tetsuo Handa
2018-10-15 11:24                       ` Michal Hocko
2018-10-15 12:47                         ` Tetsuo Handa
2018-10-15 13:35                           ` Michal Hocko
2018-10-16  0:55                             ` Tetsuo Handa
2018-10-16  9:20                               ` Michal Hocko
2018-10-16 11:05                                 ` Tetsuo Handa
2018-10-16 11:17                                   ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e72f799e-0634-f958-1af0-291f8577f4e8@i-love.sakura.ne.jp \
    --to=penguin-kernel@i-love.sakura.ne.jp \
    --cc=akpm@linux-foundation.org \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=pmladek@suse.com \
    --cc=rientjes@google.com \
    --cc=sergey.senozhatsky.work@gmail.com \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=syzbot+77e6b28a7a7106ad0def@syzkaller.appspotmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    --cc=yang.s@alibaba-inc.com \
    --subject='Re: INFO: rcu detected stall in shmem_fault' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.