linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dmitry Vyukov <dvyukov@google.com>
To: "Theodore Y. Ts'o" <tytso@mit.edu>,
	syzbot <syzbot+7d19c5fe6a3f1161abb7@syzkaller.appspotmail.com>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	linux-ext4@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	syzkaller-bugs <syzkaller-bugs@googlegroups.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>
Subject: Re: INFO: rcu detected stall in ext4_file_write_iter
Date: Wed, 27 Feb 2019 10:58:50 +0100	[thread overview]
Message-ID: <CACT4Y+acnJnZ4rN3Ta-6ftFv4A=1q7hsVQSVJeh9PGCK0e4BbA@mail.gmail.com> (raw)
In-Reply-To: <20190226151738.GA6430@mit.edu>

On Tue, Feb 26, 2019 at 4:17 PM Theodore Y. Ts'o <tytso@mit.edu> wrote:
>
> TL;DR: This doesn't appear to be ext4 specific, and seems to involve
> an unholy combination of the perf_event_open(2) and sendfile(2) system
> calls.
>
> On Mon, Feb 25, 2019 at 10:50:05PM -0800, syzbot wrote:
> > syzbot found the following crash on:
> >
> > HEAD commit:    8a61716ff2ab Merge tag 'ceph-for-5.0-rc8' of git://github...
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=161b71d4c00000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=7132344728e7ec3f
> > dashboard link: https://syzkaller.appspot.com/bug?extid=7d19c5fe6a3f1161abb7
> > compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=103908f8c00000
> > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=105e5cd0c00000
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+7d19c5fe6a3f1161abb7@syzkaller.appspotmail.com
> >
> > audit: type=1400 audit(1550814986.750:36): avc:  denied  { map } for
> > pid=8058 comm="syz-executor004" path="/root/syz-executor004991115"
> > dev="sda1" ino=1426 scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023
> > tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file permissive=1
> > hrtimer: interrupt took 42841 ns
> > rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > rcu:  (detected by 1, t=10502 jiffies, g=5873, q=2)
> > rcu: All QSes seen, last rcu_preempt kthread activity 10502
> > (4295059997-4295049495), jiffies_till_next_fqs=1, root ->qsmask 0x0
> > syz-executor004 R  running task    26448  8069   8060 0x00000000
>
> This particular repro seems to induce similar failures when I tried it
> with xfs and btrfs as well as ext4.
>
> The repro seems to involve the perf_event_open(2) and sendfile(2)
> system calls, and killing the process which is performing the
> sendfile(2).  The repro also uses the sched_setattr(2) system call,
> but when I commented it out, the failure still happened, so this
> appears to be another case of "Syzkaller?  We don't need to bug
> developers with a minimal test case!  Open source developers are a
> free unlimited resource, after all!"  :-)
>
> Commenting out the perf_event_open(2) does seem to make the problem go
> away.
>
> Since there are zillions of ways to self-DOS a Linux server without
> having to resert to exotic combination of system calls, this isn't
> something I'm going to prioritize for myself, but I'm hoping someone
> else has time and curiosity.

Peter, Ingo, do you have any updates on the
perf_event_open/sched_setattr stalls? This bug cause assorted hangs
throughout kernel and so is nasty.

syzkaller tries to remove all syscalls from reproducers one-by-one.
Somehow without sched_setattr the hang did not reproduce (a bunch of
repros have perf_event_open+sched_setattr so somehow they seem to be
related). Kernel is not as simple as a single-threaded deterministic
fully-reproducible user-space xml parsing library, more (almost all)
aspects are flaky and non-deterministic and thus require more human
intelligence. But even with perfect repros machines still won't be
able to tell in all cases that even though the hang happened in ext4
code, the root cause is actually another scheduler-related system
call. So thanks for looking into this.

  reply	other threads:[~2019-02-27  9:59 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <0000000000009a01370582c6772a@google.com>
2019-02-26 15:17 ` INFO: rcu detected stall in ext4_file_write_iter Theodore Y. Ts'o
2019-02-27  9:58   ` Dmitry Vyukov [this message]
2019-02-27 21:57     ` Theodore Y. Ts'o
2019-02-28  9:34       ` Dmitry Vyukov
2019-03-21 17:30 ` syzbot
2020-09-11 19:33 ` syzbot
2020-09-11 20:03   ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CACT4Y+acnJnZ4rN3Ta-6ftFv4A=1q7hsVQSVJeh9PGCK0e4BbA@mail.gmail.com' \
    --to=dvyukov@google.com \
    --cc=acme@kernel.org \
    --cc=adilger.kernel@dilger.ca \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=syzbot+7d19c5fe6a3f1161abb7@syzkaller.appspotmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).