netdev.vger.kernel.org archive mirror
From: Dmitry Vyukov <dvyukov@google.com>
To: "Theodore Ts'o" <tytso@mit.edu>,
	syzbot <syzbot+4bfbbf28a2e50ab07368@syzkaller.appspotmail.com>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	David Miller <davem@davemloft.net>,
	eladr@mellanox.com, Ido Schimmel <idosch@mellanox.com>,
	Jiri Pirko <jiri@mellanox.com>,
	John Stultz <john.stultz@linaro.org>,
	linux-ext4@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>,
	netdev <netdev@vger.kernel.org>,
	syzkaller-bugs <syzkaller-bugs@googlegroups.com>,
	Thomas Gleixner <tglx@linutronix.de>
Cc: syzkaller <syzkaller@googlegroups.com>
Subject: Re: INFO: rcu detected stall in ext4_write_checks
Date: Fri, 5 Jul 2019 15:18:06 +0200
Message-ID: <CACT4Y+aHgz9cPa7OnVsNeHim72i6zVdjnbvVb0Z1oN2B8QLZqg@mail.gmail.com>
In-Reply-To: <20190626184251.GE3116@mit.edu>

On Wed, Jun 26, 2019 at 8:43 PM Theodore Ts'o <tytso@mit.edu> wrote:
>
> On Wed, Jun 26, 2019 at 10:27:08AM -0700, syzbot wrote:
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > HEAD commit:    abf02e29 Merge tag 'pm-5.2-rc6' of git://git.kernel.org/pu..
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=1435aaf6a00000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=e5c77f8090a3b96b
> > dashboard link: https://syzkaller.appspot.com/bug?extid=4bfbbf28a2e50ab07368
> > compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=11234c41a00000
> > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15d7f026a00000
> >
> > The bug was bisected to:
> >
> > commit 0c81ea5db25986fb2a704105db454a790c59709c
> > Author: Elad Raz <eladr@mellanox.com>
> > Date:   Fri Oct 28 19:35:58 2016 +0000
> >
> >     mlxsw: core: Add port type (Eth/IB) set API
>
> Um, so this doesn't pass the laugh test.
>
> > bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=10393a89a00000
>
> It looks like the automated bisection machinery got confused by two
> failures getting triggered by the same repro; the symptoms changed
> over time.  Initially, the failure was:
>
> crashed: INFO: rcu detected stall in {sys_sendfile64,ext4_file_write_iter}
>
> Later, the failure changed to something completely different, and much
> earlier (before the test was even started):
>
> run #5: basic kernel testing failed: failed to copy test binary to VM: failed to run ["scp" "-P" "22" "-F" "/dev/null" "-o" "UserKnownHostsFile=/dev/null" "-o" "BatchMode=yes" "-o" "IdentitiesOnly=yes" "-o" "StrictHostKeyChecking=no" "-o" "ConnectTimeout=10" "-i" "/syzkaller/jobs/linux/workdir/image/key" "/tmp/syz-executor216456474" "root@10.128.15.205:./syz-executor216456474"]: exit status 1
> Connection timed out during banner exchange
> lost connection
>
> Looks like an opportunity to improve the bisection engine?

Hi Ted,

Yes, these infrastructure errors plague bisections episodically.
That's https://github.com/google/syzkaller/issues/1250

They did not confuse the bisection directly, since it understands that
these are infrastructure failures rather than a kernel crash. For
example, here you can see that it correctly identified that this run
was OK and started the bisection in the v4.9..v4.10 range despite the
2 scp failures:

testing release v4.9
testing commit 69973b830859bc6529a7a0468ba0d80ee5117826 with gcc (GCC) 5.5.0
run #0: basic kernel testing failed: failed to copy test binary to VM:
failed to run ["scp" ...]: exit status 1
Connection timed out during banner exchange
run #1: basic kernel testing failed: failed to copy test binary to VM:
failed to run ["scp" ....]: exit status 1
Connection timed out during banner exchange
run #2: OK
run #3: OK
run #4: OK
run #5: OK
run #6: OK
run #7: OK
run #8: OK
run #9: OK
# git bisect start v4.10 v4.9

Though, of course, they may confuse the bisection indirectly by
reducing the number of tests per commit.
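
To make that concrete, here is a minimal, hypothetical sketch in Go
(illustrative only, not the actual syzkaller code; the RunResult type
and commitVerdict function are invented here) of how a bisection driver
can discard infrastructure failures when judging a commit, and why
losing runs to them still weakens the verdict:

package main

import "fmt"

type RunResult int

const (
	RunOK RunResult = iota
	RunCrashed
	RunInfraFailure // scp/VM/boot problem, says nothing about the kernel
)

// commitVerdict counts only runs that actually exercised the kernel.
// It reports an error when too few runs were valid to trust the
// verdict, which is how infrastructure failures hurt indirectly.
func commitVerdict(runs []RunResult, minValid int) (bad bool, err error) {
	valid, crashed := 0, 0
	for _, r := range runs {
		switch r {
		case RunInfraFailure:
			continue // ignored, not treated as a kernel crash
		case RunCrashed:
			crashed++
			valid++
		case RunOK:
			valid++
		}
	}
	if valid < minValid {
		return false, fmt.Errorf("only %d valid runs, need %d", valid, minValid)
	}
	return crashed > 0, nil // any reproduced crash marks the commit bad
}

func main() {
	// Mirrors the v4.9 log above: 2 scp failures, 8 OK runs.
	runs := []RunResult{
		RunInfraFailure, RunInfraFailure,
		RunOK, RunOK, RunOK, RunOK, RunOK, RunOK, RunOK, RunOK,
	}
	fmt.Println(commitVerdict(runs, 5)) // false <nil> -> "git bisect good"
}

With the v4.9 log above this still calls the commit good, but from 8
useful runs instead of 10; with more infrastructure failures per commit
the verdict becomes correspondingly less reliable.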

So far I haven't been able to gather any significant information about
these failures. We collect console logs, but on these runs they are
empty. It's easy to blame everything on GCE, but I don't have any
information that would point either way. These failures just appear
randomly in production, usually in batches...

Thread overview: 31+ messages
2019-06-26 17:27 INFO: rcu detected stall in ext4_write_checks syzbot
2019-06-26 18:42 ` Theodore Ts'o
2019-06-26 21:03   ` Theodore Ts'o
2019-06-26 22:47     ` Theodore Ts'o
2019-07-05 13:24       ` Dmitry Vyukov
2019-07-05 15:16         ` Paul E. McKenney
2019-07-05 15:47           ` Amir Goldstein
2019-07-05 15:48           ` Dmitry Vyukov
2019-07-05 19:10             ` Paul E. McKenney
2019-07-06  4:28               ` Theodore Ts'o
2019-07-06  6:16                 ` Paul E. McKenney
2019-07-06 15:02                   ` Theodore Ts'o
2019-07-06 18:03                     ` Paul E. McKenney
2019-07-07  1:16                       ` Paul E. McKenney
2019-07-14 14:48                         ` Dmitry Vyukov
2019-07-14 18:49                           ` Paul E. McKenney
2019-07-15 13:29                             ` Peter Zijlstra
2019-07-15 13:33                               ` Dmitry Vyukov
2019-07-15 13:46                                 ` Peter Zijlstra
2019-07-15 14:02                                   ` Paul E. McKenney
2019-07-22 10:03                                   ` Dmitry Vyukov
2019-07-23  8:51                                     ` Dmitry Vyukov
2019-07-14 19:05                           ` Theodore Ts'o
2019-07-14 19:29                             ` Paul E. McKenney
2019-07-15  3:10                               ` Paul E. McKenney
2019-07-15 13:01                                 ` Paul E. McKenney
2019-07-15 13:29                                   ` Dmitry Vyukov
2019-07-15 13:39                                   ` Peter Zijlstra
2019-07-15 14:03                                     ` Paul E. McKenney
2019-07-15 13:22                         ` Peter Zijlstra
2019-07-05 13:18   ` Dmitry Vyukov [this message]
