From: Dmitry Vyukov <dvyukov@google.com>
To: Qian Cai <cai@lca.pw>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Stephen Rothwell <sfr@canb.auug.org.au>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Xu <peterx@redhat.com>, LKML <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>, Jens Axboe <axboe@kernel.dk>,
	Christoph Lameter <cl@linux.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	syzkaller <syzkaller@googlegroups.com>,
	Dan Rue <dan.rue@linaro.org>
Subject: Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
Date: Tue, 14 Apr 2020 13:12:50 +0200
Message-ID: <CACT4Y+ZE1XhYpTsjP1J1PyUsEHYKvchww71aHb7UnSk5=4xUrw@mail.gmail.com>
In-Reply-To: <7325374A-6072-44E4-85EE-F97FC7E8565F@lca.pw>

On Tue, Apr 14, 2020 at 12:06 AM Qian Cai <cai@lca.pw> wrote:
> > On Apr 9, 2020, at 7:29 PM, Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> > Hi Linus,
> >
> > On Thu, 9 Apr 2020 09:32:32 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> >>
> >> On Thu, Apr 9, 2020 at 5:55 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> >>>
> >>> linux-next is boot-broken for more than a month and bugs are piling
> >>> onto bugs, I've seen at least 3 different ones.
> >>> syzbot can't get any working linux-next build for testing for a very
> >>> long time now.
> >>
> >> Ouch.
> >>
> >> Ok, that's not good. It means that linux-next has basically only done
> >> build-testing this whole cycle.
> >
> > Well, there are other CI's beyond syzbot .. Does syzbot only build/test
> > a single kernel arch/config?
> >
> >> Stephen, Dmitry - is there some way linux-next could possibly kick out
> >> trees more aggressively if syzbot can't even boot?

Hello all,

Sorry for corona/holiday-delays. I will try to answer/comment on all
things in this thread in this email.

Action item: we need to CC linux-next@ on linux-next build/boot
failures. I will work on this.
We have functionality to CC given emails on _all_ bugs on the given
tree, but we don't have this for build/boot bugs only. I will try to
add this soon.
Stephen, do you want to be CCed as well? Or just linux-next@?

> So old bugs generally should be aged out

This actually happens now.
Bugs without reproducers are auto-closed 60-120 days after the last
occurrence (the exact period depends on past frequency). For linux-next
the range is 40-60 days.
Bugs with reproducers are not auto-closed. But they do get fix
bisection and cause bisection; both are only ~66% correct, but still
frequently provide a useful signal. Also, bugs with reproducers are
just generally easier to handle.
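
As a rough illustration of the age-out policy described above, here is
a minimal Python sketch. The 60-120 and 40-60 day ranges and the rule
that bugs with reproducers are never auto-closed come from this
message. How past frequency maps into the range is not spelled out
here, so the scaling from the mean interval between crashes, the
INTERVAL_MULTIPLIER constant, and the Bug fields are all assumptions
made for illustration, not syzbot's actual code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Ranges quoted in the message: days since the last occurrence.
WINDOWS_DAYS = {"upstream": (60, 120), "linux-next": (40, 60)}

# ASSUMPTION: hypothetical factor that turns "mean time between crashes"
# into "how long to wait before deciding the bug is gone".
INTERVAL_MULTIPLIER = 100


@dataclass
class Bug:
    # All field names are hypothetical, chosen for this sketch only.
    tree: str
    has_reproducer: bool
    first_crash: datetime
    last_crash: datetime
    num_crashes: int


def auto_close_window(bug: Bug) -> Optional[timedelta]:
    """Return how long a bug may stay silent before it is auto-closed."""
    if bug.has_reproducer:
        return None  # bugs with reproducers are not auto-closed
    lo, hi = WINDOWS_DAYS[bug.tree]
    if bug.num_crashes < 2:
        # No frequency estimate is possible; be conservative.
        return timedelta(days=hi)
    mean_interval = (bug.last_crash - bug.first_crash) / (bug.num_crashes - 1)
    window = mean_interval * INTERVAL_MULTIPLIER
    # Clamp into the per-tree range: frequent bugs age out sooner.
    return min(max(window, timedelta(days=lo)), timedelta(days=hi))


def should_auto_close(bug: Bug, now: datetime) -> bool:
    window = auto_close_window(bug)
    return window is not None and now - bug.last_crash > window
```

Under these assumptions, a bug that used to fire many times a day ends
up at the short end of the range, while a bug seen only a couple of
times gets the full period before it is closed.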

Another important distinction from Bugzilla is that the syzbot
dashboard has up-to-date "Last crash time" information. Click on the
"Last"
column here:
https://syzkaller.appspot.com/upstream
It's very easy to ignore everything that happened months ago for
starters, if that's the concern.

So it's not as perfect as it would be with a dedicated human team
attached, but I would say it's now in reasonable shape, with ~400
open bugs that happened within the last month.
And now we have data to confirm that "old" does not mean "irrelevant".
Our leader:
BUG: please report to dccp@vger.kernel.org => prev = 0, last = 0 at
net/dccp/ccids/lib/packet_history.c:LINE/tfrc_rx_hist_sample_rtt()
https://syzkaller.appspot.com/bug?id=0881c535c265ca965edc49c0ac3d0a9850d26eb1
was first triggered 964 days ago and has pretty much been there ever since.

> It would be nice if there was some way we could triage Syzkaller
> bugs into different buckets.

Yes, I am afraid of stepping onto the slippery slope of implementing
a full-fledged bug tracking system. Still, I think syzbot will gain
more bug tracker features, and tags will happen. We still have
https://github.com/google/syzkaller/issues/608 open, and it's mainly
a question of allocating resources for implementation and figuring
out the actual tag hierarchy.
For login and credentials, I guess we will go with just "whoever can
send emails is root" because we are doing this already anyway
(closing a bug is more critical than changing a tag) :)

Re panic_on_warn.
We don't have a dedicated engineer to sheriff and give manual
consideration and judgement to each case. And as Qian noted, in such
circumstances it's reasonable not to trust anything after a warning.
Some notorious examples: LOCKDEP warnings disable LOCKDEP; so if we
boot in such a state with eyes closed and then try to do fuzzing, or
"better", test a patch for a LOCKDEP error, or do bisection of a
LOCKDEP error, we will immediately give bogus testing results or a
bogus bisection culprit.
Or a warning about a hung task may re-appear later during testing and
confuse results again.
Or if we ignore a KASAN warning, we boot potentially with corrupted
memory with who-knows-what consequences.
A "normal" WARNING may be benign (misuse of WARNING), or maybe not.
It is impossible to figure that out automatically. And in the end, if
we ignore it, who will notice and fix it, and when?
We got this far with this black-and-white criterion for kernel bugs. I
think it had some positive effects on a number of areas. As we go
forward, I think it's better to extend panic_on_warn to more testing
systems. Then non-fatal bugs will be no different from fatal bugs
during boot, which we need to handle in a reasonable timeframe anyway.
Which gets me to the next "interesting" point.
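
To make the black-and-white criterion above concrete, here is a
minimal Python sketch of a console-log check that treats any kernel
report as a failure, whether or not the machine kept running. The
patterns are a small illustrative subset chosen for this example (they
are not syzkaller's actual list), and the function names are
hypothetical.

```python
import re
from typing import List

# Illustrative subset of kernel report prefixes; real tools match many more.
REPORT_PATTERNS = [
    re.compile(r"\bBUG: "),      # e.g. "BUG: KASAN: use-after-free in ..."
    re.compile(r"\bWARNING: "),  # WARN_ON() output and LOCKDEP reports
    re.compile(r"\bINFO: task .* blocked for more than \d+ seconds"),
]


def find_kernel_reports(console_log: str) -> List[str]:
    """Return every console line that looks like a kernel bug report."""
    return [
        line
        for line in console_log.splitlines()
        if any(p.search(line) for p in REPORT_PATTERNS)
    ]


def boot_is_clean(console_log: str) -> bool:
    """Black-and-white verdict: one report of any kind fails the boot."""
    return not find_kernel_reports(console_log)
```

With a check like this, a boot that reaches ssh but printed a LOCKDEP
or KASAN report along the way is still counted as broken, which is the
same outcome panic_on_warn enforces inside the kernel.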

> Well, there are other CI's beyond syzbot.
> On the other hand, this makes me worry who is testing on linux-next every day.

How do these use-after-frees and locking bugs get past the
unit-testing systems (which syzbot is not) and remain unnoticed for so
long?...
syzbot uses the dumbest VMs (GCE), so everything it triggers during
boot should be triggerable pretty much everywhere.
It seems to be an action point for the testing systems. "Boot to ssh"
is not the best criterion. Again, if there is a LOCKDEP error, we are
not catching any more LOCKDEP errors during subsequent testing. If
there is a use-after-free, that's a serious error on its own, and KASAN
produces only 1 error by default as well. And as far as I understand,
lots of kernel testing systems don't even enable KASAN, which is very
wrong.
I talked to +Dan Rue about this a few days ago. Hopefully LKFT will
start catching these as part of unit testing, which should help with
syzbot testing as well.
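
As a sketch of what a testing system could verify before trusting a
"boot to ssh" result, the Python below checks a kernel config and
command line for the debugging features mentioned above. The option
names (CONFIG_KASAN, CONFIG_PROVE_LOCKING, CONFIG_DETECT_HUNG_TASK)
and the panic_on_warn boot parameter are real kernel knobs, but this
particular selection and the helper names are assumptions for
illustration, not a list any CI is known to use.

```python
from typing import List

# ASSUMPTION: a minimal set of options for this example, not an official list.
REQUIRED_OPTIONS = (
    "CONFIG_KASAN",             # use-after-free / out-of-bounds detection
    "CONFIG_PROVE_LOCKING",     # LOCKDEP lock-order validation
    "CONFIG_DETECT_HUNG_TASK",  # hung task warnings
)


def missing_debug_options(kconfig_text: str) -> List[str]:
    """Return the required options that are not set to 'y' in a .config."""
    enabled = set()
    for line in kconfig_text.splitlines():
        line = line.strip()
        # "# CONFIG_FOO is not set" lines are comments and are skipped.
        if line.endswith("=y") and not line.startswith("#"):
            enabled.add(line[: -len("=y")])
    return [opt for opt in REQUIRED_OPTIONS if opt not in enabled]


def panic_on_warn_requested(cmdline: str) -> bool:
    """Check whether the kernel command line asks for panic_on_warn."""
    return any(arg in ("panic_on_warn", "panic_on_warn=1") for arg in cmdline.split())
```

A harness could refuse to report a pass when missing_debug_options()
is non-empty, so that a clean boot really means the detectors were
present and found nothing.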
