Re: [PATCH] KVM: X86: Fix scan ioapic use-before-initialization

From: Dmitry Vyukov <dvyukov@google.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Paolo Bonzini" <pbonzini@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	"Wanpeng Li" <kernellwp@gmail.com>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	dledford@redhat.com, "KVM list" <kvm@vger.kernel.org>,
	"Radim Krčmář" <rkrcmar@redhat.com>, "Wei Wu" <ww9210@gmail.com>,
	"Kostya Serebryany" <kcc@google.com>,
	"Daniel Vetter" <daniel@ffwll.ch>,
	syzkaller <syzkaller@googlegroups.com>,
	"Dan Williams" <dan.j.williams@intel.com>,
	"Chris Mason" <clm@fb.com>, "Jonathan Corbet" <corbet@lwn.net>,
	"Kees Cook" <keescook@google.com>,
	"Laura Abbott" <labbott@redhat.com>,
	"Olof Johansson" <olofj@google.com>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Theodore Tso" <tytso@google.com>,
	Tim.Bird@sony.com
Subject: Re: [PATCH] KVM: X86: Fix scan ioapic use-before-initialization
Date: Fri, 28 Dec 2018 10:43:11 +0100	[thread overview]
Message-ID: <CACT4Y+YRs3HLf7-0AhMPMuvHGvKHge_7R5Qo6Q9H1A2w1fQ39A@mail.gmail.com> (raw)
In-Reply-To: <CAHk-=whZ3_T9b=pac=H1tvdjgX0vjE7FDsC=LfQYDmiY5Aq_kg@mail.gmail.com>

On Thu, Dec 27, 2018 at 6:00 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Thu, Dec 27, 2018 at 6:28 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> >
> > Lots of kernel bug reports routinely get lost on mailing lists, which is bad.
>
> Nobody reads the kernel mailing list directly - there's just too much traffic.
>
> And honestly, even fewer people then read the syzbot reports, because
> they are so illegible and inhuman. They're better than they used to
> be, but they are still basically impossible to parse without a lot of
> effort.
>
> And no, syzbot didn't really report the bug with any specificity - it
> wasn't clear *which* commit it was that caused it, so reading that
> syzbot report, at no point was it then obvious that the original patch
> had issues.
>
> See the problem?
>
> So the issue seems to be that syzbot is simply not useful enough. It's
> output is too rough for people to take it seriously. You see how the
> report by Wei Wu then got traction, because Wei took a syzbot report
> and added some human background and distilled it down to not be
> "here's a big dump of random information".
>
> So I suspect syzbot should strive to make for a much stronger
> signal-to-noise ratio. For example, if syzbot had actually bisected
> the bug it reported, that would have been quite a strong signal.
>
> Compare these two emails:
>
>     https://lore.kernel.org/lkml/1542702858-4318-1-git-send-email-wanpengli@tencent.com/
>     https://lore.kernel.org/lkml/0000000000001c7a5c0573607583@google.com/
>
> and note the absolutely huge difference in actual *information* (as
> opposed to raw data).
>
> Any possibility that syzbot would actually do the bisection once it
> finds a problem, and write a report based on the commit that caused
> the problem rather than just a problem dump?

Hi Linus,

I agree there are things to improve in syzbot. Bisection is useful and
we will implement it. This is a popular user request, we keep track of
all them, so nothing is lost:
https://github.com/google/syzkaller/issues?q=label%3A"syzbot+user+request"

But let's not reduce the discussion to syzbot improvements and
distract it from the main point, which is:

> Nobody reads the kernel mailing list directly - there's just too much traffic.

As the result bug reports and patches got lots and this is bad and it
would be useful to stop it from happening and there are known ways for
this.

syzbot not doing bisection is not the root cause of this and most of
what you said does not have place.

1. I specifically added a case where it happens the other way around:
human report was ignored, then syzbot report fixed.

2. syzbot reports are not worse then average human reports, frequently better.
What linked to is not the human report, it's a reply from a developer
that includes a fix with explanation. If you look at the original
human reports, then can see that they miss kernel config, full console
output (sometimes there is some useful information before the crash):
https://www.spinics.net/lists/kvm/msg177705.html
https://www.spinics.net/lists/kvm/msg177704.html
syzbot reports are also better formatted, as one does not need to
parse custom prose to digest information:
https://groups.google.com/forum/#!msg/syzkaller/40Ts5kOqJlo/tEYv9j-3AQAJ

3. Bisection is useful, but not important in most cases.
First of all, both human reports did not contain bisection info. Which
clearly means that bisection is not the reason syzbot reports were not
acted on.
We see fix rate of 75% for reports without bisection. Lots of bugs
don't require even a reproducer (e.g. a wrong local if condition), fix
rate for such reports is 66% for an absolute number of hundreds. For
simple bugs nothing other then a crash message is required. For more
complex ones there is an infinite tail of custom information. E.g.
bisection may not help when a latent bug is unmasked, or when it's
bisection just to addition of WARN_ON. Say, for kvm bugs a critical
piece may be cpu stepping.

4. syzbot reports are useful and signal-to-noise ratio is high:
https://syzkaller.appspot.com/?fixed=upstream
You can also ask developers who fixed dozens of syzbot reports.

5. Developers who look at syzbot reports acknowledge that they are
lost because of the kernel development process.
This one that I linked:
https://groups.google.com/d/msg/syzkaller-bugs/o_-OeMyoTwg/UOZv1d2IAgAJ
Steven Rostedt says that it wasn't lost because it did not contain
bisection information, but because "Yeah, that time was quite busy for
me. I guess I failed to get time to look into it when it was first
reported [and then it was simply lost with no chances of recovering]".
Here the bug was acknowledged:
https://groups.google.com/d/msg/syzkaller/WA6MdAfCYS0/1rSe_qDeAgAJ
but then simply lost for half a year:
https://groups.google.com/forum/#!msg/syzkaller-bugs/wFUedfOK2Rw/waUrQYOxAQAJ

So while I see potential for syzbot improvements, I see the problem
that leads to lost reports/patches in the kernel development process.