netdev.vger.kernel.org archive mirror
* Re: kernel panic: corrupted stack end in wb_workfn
       [not found] <000000000000b05d0c057e492e33@google.com>
@ 2019-03-17 20:49 ` syzbot
  2019-03-19 18:03   ` Xin Long
  2019-03-20  9:56   ` Andrey Ryabinin
  0 siblings, 2 replies; 14+ messages in thread
From: syzbot @ 2019-03-17 20:49 UTC (permalink / raw)
  To: akpm, aryabinin, cai, davem, dvyukov, guro, hannes, jbacik,
	ktkhai, linux-kernel, linux-mm, linux-sctp, mgorman, mhocko,
	netdev, nhorman, shakeelb, syzkaller-bugs, viro, vyasevich,
	willy

syzbot has bisected this bug to:

commit c981f254cc82f50f8cb864ce6432097b23195b9c
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Sun Jan 7 18:19:09 2018 +0000

     sctp: use vmemdup_user() rather than badly open-coding memdup_user()

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=137bcecf200000
start commit:   c981f254 sctp: use vmemdup_user() rather than badly open-c..
git tree:       upstream
final crash:    https://syzkaller.appspot.com/x/report.txt?x=10fbcecf200000
console output: https://syzkaller.appspot.com/x/log.txt?x=177bcecf200000
kernel config:  https://syzkaller.appspot.com/x/.config?x=5e7dc790609552d7
dashboard link: https://syzkaller.appspot.com/bug?extid=ec1b7575afef85a0e5ca
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16a9a84b400000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17199bb3400000

Reported-by: syzbot+ec1b7575afef85a0e5ca@syzkaller.appspotmail.com
Fixes: c981f254 ("sctp: use vmemdup_user() rather than badly open-coding memdup_user()")


* Re: kernel panic: corrupted stack end in wb_workfn
  2019-03-17 20:49 ` kernel panic: corrupted stack end in wb_workfn syzbot
@ 2019-03-19 18:03   ` Xin Long
  2019-03-20  9:56   ` Andrey Ryabinin
  1 sibling, 0 replies; 14+ messages in thread
From: Xin Long @ 2019-03-19 18:03 UTC (permalink / raw)
  To: syzbot
  Cc: akpm, aryabinin, cai, davem, Dmitry Vyukov, guro, hannes, jbacik,
	Kirill Tkhai, LKML, linux-mm, linux-sctp, mgorman, mhocko,
	network dev, Neil Horman, shakeelb, syzkaller-bugs, viro,
	Vlad Yasevich, willy

On Mon, Mar 18, 2019 at 4:49 AM syzbot
<syzbot+ec1b7575afef85a0e5ca@syzkaller.appspotmail.com> wrote:
>
> syzbot has bisected this bug to:
>
> commit c981f254cc82f50f8cb864ce6432097b23195b9c
> Author: Al Viro <viro@zeniv.linux.org.uk>
> Date:   Sun Jan 7 18:19:09 2018 +0000
>
>      sctp: use vmemdup_user() rather than badly open-coding memdup_user()
'addrs_size' is passed in from userspace. We actually used GFP_USER to
put some more restrictions on it in this commit:

commit cacc06215271104b40773c99547c506095db6ad4
Author: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Date:   Mon Nov 30 14:32:54 2015 -0200

    sctp: use GFP_USER for user-controlled kmalloc

However, vmemdup_user() will 'ignore' this flag when going to vmalloc_*(),
so we should probably fix it by using memdup_user() instead, still
avoiding that open-coding part:

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index ea95cd4..e5bcade 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -999,7 +999,7 @@ static int sctp_setsockopt_bindx(struct sock *sk,
        if (unlikely(addrs_size <= 0))
                return -EINVAL;

-       kaddrs = vmemdup_user(addrs, addrs_size);
+       kaddrs = memdup_user(addrs, addrs_size);

>
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=137bcecf200000
> start commit:   c981f254 sctp: use vmemdup_user() rather than badly open-c..
> git tree:       upstream
> final crash:    https://syzkaller.appspot.com/x/report.txt?x=10fbcecf200000
> console output: https://syzkaller.appspot.com/x/log.txt?x=177bcecf200000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=5e7dc790609552d7
> dashboard link: https://syzkaller.appspot.com/bug?extid=ec1b7575afef85a0e5ca
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16a9a84b400000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17199bb3400000
>
> Reported-by: syzbot+ec1b7575afef85a0e5ca@syzkaller.appspotmail.com
> Fixes: c981f254 ("sctp: use vmemdup_user() rather than badly open-coding memdup_user()")


* Re: kernel panic: corrupted stack end in wb_workfn
  2019-03-17 20:49 ` kernel panic: corrupted stack end in wb_workfn syzbot
  2019-03-19 18:03   ` Xin Long
@ 2019-03-20  9:56   ` Andrey Ryabinin
  2019-03-20  9:59     ` Dmitry Vyukov
  1 sibling, 1 reply; 14+ messages in thread
From: Andrey Ryabinin @ 2019-03-20  9:56 UTC (permalink / raw)
  To: syzbot, akpm, cai, davem, dvyukov, guro, hannes, jbacik, ktkhai,
	linux-kernel, linux-mm, linux-sctp, mgorman, mhocko, netdev,
	nhorman, shakeelb, syzkaller-bugs, viro, vyasevich, willy
  Cc: Xin Long



On 3/17/19 11:49 PM, syzbot wrote:
> syzbot has bisected this bug to:
> 
> commit c981f254cc82f50f8cb864ce6432097b23195b9c
> Author: Al Viro <viro@zeniv.linux.org.uk>
> Date:   Sun Jan 7 18:19:09 2018 +0000
> 
>     sctp: use vmemdup_user() rather than badly open-coding memdup_user()
> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=137bcecf200000
> start commit:   c981f254 sctp: use vmemdup_user() rather than badly open-c..
> git tree:       upstream
> final crash:    https://syzkaller.appspot.com/x/report.txt?x=10fbcecf200000
> console output: https://syzkaller.appspot.com/x/log.txt?x=177bcecf200000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=5e7dc790609552d7
> dashboard link: https://syzkaller.appspot.com/bug?extid=ec1b7575afef85a0e5ca
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16a9a84b400000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17199bb3400000
> 
> Reported-by: syzbot+ec1b7575afef85a0e5ca@syzkaller.appspotmail.com
> Fixes: c981f254 ("sctp: use vmemdup_user() rather than badly open-coding memdup_user()")

From bisection log:

	testing release v4.17
	testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
	run #0: crashed: kernel panic: corrupted stack end in wb_workfn
	run #1: crashed: kernel panic: corrupted stack end in worker_thread
	run #2: crashed: kernel panic: Out of memory and no killable processes...
	run #3: crashed: kernel panic: corrupted stack end in wb_workfn
	run #4: crashed: kernel panic: corrupted stack end in wb_workfn
	run #5: crashed: kernel panic: corrupted stack end in wb_workfn
	run #6: crashed: kernel panic: corrupted stack end in wb_workfn
	run #7: crashed: kernel panic: corrupted stack end in wb_workfn
	run #8: crashed: kernel panic: Out of memory and no killable processes...
	run #9: crashed: kernel panic: corrupted stack end in wb_workfn
	testing release v4.16
	testing commit 0adb32858b0bddf4ada5f364a84ed60b196dbcda with gcc (GCC) 8.1.0
	run #0: OK
	run #1: OK
	run #2: OK
	run #3: OK
	run #4: OK
	run #5: crashed: kernel panic: Out of memory and no killable processes...
	run #6: OK
	run #7: crashed: kernel panic: Out of memory and no killable processes...
	run #8: OK
	run #9: OK
	testing release v4.15
	testing commit d8a5b80568a9cb66810e75b182018e9edb68e8ff with gcc (GCC) 8.1.0
	all runs: OK
	# git bisect start v4.16 v4.15

Why did the bisection start between 4.15 and 4.16 instead of between 4.16 and 4.17?


	testing commit c14376de3a1befa70d9811ca2872d47367b48767 with gcc (GCC) 8.1.0
	run #0: crashed: kernel panic: Out of memory and no killable processes...
	run #1: crashed: kernel panic: Out of memory and no killable processes...
	run #2: crashed: kernel panic: Out of memory and no killable processes...
	run #3: crashed: kernel panic: Out of memory and no killable processes...
	run #4: OK
	run #5: OK
	run #6: crashed: WARNING: ODEBUG bug in netdev_freemem
	run #7: crashed: no output from test machine
	run #8: OK
	run #9: OK
	# git bisect bad c14376de3a1befa70d9811ca2872d47367b48767

Why is c14376de3a1befa70d9811ca2872d47367b48767 bad? There was no stack corruption.
It looks like syzbot was bisecting a different bug - "kernel panic: Out of memory and no killable processes..."
And the bisection for that bug seems to be correct: kvmalloc() in vmemdup_user() may eat up all memory, unlike kmalloc(), which is limited by KMALLOC_MAX_SIZE (usually 4MB).
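
To make the point about c14376de concrete, here is a minimal sketch (plain
Python, not syzbot code) that tallies the crash titles from the ten runs
quoted above. The regular expression and the tally() helper are assumptions
made for illustration; only the log lines themselves come from the
bisection log.

```python
import re
from collections import Counter

# The ten runs for commit c14376de..., copied from the bisection log above.
LOG = """\
run #0: crashed: kernel panic: Out of memory and no killable processes...
run #1: crashed: kernel panic: Out of memory and no killable processes...
run #2: crashed: kernel panic: Out of memory and no killable processes...
run #3: crashed: kernel panic: Out of memory and no killable processes...
run #4: OK
run #5: OK
run #6: crashed: WARNING: ODEBUG bug in netdev_freemem
run #7: crashed: no output from test machine
run #8: OK
run #9: OK
"""

# Hypothetical pattern for one "run #N: ..." line of the log.
RUN_RE = re.compile(r"^run #\d+: (?:crashed: (?P<title>.+)|OK)$")


def tally(log):
    """Count how often each crash title (or "OK") appears in the log."""
    counts = Counter()
    for line in log.splitlines():
        match = RUN_RE.match(line.strip())
        if match is None:
            continue  # not a run line (e.g. "testing commit ...")
        counts[match.group("title") or "OK"] += 1
    return counts


counts = tally(LOG)
# Number of runs whose title mentions the bug that was actually reported.
stack_hits = sum(n for title, n in counts.items()
                 if "corrupted stack end" in title)

for title, n in counts.most_common():
    print(n, title)
print("corrupted stack end runs:", stack_hits)
```

None of the ten runs carries a "corrupted stack end" title, yet the commit
was marked bad.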


* Re: kernel panic: corrupted stack end in wb_workfn
  2019-03-20  9:56   ` Andrey Ryabinin
@ 2019-03-20  9:59     ` Dmitry Vyukov
  2019-03-20 10:23       ` Tetsuo Handa
  0 siblings, 1 reply; 14+ messages in thread
From: Dmitry Vyukov @ 2019-03-20  9:59 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: syzbot, Andrew Morton, Qian Cai, David Miller, guro,
	Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM,
	linux-sctp, Mel Gorman, Michal Hocko, netdev, Neil Horman,
	Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich,
	Matthew Wilcox, Xin Long

On Wed, Mar 20, 2019 at 10:56 AM Andrey Ryabinin
<aryabinin@virtuozzo.com> wrote:
>
> On 3/17/19 11:49 PM, syzbot wrote:
> > syzbot has bisected this bug to:
> >
> > commit c981f254cc82f50f8cb864ce6432097b23195b9c
> > Author: Al Viro <viro@zeniv.linux.org.uk>
> > Date:   Sun Jan 7 18:19:09 2018 +0000
> >
> >     sctp: use vmemdup_user() rather than badly open-coding memdup_user()
> >
> > bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=137bcecf200000
> > start commit:   c981f254 sctp: use vmemdup_user() rather than badly open-c..
> > git tree:       upstream
> > final crash:    https://syzkaller.appspot.com/x/report.txt?x=10fbcecf200000
> > console output: https://syzkaller.appspot.com/x/log.txt?x=177bcecf200000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=5e7dc790609552d7
> > dashboard link: https://syzkaller.appspot.com/bug?extid=ec1b7575afef85a0e5ca
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16a9a84b400000
> > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17199bb3400000
> >
> > Reported-by: syzbot+ec1b7575afef85a0e5ca@syzkaller.appspotmail.com
> > Fixes: c981f254 ("sctp: use vmemdup_user() rather than badly open-coding memdup_user()")
>
> From bisection log:
>
>         testing release v4.17
>         testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
>         run #0: crashed: kernel panic: corrupted stack end in wb_workfn
>         run #1: crashed: kernel panic: corrupted stack end in worker_thread
>         run #2: crashed: kernel panic: Out of memory and no killable processes...
>         run #3: crashed: kernel panic: corrupted stack end in wb_workfn
>         run #4: crashed: kernel panic: corrupted stack end in wb_workfn
>         run #5: crashed: kernel panic: corrupted stack end in wb_workfn
>         run #6: crashed: kernel panic: corrupted stack end in wb_workfn
>         run #7: crashed: kernel panic: corrupted stack end in wb_workfn
>         run #8: crashed: kernel panic: Out of memory and no killable processes...
>         run #9: crashed: kernel panic: corrupted stack end in wb_workfn
>         testing release v4.16
>         testing commit 0adb32858b0bddf4ada5f364a84ed60b196dbcda with gcc (GCC) 8.1.0
>         run #0: OK
>         run #1: OK
>         run #2: OK
>         run #3: OK
>         run #4: OK
>         run #5: crashed: kernel panic: Out of memory and no killable processes...
>         run #6: OK
>         run #7: crashed: kernel panic: Out of memory and no killable processes...
>         run #8: OK
>         run #9: OK
>         testing release v4.15
>         testing commit d8a5b80568a9cb66810e75b182018e9edb68e8ff with gcc (GCC) 8.1.0
>         all runs: OK
>         # git bisect start v4.16 v4.15
>
> Why did the bisection start between 4.15 and 4.16 instead of between 4.16 and 4.17?

Because 4.16 was still crashing and 4.15 was not crashing. 4.15..4.16
looks like the right range, no?


>         testing commit c14376de3a1befa70d9811ca2872d47367b48767 with gcc (GCC) 8.1.0
>         run #0: crashed: kernel panic: Out of memory and no killable processes...
>         run #1: crashed: kernel panic: Out of memory and no killable processes...
>         run #2: crashed: kernel panic: Out of memory and no killable processes...
>         run #3: crashed: kernel panic: Out of memory and no killable processes...
>         run #4: OK
>         run #5: OK
>         run #6: crashed: WARNING: ODEBUG bug in netdev_freemem
>         run #7: crashed: no output from test machine
>         run #8: OK
>         run #9: OK
>         # git bisect bad c14376de3a1befa70d9811ca2872d47367b48767
>
> Why is c14376de3a1befa70d9811ca2872d47367b48767 bad? There was no stack corruption.
> It looks like syzbot was bisecting a different bug - "kernel panic: Out of memory and no killable processes..."
> And the bisection for that bug seems to be correct: kvmalloc() in vmemdup_user() may eat up all memory, unlike kmalloc(), which is limited by KMALLOC_MAX_SIZE (usually 4MB).

Please see https://github.com/google/syzkaller/blob/master/docs/syzbot.md#bisection
for the answer.


* Re: kernel panic: corrupted stack end in wb_workfn
  2019-03-20  9:59     ` Dmitry Vyukov
@ 2019-03-20 10:23       ` Tetsuo Handa
  2019-03-20 10:38         ` Dmitry Vyukov
  0 siblings, 1 reply; 14+ messages in thread
From: Tetsuo Handa @ 2019-03-20 10:23 UTC (permalink / raw)
  To: Dmitry Vyukov, Andrey Ryabinin
  Cc: syzbot, Andrew Morton, Qian Cai, David Miller, guro,
	Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM,
	linux-sctp, Mel Gorman, Michal Hocko, netdev, Neil Horman,
	Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich,
	Matthew Wilcox, Xin Long

On 2019/03/20 18:59, Dmitry Vyukov wrote:
>> From bisection log:
>>
>>         testing release v4.17
>>         testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
>>         run #0: crashed: kernel panic: corrupted stack end in wb_workfn
>>         run #1: crashed: kernel panic: corrupted stack end in worker_thread
>>         run #2: crashed: kernel panic: Out of memory and no killable processes...
>>         run #3: crashed: kernel panic: corrupted stack end in wb_workfn
>>         run #4: crashed: kernel panic: corrupted stack end in wb_workfn
>>         run #5: crashed: kernel panic: corrupted stack end in wb_workfn
>>         run #6: crashed: kernel panic: corrupted stack end in wb_workfn
>>         run #7: crashed: kernel panic: corrupted stack end in wb_workfn
>>         run #8: crashed: kernel panic: Out of memory and no killable processes...
>>         run #9: crashed: kernel panic: corrupted stack end in wb_workfn
>>         testing release v4.16
>>         testing commit 0adb32858b0bddf4ada5f364a84ed60b196dbcda with gcc (GCC) 8.1.0
>>         run #0: OK
>>         run #1: OK
>>         run #2: OK
>>         run #3: OK
>>         run #4: OK
>>         run #5: crashed: kernel panic: Out of memory and no killable processes...
>>         run #6: OK
>>         run #7: crashed: kernel panic: Out of memory and no killable processes...
>>         run #8: OK
>>         run #9: OK
>>         testing release v4.15
>>         testing commit d8a5b80568a9cb66810e75b182018e9edb68e8ff with gcc (GCC) 8.1.0
>>         all runs: OK
>>         # git bisect start v4.16 v4.15
>>
>> Why did the bisection start between 4.15 and 4.16 instead of between 4.16 and 4.17?
> 
> Because 4.16 was still crashing and 4.15 was not crashing. 4.15..4.16
> looks like the right range, no?

No, syzbot should bisect between 4.16 and 4.17 regarding this bug, for
"Stack corruption" can't manifest as "Out of memory and no killable processes".

"kernel panic: Out of memory and no killable processes..." is completely
unrelated to "kernel panic: corrupted stack end in wb_workfn".


* Re: kernel panic: corrupted stack end in wb_workfn
  2019-03-20 10:23       ` Tetsuo Handa
@ 2019-03-20 10:38         ` Dmitry Vyukov
  2019-03-20 10:42           ` Dmitry Vyukov
  2019-03-20 13:34           ` Andrey Ryabinin
  0 siblings, 2 replies; 14+ messages in thread
From: Dmitry Vyukov @ 2019-03-20 10:38 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Andrey Ryabinin, syzbot, Andrew Morton, Qian Cai, David Miller,
	guro, Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM,
	linux-sctp, Mel Gorman, Michal Hocko, netdev, Neil Horman,
	Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich,
	Matthew Wilcox, Xin Long

On Wed, Mar 20, 2019 at 11:24 AM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2019/03/20 18:59, Dmitry Vyukov wrote:
> >> From bisection log:
> >>
> >>         testing release v4.17
> >>         testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
> >>         run #0: crashed: kernel panic: corrupted stack end in wb_workfn
> >>         run #1: crashed: kernel panic: corrupted stack end in worker_thread
> >>         run #2: crashed: kernel panic: Out of memory and no killable processes...
> >>         run #3: crashed: kernel panic: corrupted stack end in wb_workfn
> >>         run #4: crashed: kernel panic: corrupted stack end in wb_workfn
> >>         run #5: crashed: kernel panic: corrupted stack end in wb_workfn
> >>         run #6: crashed: kernel panic: corrupted stack end in wb_workfn
> >>         run #7: crashed: kernel panic: corrupted stack end in wb_workfn
> >>         run #8: crashed: kernel panic: Out of memory and no killable processes...
> >>         run #9: crashed: kernel panic: corrupted stack end in wb_workfn
> >>         testing release v4.16
> >>         testing commit 0adb32858b0bddf4ada5f364a84ed60b196dbcda with gcc (GCC) 8.1.0
> >>         run #0: OK
> >>         run #1: OK
> >>         run #2: OK
> >>         run #3: OK
> >>         run #4: OK
> >>         run #5: crashed: kernel panic: Out of memory and no killable processes...
> >>         run #6: OK
> >>         run #7: crashed: kernel panic: Out of memory and no killable processes...
> >>         run #8: OK
> >>         run #9: OK
> >>         testing release v4.15
> >>         testing commit d8a5b80568a9cb66810e75b182018e9edb68e8ff with gcc (GCC) 8.1.0
> >>         all runs: OK
> >>         # git bisect start v4.16 v4.15
> >>
> >> Why did the bisection start between 4.15 and 4.16 instead of between 4.16 and 4.17?
> >
> > Because 4.16 was still crashing and 4.15 was not crashing. 4.15..4.16
> > looks like the right range, no?
>
> No, syzbot should bisect between 4.16 and 4.17 regarding this bug, for
> "Stack corruption" can't manifest as "Out of memory and no killable processes".
>
> "kernel panic: Out of memory and no killable processes..." is completely
> unrelated to "kernel panic: corrupted stack end in wb_workfn".


Do you think this predicate is possible to code? Looking at the
examples we have, distinguishing different bugs does not look feasible
to me. If the predicate is not accurate, you just trade one set of
false positives for another set of false positives, and then you are at
the beginning of an infinite slippery slope of refining it.
Also, if we see a different bug (assuming we can distinguish them),
does it mean that the original bug is not present? Or it's also
present, but we just hit the other one first? This also does not look
feasible to answer. And if you give a wrong answer, bisection goes the
wrong way and we are where we started. Just with more complex code and
things being even harder to explain to other people.
I mean, yes, I agree, kernel bug bisection won't be perfect. But do
you see anything actionable here?


* Re: kernel panic: corrupted stack end in wb_workfn
  2019-03-20 10:38         ` Dmitry Vyukov
@ 2019-03-20 10:42           ` Dmitry Vyukov
  2019-03-20 10:58             ` Tetsuo Handa
  2019-03-20 13:34           ` Andrey Ryabinin
  1 sibling, 1 reply; 14+ messages in thread
From: Dmitry Vyukov @ 2019-03-20 10:42 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Andrey Ryabinin, syzbot, Andrew Morton, Qian Cai, David Miller,
	guro, Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM,
	linux-sctp, Mel Gorman, Michal Hocko, netdev, Neil Horman,
	Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich,
	Matthew Wilcox, Xin Long

On Wed, Mar 20, 2019 at 11:38 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Wed, Mar 20, 2019 at 11:24 AM Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
> >
> > On 2019/03/20 18:59, Dmitry Vyukov wrote:
> > >> From bisection log:
> > >>
> > >>         testing release v4.17
> > >>         testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
> > >>         run #0: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>         run #1: crashed: kernel panic: corrupted stack end in worker_thread
> > >>         run #2: crashed: kernel panic: Out of memory and no killable processes...
> > >>         run #3: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>         run #4: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>         run #5: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>         run #6: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>         run #7: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>         run #8: crashed: kernel panic: Out of memory and no killable processes...
> > >>         run #9: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>         testing release v4.16
> > >>         testing commit 0adb32858b0bddf4ada5f364a84ed60b196dbcda with gcc (GCC) 8.1.0
> > >>         run #0: OK
> > >>         run #1: OK
> > >>         run #2: OK
> > >>         run #3: OK
> > >>         run #4: OK
> > >>         run #5: crashed: kernel panic: Out of memory and no killable processes...
> > >>         run #6: OK
> > >>         run #7: crashed: kernel panic: Out of memory and no killable processes...
> > >>         run #8: OK
> > >>         run #9: OK
> > >>         testing release v4.15
> > >>         testing commit d8a5b80568a9cb66810e75b182018e9edb68e8ff with gcc (GCC) 8.1.0
> > >>         all runs: OK
> > >>         # git bisect start v4.16 v4.15
> > >>
> > >> Why did the bisection start between 4.15 and 4.16 instead of between 4.16 and 4.17?
> > >
> > > Because 4.16 was still crashing and 4.15 was not crashing. 4.15..4.16
> > > looks like the right range, no?
> >
> > No, syzbot should bisect between 4.16 and 4.17 regarding this bug, for
> > "Stack corruption" can't manifest as "Out of memory and no killable processes".
> >
> > "kernel panic: Out of memory and no killable processes..." is completely
> > unrelated to "kernel panic: corrupted stack end in wb_workfn".
>
>
> Do you think this predicate is possible to code? Looking at the
> examples we have, distinguishing different bugs does not look feasible
> to me. If the predicate is not accurate, you just trade one set of
> false positives for another set of false positives, and then you are at
> the beginning of an infinite slippery slope of refining it.
> Also, if we see a different bug (assuming we can distinguish them),
> does it mean that the original bug is not present? Or it's also
> present, but we just hit the other one first? This also does not look
> feasible to answer. And if you give a wrong answer, bisection goes the
> wrong way and we are where we started. Just with more complex code and
> things being even harder to explain to other people.
> I mean, yes, I agree, kernel bug bisection won't be perfect. But do
> you see anything actionable here?

I see the larger long-term bisection quality improvement (for syzbot
and for everybody else) in doing some actual testing of each kernel
commit before it is merged into any kernel tree, so that we have
fewer of these cases: a single program triggering 3 different bugs, stray
unrelated bugs, broken release boots, etc. I don't see how reliable
bisection is possible without that.


* Re: kernel panic: corrupted stack end in wb_workfn
  2019-03-20 10:42           ` Dmitry Vyukov
@ 2019-03-20 10:58             ` Tetsuo Handa
  2019-03-20 13:59               ` Dmitry Vyukov
  0 siblings, 1 reply; 14+ messages in thread
From: Tetsuo Handa @ 2019-03-20 10:58 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Andrey Ryabinin, syzbot, Andrew Morton, Qian Cai, David Miller,
	guro, Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM,
	linux-sctp, Mel Gorman, Michal Hocko, netdev, Neil Horman,
	Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich,
	Matthew Wilcox, Xin Long

On 2019/03/20 19:42, Dmitry Vyukov wrote:
>> I mean, yes, I agree, kernel bug bisection won't be perfect. But do
>> you see anything actionable here?

Allow users to manually specify the bisection range when
automatic bisection finds a wrong commit.

Also, allow users to specify the reproducer program
when automatic bisection finds a wrong commit.

Yes, this is anti-automation. But since automation can't become perfect,
I'm suggesting manual adjustment. Even if we involve manual adjustment,
syzbot's plentiful CPU resources for building/testing kernels are highly
appreciated (compared to doing manual bisection by building/testing kernels
in personal PC environments).

> 
> I see the larger long-term bisection quality improvement (for syzbot
> and for everybody else) in doing some actual testing of each kernel
> commit before it is merged into any kernel tree, so that we have
> fewer of these cases: a single program triggering 3 different bugs, stray
> unrelated bugs, broken release boots, etc. I don't see how reliable
> bisection is possible without that.
> 

syzbot currently cannot test kernels with custom patches (except via "#syz test:" requests).
Are you saying that syzbot will become able to test kernels with custom patches?


* Re: kernel panic: corrupted stack end in wb_workfn
  2019-03-20 10:38         ` Dmitry Vyukov
  2019-03-20 10:42           ` Dmitry Vyukov
@ 2019-03-20 13:34           ` Andrey Ryabinin
  2019-03-20 13:57             ` Dmitry Vyukov
  1 sibling, 1 reply; 14+ messages in thread
From: Andrey Ryabinin @ 2019-03-20 13:34 UTC (permalink / raw)
  To: Dmitry Vyukov, Tetsuo Handa
  Cc: syzbot, Andrew Morton, Qian Cai, David Miller, guro,
	Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM,
	linux-sctp, Mel Gorman, Michal Hocko, netdev, Neil Horman,
	Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich,
	Matthew Wilcox, Xin Long



On 3/20/19 1:38 PM, Dmitry Vyukov wrote:
> On Wed, Mar 20, 2019 at 11:24 AM Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
>>
>> On 2019/03/20 18:59, Dmitry Vyukov wrote:
>>>> From bisection log:
>>>>
>>>>         testing release v4.17
>>>>         testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
>>>>         run #0: crashed: kernel panic: corrupted stack end in wb_workfn
>>>>         run #1: crashed: kernel panic: corrupted stack end in worker_thread
>>>>         run #2: crashed: kernel panic: Out of memory and no killable processes...
>>>>         run #3: crashed: kernel panic: corrupted stack end in wb_workfn
>>>>         run #4: crashed: kernel panic: corrupted stack end in wb_workfn
>>>>         run #5: crashed: kernel panic: corrupted stack end in wb_workfn
>>>>         run #6: crashed: kernel panic: corrupted stack end in wb_workfn
>>>>         run #7: crashed: kernel panic: corrupted stack end in wb_workfn
>>>>         run #8: crashed: kernel panic: Out of memory and no killable processes...
>>>>         run #9: crashed: kernel panic: corrupted stack end in wb_workfn
>>>>         testing release v4.16
>>>>         testing commit 0adb32858b0bddf4ada5f364a84ed60b196dbcda with gcc (GCC) 8.1.0
>>>>         run #0: OK
>>>>         run #1: OK
>>>>         run #2: OK
>>>>         run #3: OK
>>>>         run #4: OK
>>>>         run #5: crashed: kernel panic: Out of memory and no killable processes...
>>>>         run #6: OK
>>>>         run #7: crashed: kernel panic: Out of memory and no killable processes...
>>>>         run #8: OK
>>>>         run #9: OK
>>>>         testing release v4.15
>>>>         testing commit d8a5b80568a9cb66810e75b182018e9edb68e8ff with gcc (GCC) 8.1.0
>>>>         all runs: OK
>>>>         # git bisect start v4.16 v4.15
>>>>
>>>> Why did the bisection start between 4.15 and 4.16 instead of between 4.16 and 4.17?
>>>
>>> Because 4.16 was still crashing and 4.15 was not crashing. 4.15..4.16
>>> looks like the right range, no?
>>
>> No, syzbot should bisect between 4.16 and 4.17 regarding this bug, for
>> "Stack corruption" can't manifest as "Out of memory and no killable processes".
>>
>> "kernel panic: Out of memory and no killable processes..." is completely
>> unrelated to "kernel panic: corrupted stack end in wb_workfn".
> 
> 
> Do you think this predicate is possible to code?

Something like below would probably work better than the current behavior.

For starters, is_duplicates() might just compare the 'crash' title with the 'target_crash' title and the titles of its duplicates.
syzbot has some knowledge about duplicates with different crash titles from when people use the "syz dup" command.
Also, it might be worth experimenting with neural networks to identify duplicates.


target_crash = 'kernel panic: corrupted stack end in wb_workfn'
test commit:
	bad = false;
	skip = true;
	foreach run:
		run_started, crashed, crash := run_repro();

		//kernel built, booted, reproducer launched successfully
		if (run_started)
			skip = false;
		if (crashed && is_duplicates(crash, target_crash))
			bad = true;
	
	if (skip)
		git bisect skip;
	else if (bad)
		git bisect bad;
	else
		git bisect good;
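
The pseudocode above can be sketched as runnable Python. This is only an
illustration under stated assumptions, not syzbot's implementation: the Run
record, is_duplicate(), verdict() and the known_dups set are hypothetical
names, and the two sample run lists are transcribed from the v4.16 and
v4.17 results in the bisection log.

```python
from dataclasses import dataclass
from typing import Optional

TARGET = "kernel panic: corrupted stack end in wb_workfn"
OOM = "kernel panic: Out of memory and no killable processes..."


@dataclass
class Run:
    started: bool          # kernel built, booted, reproducer launched
    crash: Optional[str]   # crash title, or None if the run was OK


def is_duplicate(crash, target, known_dups=frozenset()):
    """Title comparison only; known_dups stands in for "syz dup" data."""
    return crash == target or crash in known_dups


def verdict(runs, target, known_dups=frozenset()):
    """Return the git bisect verdict ("skip", "bad" or "good") for a commit."""
    if not any(r.started for r in runs):
        return "skip"   # nothing could be tested on this commit
    if any(r.crash is not None and is_duplicate(r.crash, target, known_dups)
           for r in runs):
        return "bad"    # the target crash (or a known duplicate) was seen
    return "good"       # only OK runs or unrelated crashes


# v4.16: eight OK runs and two OOM panics.
V4_16 = [Run(True, None)] * 8 + [Run(True, OOM)] * 2
# v4.17: seven wb_workfn panics, one worker_thread panic, two OOM panics.
V4_17 = ([Run(True, TARGET)] * 7
         + [Run(True, "kernel panic: corrupted stack end in worker_thread")]
         + [Run(True, OOM)] * 2)

print("v4.16:", verdict(V4_16, TARGET))
print("v4.17:", verdict(V4_17, TARGET))
```

With title matching, v4.16 comes out as good and v4.17 as bad, so the
bisection range would be 4.16..4.17. Whether "good" is the right answer
when only unrelated crashes are seen is exactly the open question: the
original bug may still be present but masked.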


* Re: kernel panic: corrupted stack end in wb_workfn
  2019-03-20 13:34           ` Andrey Ryabinin
@ 2019-03-20 13:57             ` Dmitry Vyukov
  2019-03-21  9:45               ` Dmitry Vyukov
  0 siblings, 1 reply; 14+ messages in thread
From: Dmitry Vyukov @ 2019-03-20 13:57 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Tetsuo Handa, syzbot, Andrew Morton, Qian Cai, David Miller,
	guro, Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM,
	linux-sctp, Mel Gorman, Michal Hocko, netdev, Neil Horman,
	Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich,
	Matthew Wilcox, Xin Long

On Wed, Mar 20, 2019 at 2:33 PM Andrey Ryabinin <aryabinin@virtuozzo.com> wrote:
>
>
>
> On 3/20/19 1:38 PM, Dmitry Vyukov wrote:
> > On Wed, Mar 20, 2019 at 11:24 AM Tetsuo Handa
> > <penguin-kernel@i-love.sakura.ne.jp> wrote:
> >>
> >> On 2019/03/20 18:59, Dmitry Vyukov wrote:
> >>>> From bisection log:
> >>>>
> >>>>         testing release v4.17
> >>>>         testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
> >>>>         run #0: crashed: kernel panic: corrupted stack end in wb_workfn
> >>>>         run #1: crashed: kernel panic: corrupted stack end in worker_thread
> >>>>         run #2: crashed: kernel panic: Out of memory and no killable processes...
> >>>>         run #3: crashed: kernel panic: corrupted stack end in wb_workfn
> >>>>         run #4: crashed: kernel panic: corrupted stack end in wb_workfn
> >>>>         run #5: crashed: kernel panic: corrupted stack end in wb_workfn
> >>>>         run #6: crashed: kernel panic: corrupted stack end in wb_workfn
> >>>>         run #7: crashed: kernel panic: corrupted stack end in wb_workfn
> >>>>         run #8: crashed: kernel panic: Out of memory and no killable processes...
> >>>>         run #9: crashed: kernel panic: corrupted stack end in wb_workfn
> >>>>         testing release v4.16
> >>>>         testing commit 0adb32858b0bddf4ada5f364a84ed60b196dbcda with gcc (GCC) 8.1.0
> >>>>         run #0: OK
> >>>>         run #1: OK
> >>>>         run #2: OK
> >>>>         run #3: OK
> >>>>         run #4: OK
> >>>>         run #5: crashed: kernel panic: Out of memory and no killable processes...
> >>>>         run #6: OK
> >>>>         run #7: crashed: kernel panic: Out of memory and no killable processes...
> >>>>         run #8: OK
> >>>>         run #9: OK
> >>>>         testing release v4.15
> >>>>         testing commit d8a5b80568a9cb66810e75b182018e9edb68e8ff with gcc (GCC) 8.1.0
> >>>>         all runs: OK
> >>>>         # git bisect start v4.16 v4.15
> >>>>
> >>>> Why bisect started between 4.16 4.15 instead of 4.17 4.16?
> >>>
> >>> Because 4.16 was still crashing and 4.15 was not crashing. 4.15..4.16
> >>> looks like the right range, no?
> >>
> >> No, syzbot should bisect between 4.16 and 4.17 regarding this bug, for
> >> "Stack corruption" can't manifest as "Out of memory and no killable processes".
> >>
> >> "kernel panic: Out of memory and no killable processes..." is completely
> >> unrelated to "kernel panic: corrupted stack end in wb_workfn".
> >
> >
> > Do you think this predicate is possible to code?
>
> Something like below would probably work better than the current behavior.
>
> For starters, is_duplicates() might just compare 'crash' title with 'target_crash' title and its duplicates titles.

Lots of bugs (half?) manifest differently. On top of this, titles
change as we go back in history. On top of this, if we see a different
bug, it does not mean that the original bug is also not there.
This will surely solve some subset of cases better than the current
logic. But I feel that that subset is smaller than what the current
logic solves.
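
To make the disagreement concrete, here is a minimal sketch (Python, not
syzbot code) of how the two policies pick the starting range from the
v4.15..v4.17 log quoted above. The names pick_range, bad_if_any_crash and
bad_if_title_matches are made up for illustration; the crash titles are
copied from the log.

```python
TARGET = "kernel panic: corrupted stack end in wb_workfn"
WORKER = "kernel panic: corrupted stack end in worker_thread"
OOM = "kernel panic: Out of memory and no killable processes..."

# Per-release run results from the quoted log; None means the run was OK.
RUNS = {
    "v4.17": [TARGET, WORKER, OOM, TARGET, TARGET,
              TARGET, TARGET, TARGET, OOM, TARGET],
    "v4.16": [None, None, None, None, None, OOM, None, OOM, None, None],
    "v4.15": [None] * 10,
}


def bad_if_any_crash(runs):
    """Policy A: any crash at all marks the release as bad."""
    return any(title is not None for title in runs)


def bad_if_title_matches(runs):
    """Policy B: only a crash with exactly the target title marks it bad."""
    return any(title == TARGET for title in runs)


def pick_range(is_bad, releases=("v4.17", "v4.16", "v4.15")):
    """Walk releases newest to oldest; return (first good, last bad)."""
    last_bad = None
    for release in releases:
        if is_bad(RUNS[release]):
            last_bad = release
        else:
            return release, last_bad
    return None, last_bad


print(pick_range(bad_if_any_crash))
print(pick_range(bad_if_title_matches))
```

Policy A yields the v4.15..v4.16 range shown in the log, and policy B yields
v4.16..v4.17. Neither is obviously right in general, which is the point of
the paragraph above.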

> syzbot has some knowledge about duplicates with different crash titles when people use "syz dup" command.

This is a very limited set of info. And in the end I think we've seen
all bug types being duped on all other bug types pair-wise, and at
the same time we've seen all bug types being not dups to all other bug
types. So I don't see where this gets us.
And again as we go back in history all these titles change.

> Also it might be worth to experiment with using neural networks to identify duplicates.
>
>
> target_crash = 'kernel panic: corrupted stack end in wb_workfn'
> test commit:
>         bad = false;
>         skip = true;
>         foreach run:
>                 run_started, crashed, crash := run_repro();
>
>                 //kernel built, booted, reproducer launched successfully
>                 if (run_started)
>                         skip = false;
>                 if (crashed && is_duplicates(crash, target_crash))
>                         bad = true;
>
>         if (skip)
>                 git bisect skip;
>         else if (bad)
>                 git bisect bad;
>         else
>                 git bisect good;
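
For reference, the quoted pseudocode can be written out as runnable Python
roughly as follows. This is only a sketch under assumptions: Run, verdict
and known_dups are hypothetical names, and is_duplicate() is the simplest
variant mentioned in the proposal (exact title comparison against the
target title and its known duplicate titles).

```python
from typing import AbstractSet, Iterable, NamedTuple, Optional


class Run(NamedTuple):
    started: bool          # kernel built, booted, reproducer launched
    title: Optional[str]   # crash title, or None if the run did not crash


def is_duplicate(title: str, target: str, known_dups: AbstractSet[str]) -> bool:
    # Simplest variant: exact match on the target title or a known dup title.
    return title == target or title in known_dups


def verdict(runs: Iterable[Run], target: str,
            known_dups: AbstractSet[str] = frozenset()) -> str:
    """Return the git bisect verdict ("skip", "bad" or "good") for one commit."""
    bad = False
    skip = True
    for run in runs:
        if run.started:
            skip = False
        if run.title is not None and is_duplicate(run.title, target, known_dups):
            bad = True
    if skip:
        return "skip"
    return "bad" if bad else "good"


target = "kernel panic: corrupted stack end in wb_workfn"
oom = "kernel panic: Out of memory and no killable processes..."
# v4.16 from the log: eight OK runs and two OOM panics.
v4_16 = [Run(True, None)] * 8 + [Run(True, oom)] * 2
print(verdict(v4_16, target))
```

With this predicate v4.16 is reported as good, because the OOM panics do not
match the target title. Whether that is the correct answer is exactly what
is being argued in this thread.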


* Re: kernel panic: corrupted stack end in wb_workfn
  2019-03-20 10:58             ` Tetsuo Handa
@ 2019-03-20 13:59               ` Dmitry Vyukov
  0 siblings, 0 replies; 14+ messages in thread
From: Dmitry Vyukov @ 2019-03-20 13:59 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Andrey Ryabinin, syzbot, Andrew Morton, Qian Cai, David Miller,
	guro, Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM,
	linux-sctp, Mel Gorman, Michal Hocko, netdev, Neil Horman,
	Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich,
	Matthew Wilcox, Xin Long

On Wed, Mar 20, 2019 at 11:59 AM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2019/03/20 19:42, Dmitry Vyukov wrote:
> >> I mean, yes, I agree, kernel bug bisection won't be perfect. But do
> >> you see anything actionable here?
>
> Allow users to manually tell bisection range when
> automatic bisection found a wrong commit.
>
> Also, allow users to specify reproducer program
> when automatic bisection found a wrong commit.
>
> Yes, this is anti automation. But since automation can't become perfect,
> I'm suggesting manual adjustment. Even if we involve manual adjustment,
> the syzbot's plenty CPU resources for building/testing kernels is highly
> appreciated (compared to doing manual bisection by building/testing kernels
> on personal PC environments).

FTR: provided an extended answer here:
https://groups.google.com/d/msg/syzkaller-bugs/1BSkmb_fawo/DOcDxv_KAgAJ


> > I see the larger long term bisection quality improvement (for syzbot
> > and for everybody else) in doing some actual testing for each kernel
> > commit before it's being merged into any kernel tree, so that we have
> > less of these a single program triggers 3 different bugs, stray
> > unrelated bugs, broken release boots, etc. I don't see how reliable
> > bisection is possible without that.
> >
>
> syzbot currently cannot test kernels with custom patches (unless "#syz test:" requests).
> Are you saying that syzbot will become be able to test kernels with custom patches?

I mean if we start improving kernel quality over time so that we have
less of these a single program triggers 3 different bugs, stray
unrelated bugs, broken release boots, etc, it will improve bisection
quality for everybody (beside being hugely useful in itself).


* Re: kernel panic: corrupted stack end in wb_workfn
  2019-03-20 13:57             ` Dmitry Vyukov
@ 2019-03-21  9:45               ` Dmitry Vyukov
  2019-03-21  9:51                 ` Dmitry Vyukov
  0 siblings, 1 reply; 14+ messages in thread
From: Dmitry Vyukov @ 2019-03-21  9:45 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Tetsuo Handa, syzbot, Andrew Morton, Qian Cai, David Miller,
	guro, Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM,
	linux-sctp, Mel Gorman, Michal Hocko, netdev, Neil Horman,
	Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich,
	Matthew Wilcox, Xin Long

On Wed, Mar 20, 2019 at 2:57 PM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Wed, Mar 20, 2019 at 2:33 PM Andrey Ryabinin <aryabinin@virtuozzo.com> wrote:
> >
> >
> >
> > On 3/20/19 1:38 PM, Dmitry Vyukov wrote:
> > > On Wed, Mar 20, 2019 at 11:24 AM Tetsuo Handa
> > > <penguin-kernel@i-love.sakura.ne.jp> wrote:
> > >>
> > >> On 2019/03/20 18:59, Dmitry Vyukov wrote:
> > >>>> From bisection log:
> > >>>>
> > >>>>         testing release v4.17
> > >>>>         testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
> > >>>>         run #0: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>>>         run #1: crashed: kernel panic: corrupted stack end in worker_thread
> > >>>>         run #2: crashed: kernel panic: Out of memory and no killable processes...
> > >>>>         run #3: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>>>         run #4: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>>>         run #5: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>>>         run #6: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>>>         run #7: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>>>         run #8: crashed: kernel panic: Out of memory and no killable processes...
> > >>>>         run #9: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>>>         testing release v4.16
> > >>>>         testing commit 0adb32858b0bddf4ada5f364a84ed60b196dbcda with gcc (GCC) 8.1.0
> > >>>>         run #0: OK
> > >>>>         run #1: OK
> > >>>>         run #2: OK
> > >>>>         run #3: OK
> > >>>>         run #4: OK
> > >>>>         run #5: crashed: kernel panic: Out of memory and no killable processes...
> > >>>>         run #6: OK
> > >>>>         run #7: crashed: kernel panic: Out of memory and no killable processes...
> > >>>>         run #8: OK
> > >>>>         run #9: OK
> > >>>>         testing release v4.15
> > >>>>         testing commit d8a5b80568a9cb66810e75b182018e9edb68e8ff with gcc (GCC) 8.1.0
> > >>>>         all runs: OK
> > >>>>         # git bisect start v4.16 v4.15
> > >>>>
> > >>>> Why bisect started between 4.16 4.15 instead of 4.17 4.16?
> > >>>
> > >>> Because 4.16 was still crashing and 4.15 was not crashing. 4.15..4.16
> > >>> looks like the right range, no?
> > >>
> > >> No, syzbot should bisect between 4.16 and 4.17 regarding this bug, for
> > >> "Stack corruption" can't manifest as "Out of memory and no killable processes".
> > >>
> > >> "kernel panic: Out of memory and no killable processes..." is completely
> > >> unrelated to "kernel panic: corrupted stack end in wb_workfn".
> > >
> > >
> > > Do you think this predicate is possible to code?
> >
> > Something like below would probably work better than the current behavior.
> >
> > For starters, is_duplicates() might just compare 'crash' title with 'target_crash' title and its duplicates titles.
>
> Lots of bugs (half?) manifest differently. On top of this, titles
> change as we go back in history. On top of this, if we see a different
> bug, it does not mean that the original bug is also not there.
> This will surely solve some subset of cases better than the current
> logic. But I feel that that subset is smaller than what the current
> logic solves.

Counter-examples come up in basically every other bisection.
For example:

bisecting cause commit starting from ccda4af0f4b92f7b4c308d3acc262f4a7e3affad
building syzkaller on 5f5f6d14e80b8bd6b42db961118e902387716bcb
testing commit ccda4af0f4b92f7b4c308d3acc262f4a7e3affad with gcc (GCC) 8.1.0
all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test_checked
testing release v4.19
testing commit 84df9525b0c27f3ebc2ebb1864fa62a97fdedb7d with gcc (GCC) 8.1.0
all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test_checked
testing release v4.18
testing commit 94710cac0ef4ee177a63b5227664b38c95bbf703 with gcc (GCC) 8.1.0
all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test
testing release v4.17
testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test

That's a different crash title, unless somebody explicitly codes this case.
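
A small sketch of what "explicitly coding this case" would mean. Nothing
here is syzbot code; split_title, same_title and the ALIASES table are
hypothetical, and the alias entry is just the one rename visible in the log
above.

```python
def split_title(title):
    """Split 'KIND in function' into (kind, function); function may be ''."""
    kind, sep, func = title.rpartition(" in ")
    return (kind, func) if sep else (title, "")


def same_title(a, b, aliases):
    """Compare two crash titles after mapping renamed functions via aliases."""
    kind_a, func_a = split_title(a)
    kind_b, func_b = split_title(b)
    return kind_a == kind_b and aliases.get(func_a, func_a) == aliases.get(func_b, func_b)


# Hand-maintained rename table: one entry per function rename we know about.
ALIASES = {"refcount_sub_and_test_checked": "refcount_sub_and_test"}

new = "KASAN: null-ptr-deref Read in refcount_sub_and_test_checked"  # v4.19
old = "KASAN: null-ptr-deref Read in refcount_sub_and_test"          # v4.18

print(new == old)
print(same_title(new, old, {}))
print(same_title(new, old, ALIASES))
```

Only the last comparison treats the two titles as the same crash, and only
because somebody added the alias by hand. Every rename going back in history
would need its own entry.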

Or, what crash is this?

testing commit 52358cb5a310990ea5069f986bdab3620e01181f with gcc (GCC) 8.1.0
run #1: crashed: general protection fault in cpuacct_charge
run #2: crashed: WARNING: suspicious RCU usage in corrupted
run #3: crashed: general protection fault in cpuacct_charge
run #4: crashed: BUG: unable to handle kernel paging request in ipt_do_table
run #5: crashed: KASAN: stack-out-of-bounds Read in cpuacct_charge
run #6: crashed: WARNING: suspicious RCU usage
run #7: crashed: no output from test machine
run #8: crashed: no output from test machine


Or this: a kernel that crashes with "INFO: trying to register non-static
key in can_notifier" does not do any testing, but is "WARNING in
dma_buf_vunmap" still there or not?

testing commit 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c with gcc (GCC) 8.1.0
all runs: crashed: WARNING in dma_buf_vunmap
testing release v4.11
testing commit a351e9b9fc24e982ec2f0e76379a49826036da12 with gcc (GCC) 7.3.0
all runs: OK
# git bisect start v4.12 v4.11
Bisecting: 7831 revisions left to test after this (roughly 13 steps)
[2bd80401743568ced7d303b008ae5298ce77e695] Merge tag 'gpio-v4.12-1' of
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
testing commit 2bd80401743568ced7d303b008ae5298ce77e695 with gcc (GCC) 7.3.0
all runs: crashed: INFO: trying to register non-static key in can_notifier
# git bisect bad 2bd80401743568ced7d303b008ae5298ce77e695
Bisecting: 3853 revisions left to test after this (roughly 12 steps)
[8d65b08debc7e62b2c6032d7fe7389d895b92cbc] Merge
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
testing commit 8d65b08debc7e62b2c6032d7fe7389d895b92cbc with gcc (GCC) 7.3.0
all runs: crashed: INFO: trying to register non-static key in can_notifier
# git bisect bad 8d65b08debc7e62b2c6032d7fe7389d895b92cbc
Bisecting: 2022 revisions left to test after this (roughly 11 steps)
[cec381919818a9a0cb85600b3c82404bdd38cf36] Merge tag
'mac80211-next-for-davem-2017-04-28' of
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
testing commit cec381919818a9a0cb85600b3c82404bdd38cf36 with gcc (GCC) 5.5.0
all runs: crashed: INFO: trying to register non-static key in can_notifier






> > syzbot has some knowledge about duplicates with different crash titles when people use "syz dup" command.
>
> This is a very limited set of info. And in the end I think we've seen
> all bug types being duped on all other bug types pair-wise, and at
> the same time we've seen all bug types being not dups to all other bug
> types. So I don't see where this gets us.
> And again as we go back in history all these titles change.
>
> > Also it might be worth to experiment with using neural networks to identify duplicates.
> >
> >
> > target_crash = 'kernel panic: corrupted stack end in wb_workfn'
> > test commit:
> >         bad = false;
> >         skip = true;
> >         foreach run:
> >                 run_started, crashed, crash := run_repro();
> >
> >                 //kernel built, booted, reproducer launched successfully
> >                 if (run_started)
> >                         skip = false;
> >                 if (crashed && is_duplicates(crash, target_crash))
> >                         bad = true;
> >
> >         if (skip)
> >                 git bisect skip;
> >         else if (bad)
> >                 git bisect bad;
> >         else
> >                 git bisect good;


* Re: kernel panic: corrupted stack end in wb_workfn
  2019-03-21  9:45               ` Dmitry Vyukov
@ 2019-03-21  9:51                 ` Dmitry Vyukov
  2019-03-21 11:41                   ` Tetsuo Handa
  0 siblings, 1 reply; 14+ messages in thread
From: Dmitry Vyukov @ 2019-03-21  9:51 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Tetsuo Handa, syzbot, Andrew Morton, Qian Cai, David Miller,
	guro, Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM,
	linux-sctp, Mel Gorman, Michal Hocko, netdev, Neil Horman,
	Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich,
	Matthew Wilcox, Xin Long

On Thu, Mar 21, 2019 at 10:45 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Wed, Mar 20, 2019 at 2:57 PM Dmitry Vyukov <dvyukov@google.com> wrote:
> >
> > On Wed, Mar 20, 2019 at 2:33 PM Andrey Ryabinin <aryabinin@virtuozzo.com> wrote:
> > >
> > >
> > >
> > > On 3/20/19 1:38 PM, Dmitry Vyukov wrote:
> > > > On Wed, Mar 20, 2019 at 11:24 AM Tetsuo Handa
> > > > <penguin-kernel@i-love.sakura.ne.jp> wrote:
> > > >>
> > > >> On 2019/03/20 18:59, Dmitry Vyukov wrote:
> > > >>>> From bisection log:
> > > >>>>
> > > >>>>         testing release v4.17
> > > >>>>         testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
> > > >>>>         run #0: crashed: kernel panic: corrupted stack end in wb_workfn
> > > >>>>         run #1: crashed: kernel panic: corrupted stack end in worker_thread
> > > >>>>         run #2: crashed: kernel panic: Out of memory and no killable processes...
> > > >>>>         run #3: crashed: kernel panic: corrupted stack end in wb_workfn
> > > >>>>         run #4: crashed: kernel panic: corrupted stack end in wb_workfn
> > > >>>>         run #5: crashed: kernel panic: corrupted stack end in wb_workfn
> > > >>>>         run #6: crashed: kernel panic: corrupted stack end in wb_workfn
> > > >>>>         run #7: crashed: kernel panic: corrupted stack end in wb_workfn
> > > >>>>         run #8: crashed: kernel panic: Out of memory and no killable processes...
> > > >>>>         run #9: crashed: kernel panic: corrupted stack end in wb_workfn
> > > >>>>         testing release v4.16
> > > >>>>         testing commit 0adb32858b0bddf4ada5f364a84ed60b196dbcda with gcc (GCC) 8.1.0
> > > >>>>         run #0: OK
> > > >>>>         run #1: OK
> > > >>>>         run #2: OK
> > > >>>>         run #3: OK
> > > >>>>         run #4: OK
> > > >>>>         run #5: crashed: kernel panic: Out of memory and no killable processes...
> > > >>>>         run #6: OK
> > > >>>>         run #7: crashed: kernel panic: Out of memory and no killable processes...
> > > >>>>         run #8: OK
> > > >>>>         run #9: OK
> > > >>>>         testing release v4.15
> > > >>>>         testing commit d8a5b80568a9cb66810e75b182018e9edb68e8ff with gcc (GCC) 8.1.0
> > > >>>>         all runs: OK
> > > >>>>         # git bisect start v4.16 v4.15
> > > >>>>
> > > >>>> Why bisect started between 4.16 4.15 instead of 4.17 4.16?
> > > >>>
> > > >>> Because 4.16 was still crashing and 4.15 was not crashing. 4.15..4.16
> > > >>> looks like the right range, no?
> > > >>
> > > >> No, syzbot should bisect between 4.16 and 4.17 regarding this bug, for
> > > >> "Stack corruption" can't manifest as "Out of memory and no killable processes".
> > > >>
> > > >> "kernel panic: Out of memory and no killable processes..." is completely
> > > >> unrelated to "kernel panic: corrupted stack end in wb_workfn".
> > > >
> > > >
> > > > Do you think this predicate is possible to code?
> > >
> > > Something like below would probably work better than the current behavior.
> > >
> > > For starters, is_duplicates() might just compare 'crash' title with 'target_crash' title and its duplicates titles.
> >
> > Lots of bugs (half?) manifest differently. On top of this, titles
> > change as we go back in history. On top of this, if we see a different
> > bug, it does not mean that the original bug is also not there.
> > This will surely solve some subset of cases better than the current
> > logic. But I feel that that subset is smaller than what the current
> > logic solves.
>
> Counter-examples come up in basically every other bisection.
> For example:
>
> bisecting cause commit starting from ccda4af0f4b92f7b4c308d3acc262f4a7e3affad
> building syzkaller on 5f5f6d14e80b8bd6b42db961118e902387716bcb
> testing commit ccda4af0f4b92f7b4c308d3acc262f4a7e3affad with gcc (GCC) 8.1.0
> all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test_checked
> testing release v4.19
> testing commit 84df9525b0c27f3ebc2ebb1864fa62a97fdedb7d with gcc (GCC) 8.1.0
> all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test_checked
> testing release v4.18
> testing commit 94710cac0ef4ee177a63b5227664b38c95bbf703 with gcc (GCC) 8.1.0
> all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test
> testing release v4.17
> testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
> all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test


And to make things even more interesting, this later changes to "BUG:
unable to handle kernel NULL pointer dereference in vb2_vmalloc_put":

testing release v4.12
testing commit 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c with gcc (GCC) 8.1.0
all runs: crashed: general protection fault in refcount_sub_and_test
testing release v4.11
testing commit a351e9b9fc24e982ec2f0e76379a49826036da12 with gcc (GCC) 7.3.0
all runs: crashed: BUG: unable to handle kernel NULL pointer
dereference in vb2_vmalloc_put

And since the original bug is in the vb2 subsystem
(https://syzkaller.appspot.com/bug?id=17535f4bf5b322437f7c639b59161ce343fc55a9),
it's actually not clear even to me whether we should treat it as the same
bug or not. It may be a different manifestation of the same root cause, or
a different bug nearby.





> That's a different crash title, unless somebody explicitly codes this case.
>
> Or, what crash is this?
>
> testing commit 52358cb5a310990ea5069f986bdab3620e01181f with gcc (GCC) 8.1.0
> run #1: crashed: general protection fault in cpuacct_charge
> run #2: crashed: WARNING: suspicious RCU usage in corrupted
> run #3: crashed: general protection fault in cpuacct_charge
> run #4: crashed: BUG: unable to handle kernel paging request in ipt_do_table
> run #5: crashed: KASAN: stack-out-of-bounds Read in cpuacct_charge
> run #6: crashed: WARNING: suspicious RCU usage
> run #7: crashed: no output from test machine
> run #8: crashed: no output from test machine
>
>
> Or this: a kernel that crashes with "INFO: trying to register non-static
> key in can_notifier" does not do any testing, but is "WARNING in
> dma_buf_vunmap" still there or not?
>
> testing commit 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c with gcc (GCC) 8.1.0
> all runs: crashed: WARNING in dma_buf_vunmap
> testing release v4.11
> testing commit a351e9b9fc24e982ec2f0e76379a49826036da12 with gcc (GCC) 7.3.0
> all runs: OK
> # git bisect start v4.12 v4.11
> Bisecting: 7831 revisions left to test after this (roughly 13 steps)
> [2bd80401743568ced7d303b008ae5298ce77e695] Merge tag 'gpio-v4.12-1' of
> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
> testing commit 2bd80401743568ced7d303b008ae5298ce77e695 with gcc (GCC) 7.3.0
> all runs: crashed: INFO: trying to register non-static key in can_notifier
> # git bisect bad 2bd80401743568ced7d303b008ae5298ce77e695
> Bisecting: 3853 revisions left to test after this (roughly 12 steps)
> [8d65b08debc7e62b2c6032d7fe7389d895b92cbc] Merge
> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
> testing commit 8d65b08debc7e62b2c6032d7fe7389d895b92cbc with gcc (GCC) 7.3.0
> all runs: crashed: INFO: trying to register non-static key in can_notifier
> # git bisect bad 8d65b08debc7e62b2c6032d7fe7389d895b92cbc
> Bisecting: 2022 revisions left to test after this (roughly 11 steps)
> [cec381919818a9a0cb85600b3c82404bdd38cf36] Merge tag
> 'mac80211-next-for-davem-2017-04-28' of
> git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
> testing commit cec381919818a9a0cb85600b3c82404bdd38cf36 with gcc (GCC) 5.5.0
> all runs: crashed: INFO: trying to register non-static key in can_notifier
>
>
>
>
>
>
> > > syzbot has some knowledge about duplicates with different crash titles when people use "syz dup" command.
> >
> > This is a very limited set of info. And in the end I think we've seen
> > all bug types being duped on all other bug types pair-wise, and at
> > the same time we've seen all bug types being not dups to all other bug
> > types. So I don't see where this gets us.
> > And again as we go back in history all these titles change.
> >
> > > Also it might be worth to experiment with using neural networks to identify duplicates.
> > >
> > >
> > > target_crash = 'kernel panic: corrupted stack end in wb_workfn'
> > > test commit:
> > >         bad = false;
> > >         skip = true;
> > >         foreach run:
> > >                 run_started, crashed, crash := run_repro();
> > >
> > >                 //kernel built, booted, reproducer launched successfully
> > >                 if (run_started)
> > >                         skip = false;
> > >                 if (crashed && is_duplicates(crash, target_crash))
> > >                         bad = true;
> > >
> > >         if (skip)
> > >                 git bisect skip;
> > >         else if (bad)
> > >                 git bisect bad;
> > >         else
> > >                 git bisect good;


* Re: kernel panic: corrupted stack end in wb_workfn
  2019-03-21  9:51                 ` Dmitry Vyukov
@ 2019-03-21 11:41                   ` Tetsuo Handa
  0 siblings, 0 replies; 14+ messages in thread
From: Tetsuo Handa @ 2019-03-21 11:41 UTC (permalink / raw)
  To: Dmitry Vyukov, Andrey Ryabinin
  Cc: syzbot, Andrew Morton, Qian Cai, David Miller, guro,
	Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM,
	linux-sctp, Mel Gorman, Michal Hocko, netdev, Neil Horman,
	Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich,
	Matthew Wilcox, Xin Long

On 2019/03/21 18:51, Dmitry Vyukov wrote:
>>> Lots of bugs (half?) manifest differently. On top of this, titles
>>> change as we go back in history. On top of this, if we see a different
>>> bug, it does not mean that the original bug is also not there.
>>> This will surely solve some subset of cases better than the current
>>> logic. But I feel that that subset is smaller than what the current
>>> logic solves.
>>
>> Counter-examples come up in basically every other bisection.
>> For example:
>>
>> bisecting cause commit starting from ccda4af0f4b92f7b4c308d3acc262f4a7e3affad
>> building syzkaller on 5f5f6d14e80b8bd6b42db961118e902387716bcb
>> testing commit ccda4af0f4b92f7b4c308d3acc262f4a7e3affad with gcc (GCC) 8.1.0
>> all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test_checked
>> testing release v4.19
>> testing commit 84df9525b0c27f3ebc2ebb1864fa62a97fdedb7d with gcc (GCC) 8.1.0
>> all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test_checked
>> testing release v4.18
>> testing commit 94710cac0ef4ee177a63b5227664b38c95bbf703 with gcc (GCC) 8.1.0
>> all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test
>> testing release v4.17
>> testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
>> all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test
> 
> 
> And to make things even more interesting, this later changes to "BUG:
> unable to handle kernel NULL pointer dereference in vb2_vmalloc_put":
> 
> testing release v4.12
> testing commit 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c with gcc (GCC) 8.1.0
> all runs: crashed: general protection fault in refcount_sub_and_test
> testing release v4.11
> testing commit a351e9b9fc24e982ec2f0e76379a49826036da12 with gcc (GCC) 7.3.0
> all runs: crashed: BUG: unable to handle kernel NULL pointer
> dereference in vb2_vmalloc_put
> 
> And since the original bug is in the vb2 subsystem
> (https://syzkaller.appspot.com/bug?id=17535f4bf5b322437f7c639b59161ce343fc55a9),
> it's actually not clear even to me whether we should treat it as the same
> bug or not. It may be a different manifestation of the same root cause, or
> a different bug nearby.
> 

Well, maybe we should use reproducers for checking whether each not-yet-fixed
problem is reproducible with old kernels, rather than finding the specific commit
that is causing a specific problem?

I think there are two patterns in which syzbot starts reporting a problem.

  (a) a commit which causes one or more problems is merged into a codebase which
      syzbot was already testing, because syzbot already knew what/how
      that codebase should be tested.

  (b) a commit which causes one or more problems was already there in a codebase
      for which syzbot did not know until now what/how it should be tested.

(a) tends to require testing new kernels (i.e. bisection range is narrow) whereas
(b) tends to require testing old kernels (i.e. bisection range is wide).

Regarding case (b), it is difficult for developers to guess when the problem
started, and I think that (b) tends to confuse automatic bisection attempts.

Therefore, instead of trying to find a specific commit for a specific problem using
the "git bisect" approach, try running all reproducers (gathered from all problems)
on each release (e.g. each git tag) and append the reproduced crashes to the

  Manager Time Kernel Commit Syzkaller Config Log Report Syz repro C repro Maintainers

table for each not-yet-fixed problem on the dashboard interface. That is, if running
repro1 from problem1 on some old kernel reproduced a crash for problem2, append the
crash to problem2's table. Maybe we want to use a new table with only

  Kernel Commit Syzkaller Config Log Report Syz repro C repro

entries, because what we want to know is the oldest kernel release, which helps
in guessing when the problem started.
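
A rough sketch of the bookkeeping I have in mind, in Python. All names and
all data below are made up for illustration (repro1, problem1, etc. are
placeholders as in the text above), and matching a crash to a problem is
assumed to be a plain lookup by crash title.

```python
from collections import defaultdict

RELEASES = ["v4.15", "v4.16", "v4.17"]  # oldest first


def build_tables(results):
    """Group reproduced crashes by the problem they belong to.

    results: iterable of (release, repro, problem_or_None), where None
    means that running the reproducer on that release did not crash.
    """
    tables = defaultdict(list)
    for release, repro, problem in results:
        if problem is not None:
            # The crash goes to the table of the problem that was hit,
            # not to the problem the reproducer was taken from.
            tables[problem].append((release, repro))
    return tables


def oldest_release(rows):
    """Oldest release on which a problem was reproduced by any reproducer."""
    return min((release for release, _ in rows), key=RELEASES.index)


# Hypothetical outcomes of running every reproducer on every release.
results = [
    ("v4.17", "repro1", "problem1"),
    ("v4.16", "repro1", "problem2"),  # repro1 hits problem2 on an old kernel
    ("v4.15", "repro1", None),
    ("v4.17", "repro2", "problem2"),
    ("v4.16", "repro2", None),
]

tables = build_tables(results)
for problem in sorted(tables):
    print(problem, oldest_release(tables[problem]))
```

Here problem2 is known to exist since v4.16 although its own reproducer
(repro2) only triggers it on v4.17.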


Thread overview: 14+ messages
     [not found] <000000000000b05d0c057e492e33@google.com>
2019-03-17 20:49 ` kernel panic: corrupted stack end in wb_workfn syzbot
2019-03-19 18:03   ` Xin Long
2019-03-20  9:56   ` Andrey Ryabinin
2019-03-20  9:59     ` Dmitry Vyukov
2019-03-20 10:23       ` Tetsuo Handa
2019-03-20 10:38         ` Dmitry Vyukov
2019-03-20 10:42           ` Dmitry Vyukov
2019-03-20 10:58             ` Tetsuo Handa
2019-03-20 13:59               ` Dmitry Vyukov
2019-03-20 13:34           ` Andrey Ryabinin
2019-03-20 13:57             ` Dmitry Vyukov
2019-03-21  9:45               ` Dmitry Vyukov
2019-03-21  9:51                 ` Dmitry Vyukov
2019-03-21 11:41                   ` Tetsuo Handa
