* Re: kernel panic: corrupted stack end in wb_workfn
@ 2019-03-20 10:38 ` Dmitry Vyukov
0 siblings, 0 replies; 42+ messages in thread
From: Dmitry Vyukov @ 2019-03-20 10:38 UTC (permalink / raw)
To: Tetsuo Handa
Cc: Andrey Ryabinin, syzbot, Andrew Morton, Qian Cai, David Miller,
guro, Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM,
linux-sctp, Mel Gorman, Michal Hocko, netdev, Neil Horman,
Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich,
Matthew Wilcox, Xin Long
On Wed, Mar 20, 2019 at 11:24 AM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2019/03/20 18:59, Dmitry Vyukov wrote:
> >> From bisection log:
> >>
> >> testing release v4.17
> >> testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
> >> run #0: crashed: kernel panic: corrupted stack end in wb_workfn
> >> run #1: crashed: kernel panic: corrupted stack end in worker_thread
> >> run #2: crashed: kernel panic: Out of memory and no killable processes...
> >> run #3: crashed: kernel panic: corrupted stack end in wb_workfn
> >> run #4: crashed: kernel panic: corrupted stack end in wb_workfn
> >> run #5: crashed: kernel panic: corrupted stack end in wb_workfn
> >> run #6: crashed: kernel panic: corrupted stack end in wb_workfn
> >> run #7: crashed: kernel panic: corrupted stack end in wb_workfn
> >> run #8: crashed: kernel panic: Out of memory and no killable processes...
> >> run #9: crashed: kernel panic: corrupted stack end in wb_workfn
> >> testing release v4.16
> >> testing commit 0adb32858b0bddf4ada5f364a84ed60b196dbcda with gcc (GCC) 8.1.0
> >> run #0: OK
> >> run #1: OK
> >> run #2: OK
> >> run #3: OK
> >> run #4: OK
> >> run #5: crashed: kernel panic: Out of memory and no killable processes...
> >> run #6: OK
> >> run #7: crashed: kernel panic: Out of memory and no killable processes...
> >> run #8: OK
> >> run #9: OK
> >> testing release v4.15
> >> testing commit d8a5b80568a9cb66810e75b182018e9edb68e8ff with gcc (GCC) 8.1.0
> >> all runs: OK
> >> # git bisect start v4.16 v4.15
> >>
> >> Why bisect started between 4.16 4.15 instead of 4.17 4.16?
> >
> > Because 4.16 was still crashing and 4.15 was not crashing. 4.15..4.16
> > looks like the right range, no?
>
> No, syzbot should bisect between 4.16 and 4.17 regarding this bug, for
> "Stack corruption" can't manifest as "Out of memory and no killable processes".
>
> "kernel panic: Out of memory and no killable processes..." is completely
> unrelated to "kernel panic: corrupted stack end in wb_workfn".
Do you think this predicate is possible to code? Looking at the
examples we have, distinguishing different bugs does not look feasible
to me. If the predicate is not accurate, you just trade one set of
false positives for another set of false positives, and then you are at
the beginning of an infinite slippery slope of refining it.
Also, if we see a different bug (assuming we can distinguish them),
does it mean that the original bug is not present? Or is it also
present, but we just hit the other one first? This also does not look
feasible to answer. And if you give a wrong answer, bisection goes the
wrong way and we are back where we started, just with more complex code
and things being even harder to explain to other people.
I mean, yes, I agree, kernel bug bisection won't be perfect. But do
you see anything actionable here?
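To make the disagreement concrete, here is a minimal Python sketch that applies two verdict predicates to the runs from the bisection log quoted above. It is not syzbot's actual code: the function names, the run encoding (None for an OK run) and the title-prefix rule are illustrative assumptions.

```python
# Crash titles copied from the bisection log; None stands for a run that was OK.
STACK = "kernel panic: corrupted stack end in wb_workfn"
STACK_WT = "kernel panic: corrupted stack end in worker_thread"
OOM = "kernel panic: Out of memory and no killable processes..."

RUNS = {
    "v4.17": [STACK, STACK_WT, OOM, STACK, STACK, STACK, STACK, STACK, OOM, STACK],
    "v4.16": [None, None, None, None, None, OOM, None, OOM, None, None],
    "v4.15": [None] * 10,
}


def any_crash_is_bad(runs):
    """A release is bad if any run crashed, whatever the title."""
    return any(title is not None for title in runs)


def matching_crash_is_bad(runs, prefix="kernel panic: corrupted stack end"):
    """Hypothetical stricter rule: only crashes whose title starts with prefix count."""
    return any(title is not None and title.startswith(prefix) for title in runs)


def bisect_range(predicate, releases=("v4.17", "v4.16", "v4.15")):
    """Walk releases from newest to oldest and return (last bad, first good)."""
    bad = None
    for release in releases:
        if predicate(RUNS[release]):
            bad = release
        else:
            return bad, release
    return bad, None


print(bisect_range(any_crash_is_bad))       # ('v4.16', 'v4.15')
print(bisect_range(matching_crash_is_bad))  # ('v4.17', 'v4.16')
```

The first predicate yields the 4.15..4.16 range, the second the 4.16..4.17 range. The prefix rule is itself only a heuristic, and it does not answer whether the original bug is also present on a release where another crash happened to fire first.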
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: kernel panic: corrupted stack end in wb_workfn
2019-03-20 10:38 ` Dmitry Vyukov
(?)
@ 2019-03-20 10:42 ` Dmitry Vyukov
-1 siblings, 0 replies; 42+ messages in thread
From: Dmitry Vyukov @ 2019-03-20 10:42 UTC (permalink / raw)
To: Tetsuo Handa
Cc: Andrey Ryabinin, syzbot, Andrew Morton, Qian Cai, David Miller,
guro, Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM,
linux-sctp, Mel Gorman, Michal Hocko, netdev, Neil Horman,
Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich,
Matthew Wilcox, Xin Long
On Wed, Mar 20, 2019 at 11:38 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Wed, Mar 20, 2019 at 11:24 AM Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
> >
> > On 2019/03/20 18:59, Dmitry Vyukov wrote:
> > >> From bisection log:
> > >>
> > >> testing release v4.17
> > >> testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
> > >> run #0: crashed: kernel panic: corrupted stack end in wb_workfn
> > >> run #1: crashed: kernel panic: corrupted stack end in worker_thread
> > >> run #2: crashed: kernel panic: Out of memory and no killable processes...
> > >> run #3: crashed: kernel panic: corrupted stack end in wb_workfn
> > >> run #4: crashed: kernel panic: corrupted stack end in wb_workfn
> > >> run #5: crashed: kernel panic: corrupted stack end in wb_workfn
> > >> run #6: crashed: kernel panic: corrupted stack end in wb_workfn
> > >> run #7: crashed: kernel panic: corrupted stack end in wb_workfn
> > >> run #8: crashed: kernel panic: Out of memory and no killable processes...
> > >> run #9: crashed: kernel panic: corrupted stack end in wb_workfn
> > >> testing release v4.16
> > >> testing commit 0adb32858b0bddf4ada5f364a84ed60b196dbcda with gcc (GCC) 8.1.0
> > >> run #0: OK
> > >> run #1: OK
> > >> run #2: OK
> > >> run #3: OK
> > >> run #4: OK
> > >> run #5: crashed: kernel panic: Out of memory and no killable processes...
> > >> run #6: OK
> > >> run #7: crashed: kernel panic: Out of memory and no killable processes...
> > >> run #8: OK
> > >> run #9: OK
> > >> testing release v4.15
> > >> testing commit d8a5b80568a9cb66810e75b182018e9edb68e8ff with gcc (GCC) 8.1.0
> > >> all runs: OK
> > >> # git bisect start v4.16 v4.15
> > >>
> > >> Why bisect started between 4.16 4.15 instead of 4.17 4.16?
> > >
> > > Because 4.16 was still crashing and 4.15 was not crashing. 4.15..4.16
> > > looks like the right range, no?
> >
> > No, syzbot should bisect between 4.16 and 4.17 regarding this bug, for
> > "Stack corruption" can't manifest as "Out of memory and no killable processes".
> >
> > "kernel panic: Out of memory and no killable processes..." is completely
> > unrelated to "kernel panic: corrupted stack end in wb_workfn".
>
>
> Do you think this predicate is possible to code? Looking at the
> examples we have, distinguishing different bugs does not look feasible
> to me. If the predicate is not accurate, you just trade one set of
> false positives to another set of false positives and then you at the
> beginning of an infinite slippery slope refining it.
> Also, if we see a different bug (assuming we can distinguish them),
> does it mean that the original bug is not present? Or it's also
> present, but we just hit the other one first? This also does not look
> feasible to answer. And if you give a wrong answer, bisection goes the
> wrong way and we are where we started. Just with more complex code and
> things being even harder to explain to other people.
> I mean, yes, I agree, kernel bug bisection won't be perfect. But do
> you see anything actionable here?
I see the larger long-term bisection quality improvement (for syzbot
and for everybody else) in doing some actual testing of each kernel
commit before it is merged into any kernel tree, so that we have
fewer cases of a single program triggering 3 different bugs, stray
unrelated bugs, broken release boots, etc. I don't see how reliable
bisection is possible without that.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: kernel panic: corrupted stack end in wb_workfn
2019-03-20 10:42 ` Dmitry Vyukov
@ 2019-03-20 10:58 ` Tetsuo Handa
-1 siblings, 0 replies; 42+ messages in thread
From: Tetsuo Handa @ 2019-03-20 10:58 UTC (permalink / raw)
To: Dmitry Vyukov
Cc: Andrey Ryabinin, syzbot, Andrew Morton, Qian Cai, David Miller,
guro, Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM,
linux-sctp, Mel Gorman, Michal Hocko, netdev, Neil Horman,
Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich,
Matthew Wilcox, Xin Long
On 2019/03/20 19:42, Dmitry Vyukov wrote:
>> I mean, yes, I agree, kernel bug bisection won't be perfect. But do
>> you see anything actionable here?
Allow users to manually specify the bisection range when
automatic bisection has found a wrong commit.
Also, allow users to specify the reproducer program
when automatic bisection has found a wrong commit.
Yes, this is anti-automation. But since automation can't be perfect,
I'm suggesting manual adjustment. Even with manual adjustment,
syzbot's plentiful CPU resources for building/testing kernels are highly
appreciated (compared to doing manual bisection by building/testing kernels
on personal PC environments).
>
> I see the larger long term bisection quality improvement (for syzbot
> and for everybody else) in doing some actual testing for each kernel
> commit before it's being merged into any kernel tree, so that we have
> less of these a single program triggers 3 different bugs, stray
> unrelated bugs, broken release boots, etc. I don't see how reliable
> bisection is possible without that.
>
syzbot currently cannot test kernels with custom patches (except via "#syz test:" requests).
Are you saying that syzbot will become able to test kernels with custom patches?
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: kernel panic: corrupted stack end in wb_workfn
2019-03-20 10:58 ` Tetsuo Handa
(?)
@ 2019-03-20 13:59 ` Dmitry Vyukov
-1 siblings, 0 replies; 42+ messages in thread
From: Dmitry Vyukov @ 2019-03-20 13:59 UTC (permalink / raw)
To: Tetsuo Handa
Cc: Andrey Ryabinin, syzbot, Andrew Morton, Qian Cai, David Miller,
guro, Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM,
linux-sctp, Mel Gorman, Michal Hocko, netdev, Neil Horman,
Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich,
Matthew Wilcox, Xin Long
On Wed, Mar 20, 2019 at 11:59 AM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2019/03/20 19:42, Dmitry Vyukov wrote:
> >> I mean, yes, I agree, kernel bug bisection won't be perfect. But do
> >> you see anything actionable here?
>
> Allow users to manually tell bisection range when
> automatic bisection found a wrong commit.
>
> Also, allow users to specify reproducer program
> when automatic bisection found a wrong commit.
>
> Yes, this is anti automation. But since automation can't become perfect,
> I'm suggesting manual adjustment. Even if we involve manual adjustment,
> the syzbot's plenty CPU resources for building/testing kernels is highly
> appreciated (compared to doing manual bisection by building/testing kernels
> on personal PC environments).
FTR: provided an extended answer here:
https://groups.google.com/d/msg/syzkaller-bugs/1BSkmb_fawo/DOcDxv_KAgAJ
> > I see the larger long term bisection quality improvement (for syzbot
> > and for everybody else) in doing some actual testing for each kernel
> > commit before it's being merged into any kernel tree, so that we have
> > less of these a single program triggers 3 different bugs, stray
> > unrelated bugs, broken release boots, etc. I don't see how reliable
> > bisection is possible without that.
> >
>
> syzbot currently cannot test kernels with custom patches (unless "#syz test:" requests).
> Are you saying that syzbot will become be able to test kernels with custom patches?
I mean that if we start improving kernel quality over time, so that we
have fewer cases of a single program triggering 3 different bugs, stray
unrelated bugs, broken release boots, etc., it will improve bisection
quality for everybody (besides being hugely useful in itself).
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: kernel panic: corrupted stack end in wb_workfn
2019-03-20 10:38 ` Dmitry Vyukov
@ 2019-03-20 13:34 ` Andrey Ryabinin
-1 siblings, 0 replies; 42+ messages in thread
From: Andrey Ryabinin @ 2019-03-20 13:34 UTC (permalink / raw)
To: Dmitry Vyukov, Tetsuo Handa
Cc: syzbot, Andrew Morton, Qian Cai, David Miller, guro,
Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM,
linux-sctp, Mel Gorman, Michal Hocko, netdev, Neil Horman,
Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich,
Matthew Wilcox, Xin Long
On 3/20/19 1:38 PM, Dmitry Vyukov wrote:
> On Wed, Mar 20, 2019 at 11:24 AM Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
>>
>> On 2019/03/20 18:59, Dmitry Vyukov wrote:
>>>> From bisection log:
>>>>
>>>> testing release v4.17
>>>> testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
>>>> run #0: crashed: kernel panic: corrupted stack end in wb_workfn
>>>> run #1: crashed: kernel panic: corrupted stack end in worker_thread
>>>> run #2: crashed: kernel panic: Out of memory and no killable processes...
>>>> run #3: crashed: kernel panic: corrupted stack end in wb_workfn
>>>> run #4: crashed: kernel panic: corrupted stack end in wb_workfn
>>>> run #5: crashed: kernel panic: corrupted stack end in wb_workfn
>>>> run #6: crashed: kernel panic: corrupted stack end in wb_workfn
>>>> run #7: crashed: kernel panic: corrupted stack end in wb_workfn
>>>> run #8: crashed: kernel panic: Out of memory and no killable processes...
>>>> run #9: crashed: kernel panic: corrupted stack end in wb_workfn
>>>> testing release v4.16
>>>> testing commit 0adb32858b0bddf4ada5f364a84ed60b196dbcda with gcc (GCC) 8.1.0
>>>> run #0: OK
>>>> run #1: OK
>>>> run #2: OK
>>>> run #3: OK
>>>> run #4: OK
>>>> run #5: crashed: kernel panic: Out of memory and no killable processes...
>>>> run #6: OK
>>>> run #7: crashed: kernel panic: Out of memory and no killable processes...
>>>> run #8: OK
>>>> run #9: OK
>>>> testing release v4.15
>>>> testing commit d8a5b80568a9cb66810e75b182018e9edb68e8ff with gcc (GCC) 8.1.0
>>>> all runs: OK
>>>> # git bisect start v4.16 v4.15
>>>>
>>>> Why bisect started between 4.16 4.15 instead of 4.17 4.16?
>>>
>>> Because 4.16 was still crashing and 4.15 was not crashing. 4.15..4.16
>>> looks like the right range, no?
>>
>> No, syzbot should bisect between 4.16 and 4.17 regarding this bug, for
>> "Stack corruption" can't manifest as "Out of memory and no killable processes".
>>
>> "kernel panic: Out of memory and no killable processes..." is completely
>> unrelated to "kernel panic: corrupted stack end in wb_workfn".
>
>
> Do you think this predicate is possible to code?
Something like the below would probably work better than the current behavior.
For starters, is_duplicates() might just compare the 'crash' title with the 'target_crash' title and the titles of its duplicates.
syzbot has some knowledge about duplicates with different crash titles from when people use the "syz dup" command.
It might also be worth experimenting with neural networks to identify duplicates.

target_crash = 'kernel panic: corrupted stack end in wb_workfn'
test commit:
    bad = false;
    skip = true;
    foreach run:
        run_started, crashed, crash := run_repro();

        // kernel built, booted, reproducer launched successfully
        if (run_started)
            skip = false;
        if (crashed && is_duplicates(crash, target_crash))
            bad = true;

    if (skip)
        git bisect skip;
    else if (bad)
        git bisect bad;
    else
        git bisect good;
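
For reference, the pseudocode above can be made runnable roughly as follows. This is only a minimal sketch: RunResult, bisect_verdict() and the known_dups set are names invented here for illustration and are not syzbot code, and is_duplicate() implements nothing more than the title comparison suggested as a starting point. The sample data is the v4.16 result from the bisection log quoted earlier.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional

TARGET_CRASH = "kernel panic: corrupted stack end in wb_workfn"
OOM = "kernel panic: Out of memory and no killable processes..."


@dataclass(frozen=True)
class RunResult:
    """Outcome of one reproducer run (hypothetical shape, not syzbot's)."""
    started: bool                      # kernel built, booted, reproducer launched
    crash_title: Optional[str] = None  # None: the run finished without a crash


def is_duplicate(crash: str, target: str,
                 known_dups: FrozenSet[str] = frozenset()) -> bool:
    # Title-only predicate: exact match, or a title recorded as a duplicate.
    return crash == target or crash in known_dups


def bisect_verdict(runs: Iterable[RunResult], target: str = TARGET_CRASH,
                   known_dups: FrozenSet[str] = frozenset()) -> str:
    """Return the git bisect subcommand for one tested commit."""
    bad, skip = False, True
    for run in runs:
        if run.started:
            skip = False
        if run.crash_title is not None and is_duplicate(
                run.crash_title, target, known_dups):
            bad = True
    if skip:
        return "skip"
    return "bad" if bad else "good"


# The v4.16 runs from the log: eight OK, runs #5 and #7 hit the OOM panic.
v4_16_runs = [RunResult(True, OOM if i in (5, 7) else None) for i in range(10)]
print(bisect_verdict(v4_16_runs))
```

With this predicate the two OOM panics on v4.16 do not count against the commit, so v4.16 would be reported good and the bisection would run over v4.16..v4.17.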
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: kernel panic: corrupted stack end in wb_workfn
2019-03-20 13:34 ` Andrey Ryabinin
(?)
@ 2019-03-20 13:57 ` Dmitry Vyukov
-1 siblings, 0 replies; 42+ messages in thread
From: Dmitry Vyukov @ 2019-03-20 13:57 UTC (permalink / raw)
To: Andrey Ryabinin
Cc: Tetsuo Handa, syzbot, Andrew Morton, Qian Cai, David Miller,
guro, Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM,
linux-sctp, Mel Gorman, Michal Hocko, netdev, Neil Horman,
Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich,
Matthew Wilcox, Xin Long
On Wed, Mar 20, 2019 at 2:33 PM Andrey Ryabinin <aryabinin@virtuozzo.com> wrote:
>
>
>
> On 3/20/19 1:38 PM, Dmitry Vyukov wrote:
> > On Wed, Mar 20, 2019 at 11:24 AM Tetsuo Handa
> > <penguin-kernel@i-love.sakura.ne.jp> wrote:
> >>
> >> On 2019/03/20 18:59, Dmitry Vyukov wrote:
> >>>> From bisection log:
> >>>>
> >>>> testing release v4.17
> >>>> testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
> >>>> run #0: crashed: kernel panic: corrupted stack end in wb_workfn
> >>>> run #1: crashed: kernel panic: corrupted stack end in worker_thread
> >>>> run #2: crashed: kernel panic: Out of memory and no killable processes...
> >>>> run #3: crashed: kernel panic: corrupted stack end in wb_workfn
> >>>> run #4: crashed: kernel panic: corrupted stack end in wb_workfn
> >>>> run #5: crashed: kernel panic: corrupted stack end in wb_workfn
> >>>> run #6: crashed: kernel panic: corrupted stack end in wb_workfn
> >>>> run #7: crashed: kernel panic: corrupted stack end in wb_workfn
> >>>> run #8: crashed: kernel panic: Out of memory and no killable processes...
> >>>> run #9: crashed: kernel panic: corrupted stack end in wb_workfn
> >>>> testing release v4.16
> >>>> testing commit 0adb32858b0bddf4ada5f364a84ed60b196dbcda with gcc (GCC) 8.1.0
> >>>> run #0: OK
> >>>> run #1: OK
> >>>> run #2: OK
> >>>> run #3: OK
> >>>> run #4: OK
> >>>> run #5: crashed: kernel panic: Out of memory and no killable processes...
> >>>> run #6: OK
> >>>> run #7: crashed: kernel panic: Out of memory and no killable processes...
> >>>> run #8: OK
> >>>> run #9: OK
> >>>> testing release v4.15
> >>>> testing commit d8a5b80568a9cb66810e75b182018e9edb68e8ff with gcc (GCC) 8.1.0
> >>>> all runs: OK
> >>>> # git bisect start v4.16 v4.15
> >>>>
> >>>> Why bisect started between 4.16 4.15 instead of 4.17 4.16?
> >>>
> >>> Because 4.16 was still crashing and 4.15 was not crashing. 4.15..4.16
> >>> looks like the right range, no?
> >>
> >> No, syzbot should bisect between 4.16 and 4.17 regarding this bug, for
> >> "Stack corruption" can't manifest as "Out of memory and no killable processes".
> >>
> >> "kernel panic: Out of memory and no killable processes..." is completely
> >> unrelated to "kernel panic: corrupted stack end in wb_workfn".
> >
> >
> > Do you think this predicate is possible to code?
>
> Something like bellow probably would work better than current behavior.
>
> For starters, is_duplicates() might just compare 'crash' title with 'target_crash' title and its duplicates titles.

Lots of bugs (half?) manifest differently. On top of this, titles
change as we go back in history. On top of this, if we see a different
bug, it does not mean that the original bug is not also there.
This will surely solve some subset of cases better than the current
logic. But I feel that that subset is smaller than what the current
logic solves.

> syzbot has some knowledge about duplicates with different crash titles when people use "syz dup" command.

This is a very limited set of info. And in the end I think we've seen
all bug types being duped onto all other bug types pair-wise, and at
the same time we've seen all bug types not being dups of all other bug
types. So I don't see where this gets us.
And again, as we go back in history all these titles change.

> Also it might be worth to experiment with using neural networks to identify duplicates.
>
>
> target_crash = 'kernel panic: corrupted stack end in wb_workfn'
> test commit:
> bad = false;
> skip = true;
> foreach run:
> run_started, crashed, crash := run_repro();
>
> //kernel built, booted, reproducer launched successfully
> if (run_started)
> skip = false;
> if (crashed && is_duplicates(crash, target_crash))
> bad = true;
>
> if (skip)
> git bisect skip;
> else if (bad)
> git bisect bad;
> else
> git bisect good;
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: kernel panic: corrupted stack end in wb_workfn
2019-03-20 13:57 ` Dmitry Vyukov
(?)
@ 2019-03-21 9:45 ` Dmitry Vyukov
-1 siblings, 0 replies; 42+ messages in thread
From: Dmitry Vyukov @ 2019-03-21 9:45 UTC (permalink / raw)
To: Andrey Ryabinin
Cc: Tetsuo Handa, syzbot, Andrew Morton, Qian Cai, David Miller,
guro, Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM,
linux-sctp, Mel Gorman, Michal Hocko, netdev, Neil Horman,
Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich,
Matthew Wilcox, Xin Long
On Wed, Mar 20, 2019 at 2:57 PM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Wed, Mar 20, 2019 at 2:33 PM Andrey Ryabinin <aryabinin@virtuozzo.com> wrote:
> >
> >
> >
> > On 3/20/19 1:38 PM, Dmitry Vyukov wrote:
> > > On Wed, Mar 20, 2019 at 11:24 AM Tetsuo Handa
> > > <penguin-kernel@i-love.sakura.ne.jp> wrote:
> > >>
> > >> On 2019/03/20 18:59, Dmitry Vyukov wrote:
> > >>>> From bisection log:
> > >>>>
> > >>>> testing release v4.17
> > >>>> testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
> > >>>> run #0: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>>> run #1: crashed: kernel panic: corrupted stack end in worker_thread
> > >>>> run #2: crashed: kernel panic: Out of memory and no killable processes...
> > >>>> run #3: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>>> run #4: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>>> run #5: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>>> run #6: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>>> run #7: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>>> run #8: crashed: kernel panic: Out of memory and no killable processes...
> > >>>> run #9: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>>> testing release v4.16
> > >>>> testing commit 0adb32858b0bddf4ada5f364a84ed60b196dbcda with gcc (GCC) 8.1.0
> > >>>> run #0: OK
> > >>>> run #1: OK
> > >>>> run #2: OK
> > >>>> run #3: OK
> > >>>> run #4: OK
> > >>>> run #5: crashed: kernel panic: Out of memory and no killable processes...
> > >>>> run #6: OK
> > >>>> run #7: crashed: kernel panic: Out of memory and no killable processes...
> > >>>> run #8: OK
> > >>>> run #9: OK
> > >>>> testing release v4.15
> > >>>> testing commit d8a5b80568a9cb66810e75b182018e9edb68e8ff with gcc (GCC) 8.1.0
> > >>>> all runs: OK
> > >>>> # git bisect start v4.16 v4.15
> > >>>>
> > >>>> Why bisect started between 4.16 4.15 instead of 4.17 4.16?
> > >>>
> > >>> Because 4.16 was still crashing and 4.15 was not crashing. 4.15..4.16
> > >>> looks like the right range, no?
> > >>
> > >> No, syzbot should bisect between 4.16 and 4.17 regarding this bug, for
> > >> "Stack corruption" can't manifest as "Out of memory and no killable processes".
> > >>
> > >> "kernel panic: Out of memory and no killable processes..." is completely
> > >> unrelated to "kernel panic: corrupted stack end in wb_workfn".
> > >
> > >
> > > Do you think this predicate is possible to code?
> >
> > Something like bellow probably would work better than current behavior.
> >
> > For starters, is_duplicates() might just compare 'crash' title with 'target_crash' title and its duplicates titles.
>
> Lots of bugs (half?) manifest differently. On top of this, titles
> change as we go back in history. On top of this, if we see a different
> bug, it does not mean that the original bug is also not there.
> This will sure solve some subset of cases better then the current
> logic. But I feel that that subset is smaller then what the current
> logic solves.
Counter-examples come up in basically every other bisection.
For example:
bisecting cause commit starting from ccda4af0f4b92f7b4c308d3acc262f4a7e3affad
building syzkaller on 5f5f6d14e80b8bd6b42db961118e902387716bcb
testing commit ccda4af0f4b92f7b4c308d3acc262f4a7e3affad with gcc (GCC) 8.1.0
all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test_checked
testing release v4.19
testing commit 84df9525b0c27f3ebc2ebb1864fa62a97fdedb7d with gcc (GCC) 8.1.0
all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test_checked
testing release v4.18
testing commit 94710cac0ef4ee177a63b5227664b38c95bbf703 with gcc (GCC) 8.1.0
all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test
testing release v4.17
testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test
That's a different crash title, unless somebody explicitly codes this case.
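
To make the point concrete, here is a small sketch using the two titles from the log above. FUNCTION_ALIASES and normalize() are hypothetical and only show what "explicitly coding this case" could look like; nothing here is taken from syzbot.

```python
NEWER = "KASAN: null-ptr-deref Read in refcount_sub_and_test_checked"
OLDER = "KASAN: null-ptr-deref Read in refcount_sub_and_test"

# Hypothetical, hand-maintained table of renamed functions.
FUNCTION_ALIASES = {"refcount_sub_and_test_checked": "refcount_sub_and_test"}


def normalize(title: str) -> str:
    # Split "<what> in <function>" at the last " in " and map the function name.
    head, sep, func = title.rpartition(" in ")
    if not sep:
        return title  # titles without a function part are left untouched
    return f"{head} in {FUNCTION_ALIASES.get(func, func)}"


print(NEWER == OLDER)                        # plain title comparison
print(normalize(NEWER) == normalize(OLDER))  # comparison after aliasing
```

The plain comparison treats v4.19 and v4.18 as two different crashes; they only compare equal once somebody has added the rename to the table by hand.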
Or, what crash is this?
testing commit 52358cb5a310990ea5069f986bdab3620e01181f with gcc (GCC) 8.1.0
run #1: crashed: general protection fault in cpuacct_charge
run #2: crashed: WARNING: suspicious RCU usage in corrupted
run #3: crashed: general protection fault in cpuacct_charge
run #4: crashed: BUG: unable to handle kernel paging request in ipt_do_table
run #5: crashed: KASAN: stack-out-of-bounds Read in cpuacct_charge
run #6: crashed: WARNING: suspicious RCU usage
run #7: crashed: no output from test machine
run #8: crashed: no output from test machine
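
Tallying the titles of this one commit shows how little a title-based predicate has to hold on to. The snippet is only an illustration; the list simply repeats the eight runs from the log above.

```python
from collections import Counter

runs = [
    "general protection fault in cpuacct_charge",
    "WARNING: suspicious RCU usage in corrupted",
    "general protection fault in cpuacct_charge",
    "BUG: unable to handle kernel paging request in ipt_do_table",
    "KASAN: stack-out-of-bounds Read in cpuacct_charge",
    "WARNING: suspicious RCU usage",
    "no output from test machine",
    "no output from test machine",
]

counts = Counter(runs)
for title, n in counts.most_common():
    print(n, title)
print(len(counts), "distinct titles in", len(runs), "runs")
```

No title shows up more than twice, so picking one of them as "the" crash for this commit is already a judgment call.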
Or, a run that hits "INFO: trying to register non-static key in
can_notifier" does not do any testing, but is "WARNING in dma_buf_vunmap"
still there or not?
testing commit 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c with gcc (GCC) 8.1.0
all runs: crashed: WARNING in dma_buf_vunmap
testing release v4.11
testing commit a351e9b9fc24e982ec2f0e76379a49826036da12 with gcc (GCC) 7.3.0
all runs: OK
# git bisect start v4.12 v4.11
Bisecting: 7831 revisions left to test after this (roughly 13 steps)
[2bd80401743568ced7d303b008ae5298ce77e695] Merge tag 'gpio-v4.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
testing commit 2bd80401743568ced7d303b008ae5298ce77e695 with gcc (GCC) 7.3.0
all runs: crashed: INFO: trying to register non-static key in can_notifier
# git bisect bad 2bd80401743568ced7d303b008ae5298ce77e695
Bisecting: 3853 revisions left to test after this (roughly 12 steps)
[8d65b08debc7e62b2c6032d7fe7389d895b92cbc] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
testing commit 8d65b08debc7e62b2c6032d7fe7389d895b92cbc with gcc (GCC) 7.3.0
all runs: crashed: INFO: trying to register non-static key in can_notifier
# git bisect bad 8d65b08debc7e62b2c6032d7fe7389d895b92cbc
Bisecting: 2022 revisions left to test after this (roughly 11 steps)
[cec381919818a9a0cb85600b3c82404bdd38cf36] Merge tag 'mac80211-next-for-davem-2017-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
testing commit cec381919818a9a0cb85600b3c82404bdd38cf36 with gcc (GCC) 5.5.0
all runs: crashed: INFO: trying to register non-static key in can_notifier
> > syzbot has some knowledge about duplicates with different crash titles when people use "syz dup" command.
>
> This is very limited set of info. And in the end I think we've seen
> all bug types being duped on all other bugs types pair-wise, and at
> the same time we've seen all bug types being not dups to all other bug
> types. So I don't see where this gets us.
> And again as we go back in history all these titles change.
>
> > Also it might be worth to experiment with using neural networks to identify duplicates.
> >
> >
> > target_crash = 'kernel panic: corrupted stack end in wb_workfn'
> > test commit:
> > bad = false;
> > skip = true;
> > foreach run:
> > run_started, crashed, crash := run_repro();
> >
> > //kernel built, booted, reproducer launched successfully
> > if (run_started)
> > skip = false;
> > if (crashed && is_duplicates(crash, target_crash))
> > bad = true;
> >
> > if (skip)
> > git bisect skip;
> > else if (bad)
> > git bisect bad;
> > else
> > git bisect good;
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: kernel panic: corrupted stack end in wb_workfn
@ 2019-03-21 9:45 ` Dmitry Vyukov
0 siblings, 0 replies; 42+ messages in thread
From: Dmitry Vyukov @ 2019-03-21 9:45 UTC (permalink / raw)
To: Andrey Ryabinin
Cc: Tetsuo Handa, syzbot, Andrew Morton, Qian Cai, David Miller,
guro, Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM,
linux-sctp, Mel Gorman, Michal Hocko, netdev, Neil Horman,
Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich,
Matthew Wilcox, Xin Long
On Wed, Mar 20, 2019 at 2:57 PM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Wed, Mar 20, 2019 at 2:33 PM Andrey Ryabinin <aryabinin@virtuozzo.com> wrote:
> >
> >
> >
> > On 3/20/19 1:38 PM, Dmitry Vyukov wrote:
> > > On Wed, Mar 20, 2019 at 11:24 AM Tetsuo Handa
> > > <penguin-kernel@i-love.sakura.ne.jp> wrote:
> > >>
> > >> On 2019/03/20 18:59, Dmitry Vyukov wrote:
> > >>>> From bisection log:
> > >>>>
> > >>>> testing release v4.17
> > >>>> testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
> > >>>> run #0: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>>> run #1: crashed: kernel panic: corrupted stack end in worker_thread
> > >>>> run #2: crashed: kernel panic: Out of memory and no killable processes...
> > >>>> run #3: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>>> run #4: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>>> run #5: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>>> run #6: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>>> run #7: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>>> run #8: crashed: kernel panic: Out of memory and no killable processes...
> > >>>> run #9: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>>> testing release v4.16
> > >>>> testing commit 0adb32858b0bddf4ada5f364a84ed60b196dbcda with gcc (GCC) 8.1.0
> > >>>> run #0: OK
> > >>>> run #1: OK
> > >>>> run #2: OK
> > >>>> run #3: OK
> > >>>> run #4: OK
> > >>>> run #5: crashed: kernel panic: Out of memory and no killable processes...
> > >>>> run #6: OK
> > >>>> run #7: crashed: kernel panic: Out of memory and no killable processes...
> > >>>> run #8: OK
> > >>>> run #9: OK
> > >>>> testing release v4.15
> > >>>> testing commit d8a5b80568a9cb66810e75b182018e9edb68e8ff with gcc (GCC) 8.1.0
> > >>>> all runs: OK
> > >>>> # git bisect start v4.16 v4.15
> > >>>>
> > >>>> Why bisect started between 4.16 4.15 instead of 4.17 4.16?
> > >>>
> > >>> Because 4.16 was still crashing and 4.15 was not crashing. 4.15..4.16
> > >>> looks like the right range, no?
> > >>
> > >> No, syzbot should bisect between 4.16 and 4.17 regarding this bug, for
> > >> "Stack corruption" can't manifest as "Out of memory and no killable processes".
> > >>
> > >> "kernel panic: Out of memory and no killable processes..." is completely
> > >> unrelated to "kernel panic: corrupted stack end in wb_workfn".
> > >
> > >
> > > Do you think this predicate is possible to code?
> >
> > Something like bellow probably would work better than current behavior.
> >
> > For starters, is_duplicates() might just compare 'crash' title with 'target_crash' title and its duplicates titles.
>
> Lots of bugs (half?) manifest differently. On top of this, titles
> change as we go back in history. On top of this, if we see a different
> bug, it does not mean that the original bug is also not there.
> This will sure solve some subset of cases better then the current
> logic. But I feel that that subset is smaller then what the current
> logic solves.
Counter-examples come up in basically every other bisection.
For example:
bisecting cause commit starting from ccda4af0f4b92f7b4c308d3acc262f4a7e3affad
building syzkaller on 5f5f6d14e80b8bd6b42db961118e902387716bcb
testing commit ccda4af0f4b92f7b4c308d3acc262f4a7e3affad with gcc (GCC) 8.1.0
all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test_checked
testing release v4.19
testing commit 84df9525b0c27f3ebc2ebb1864fa62a97fdedb7d with gcc (GCC) 8.1.0
all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test_checked
testing release v4.18
testing commit 94710cac0ef4ee177a63b5227664b38c95bbf703 with gcc (GCC) 8.1.0
all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test
testing release v4.17
testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test
That's a different crash title, unless somebody explicitly code this case.
Or, what crash is this?
testing commit 52358cb5a310990ea5069f986bdab3620e01181f with gcc (GCC) 8.1.0
run #1: crashed: general protection fault in cpuacct_charge
run #2: crashed: WARNING: suspicious RCU usage in corrupted
run #3: crashed: general protection fault in cpuacct_charge
run #4: crashed: BUG: unable to handle kernel paging request in ipt_do_table
run #5: crashed: KASAN: stack-out-of-bounds Read in cpuacct_charge
run #6: crashed: WARNING: suspicious RCU usage
run #7: crashed: no output from test machine
run #8: crashed: no output from test machine
Or, that "INFO: trying to register non-static key in can_notifier"
does not do any testing, but is "WARNING in dma_buf_vunmap" still
there or not?
testing commit 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c with gcc (GCC) 8.1.0
all runs: crashed: WARNING in dma_buf_vunmap
testing release v4.11
testing commit a351e9b9fc24e982ec2f0e76379a49826036da12 with gcc (GCC) 7.3.0
all runs: OK
# git bisect start v4.12 v4.11
Bisecting: 7831 revisions left to test after this (roughly 13 steps)
[2bd80401743568ced7d303b008ae5298ce77e695] Merge tag 'gpio-v4.12-1' of
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
testing commit 2bd80401743568ced7d303b008ae5298ce77e695 with gcc (GCC) 7.3.0
all runs: crashed: INFO: trying to register non-static key in can_notifier
# git bisect bad 2bd80401743568ced7d303b008ae5298ce77e695
Bisecting: 3853 revisions left to test after this (roughly 12 steps)
[8d65b08debc7e62b2c6032d7fe7389d895b92cbc] Merge
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
testing commit 8d65b08debc7e62b2c6032d7fe7389d895b92cbc with gcc (GCC) 7.3.0
all runs: crashed: INFO: trying to register non-static key in can_notifier
# git bisect bad 8d65b08debc7e62b2c6032d7fe7389d895b92cbc
Bisecting: 2022 revisions left to test after this (roughly 11 steps)
[cec381919818a9a0cb85600b3c82404bdd38cf36] Merge tag
'mac80211-next-for-davem-2017-04-28' of
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
testing commit cec381919818a9a0cb85600b3c82404bdd38cf36 with gcc (GCC) 5.5.0
all runs: crashed: INFO: trying to register non-static key in can_notifier
> > syzbot has some knowledge about duplicates with different crash titles when people use "syz dup" command.
>
> This is very limited set of info. And in the end I think we've seen
> all bug types being duped on all other bugs types pair-wise, and at
> the same time we've seen all bug types being not dups to all other bug
> types. So I don't see where this gets us.
> And again as we go back in history all these titles change.
>
> > Also it might be worth to experiment with using neural networks to identify duplicates.
> >
> >
> > target_crash = 'kernel panic: corrupted stack end in wb_workfn'
> > test commit:
> >     bad = false;
> >     skip = true;
> >     foreach run:
> >         run_started, crashed, crash := run_repro();
> >
> >         //kernel built, booted, reproducer launched successfully
> >         if (run_started)
> >             skip = false;
> >         if (crashed && is_duplicates(crash, target_crash))
> >             bad = true;
> >
> >     if (skip)
> >         git bisect skip;
> >     else if (bad)
> >         git bisect bad;
> >     else
> >         git bisect good;
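A minimal Python sketch of the per-commit verdict proposed in the quoted
pseudocode, for readers who want to experiment with it. The Run record, the
verdict() and is_duplicate() names and the known_dups set are illustrative
assumptions, not syzbot code; the run outcomes are copied from the v4.17 and
v4.16 results in the bisection log quoted in this thread.

```python
from dataclasses import dataclass
from typing import AbstractSet, Iterable, Optional


@dataclass
class Run:
    """Outcome of one reproducer run (hypothetical structure)."""
    started: bool                      # kernel built, booted, reproducer launched
    crash_title: Optional[str] = None  # None means the run finished OK


def is_duplicate(title: str, target: str, known_dups: AbstractSet[str]) -> bool:
    # "For starters": plain comparison against the target title and
    # the titles already recorded as its duplicates.
    return title == target or title in known_dups


def verdict(runs: Iterable[Run], target: str,
            known_dups: AbstractSet[str] = frozenset()) -> str:
    """Return the git bisect verdict for one commit: skip, bad or good."""
    skip, bad = True, False
    for run in runs:
        if run.started:
            skip = False
        if run.crash_title is not None and is_duplicate(
                run.crash_title, target, known_dups):
            bad = True
    if skip:
        return "skip"
    return "bad" if bad else "good"


TARGET = "kernel panic: corrupted stack end in wb_workfn"
OOM = "kernel panic: Out of memory and no killable processes..."
WORKER = "kernel panic: corrupted stack end in worker_thread"

# Run outcomes copied from the quoted log (run #0 .. run #9).
v4_17 = [Run(True, t) for t in
         (TARGET, WORKER, OOM, TARGET, TARGET, TARGET, TARGET, TARGET, OOM, TARGET)]
v4_16 = [Run(True, t) for t in
         (None, None, None, None, None, OOM, None, OOM, None, None)]
# Hypothetical case: the kernel never booted, so nothing was tested.
not_booted = [Run(False) for _ in range(10)]

for name, runs in (("v4.17", v4_17), ("v4.16", v4_16), ("not booted", not_booted)):
    print(name, verdict(runs, TARGET))
```

Under these assumptions v4.17 comes out as bad and v4.16 as good, which
would move the range to v4.16..v4.17. Passing {OOM} as known_dups turns v4.16
back into bad, so the result depends entirely on how duplicates are decided.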
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: kernel panic: corrupted stack end in wb_workfn
2019-03-21 9:45 ` Dmitry Vyukov
(?)
@ 2019-03-21 9:51 ` Dmitry Vyukov
-1 siblings, 0 replies; 42+ messages in thread
From: Dmitry Vyukov @ 2019-03-21 9:51 UTC (permalink / raw)
To: Andrey Ryabinin
Cc: Tetsuo Handa, syzbot, Andrew Morton, Qian Cai, David Miller,
guro, Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM,
linux-sctp, Mel Gorman, Michal Hocko, netdev, Neil Horman,
Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich,
Matthew Wilcox, Xin Long
On Thu, Mar 21, 2019 at 10:45 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Wed, Mar 20, 2019 at 2:57 PM Dmitry Vyukov <dvyukov@google.com> wrote:
> >
> > On Wed, Mar 20, 2019 at 2:33 PM Andrey Ryabinin <aryabinin@virtuozzo.com> wrote:
> > >
> > >
> > >
> > > On 3/20/19 1:38 PM, Dmitry Vyukov wrote:
> > > > On Wed, Mar 20, 2019 at 11:24 AM Tetsuo Handa
> > > > <penguin-kernel@i-love.sakura.ne.jp> wrote:
> > > >>
> > > >> On 2019/03/20 18:59, Dmitry Vyukov wrote:
> > > >>>> From bisection log:
> > > >>>>
> > > >>>> testing release v4.17
> > > >>>> testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
> > > >>>> run #0: crashed: kernel panic: corrupted stack end in wb_workfn
> > > >>>> run #1: crashed: kernel panic: corrupted stack end in worker_thread
> > > >>>> run #2: crashed: kernel panic: Out of memory and no killable processes...
> > > >>>> run #3: crashed: kernel panic: corrupted stack end in wb_workfn
> > > >>>> run #4: crashed: kernel panic: corrupted stack end in wb_workfn
> > > >>>> run #5: crashed: kernel panic: corrupted stack end in wb_workfn
> > > >>>> run #6: crashed: kernel panic: corrupted stack end in wb_workfn
> > > >>>> run #7: crashed: kernel panic: corrupted stack end in wb_workfn
> > > >>>> run #8: crashed: kernel panic: Out of memory and no killable processes...
> > > >>>> run #9: crashed: kernel panic: corrupted stack end in wb_workfn
> > > >>>> testing release v4.16
> > > >>>> testing commit 0adb32858b0bddf4ada5f364a84ed60b196dbcda with gcc (GCC) 8.1.0
> > > >>>> run #0: OK
> > > >>>> run #1: OK
> > > >>>> run #2: OK
> > > >>>> run #3: OK
> > > >>>> run #4: OK
> > > >>>> run #5: crashed: kernel panic: Out of memory and no killable processes...
> > > >>>> run #6: OK
> > > >>>> run #7: crashed: kernel panic: Out of memory and no killable processes...
> > > >>>> run #8: OK
> > > >>>> run #9: OK
> > > >>>> testing release v4.15
> > > >>>> testing commit d8a5b80568a9cb66810e75b182018e9edb68e8ff with gcc (GCC) 8.1.0
> > > >>>> all runs: OK
> > > >>>> # git bisect start v4.16 v4.15
> > > >>>>
> > > >>>> Why bisect started between 4.16 4.15 instead of 4.17 4.16?
> > > >>>
> > > >>> Because 4.16 was still crashing and 4.15 was not crashing. 4.15..4.16
> > > >>> looks like the right range, no?
> > > >>
> > > >> No, syzbot should bisect between 4.16 and 4.17 regarding this bug, for
> > > >> "Stack corruption" can't manifest as "Out of memory and no killable processes".
> > > >>
> > > >> "kernel panic: Out of memory and no killable processes..." is completely
> > > >> unrelated to "kernel panic: corrupted stack end in wb_workfn".
> > > >
> > > >
> > > > Do you think this predicate is possible to code?
> > >
> > > Something like bellow probably would work better than current behavior.
> > >
> > > For starters, is_duplicates() might just compare 'crash' title with 'target_crash' title and its duplicates titles.
> >
> > Lots of bugs (half?) manifest differently. On top of this, titles
> > change as we go back in history. On top of this, if we see a different
> > bug, it does not mean that the original bug is also not there.
> > This will sure solve some subset of cases better then the current
> > logic. But I feel that that subset is smaller then what the current
> > logic solves.
>
> Counter-examples come up in basically every other bisection.
> For example:
>
> bisecting cause commit starting from ccda4af0f4b92f7b4c308d3acc262f4a7e3affad
> building syzkaller on 5f5f6d14e80b8bd6b42db961118e902387716bcb
> testing commit ccda4af0f4b92f7b4c308d3acc262f4a7e3affad with gcc (GCC) 8.1.0
> all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test_checked
> testing release v4.19
> testing commit 84df9525b0c27f3ebc2ebb1864fa62a97fdedb7d with gcc (GCC) 8.1.0
> all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test_checked
> testing release v4.18
> testing commit 94710cac0ef4ee177a63b5227664b38c95bbf703 with gcc (GCC) 8.1.0
> all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test
> testing release v4.17
> testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
> all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test
And to make things even more interesting, this later changes to "BUG:
unable to handle kernel NULL pointer dereference in vb2_vmalloc_put":
testing release v4.12
testing commit 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c with gcc (GCC) 8.1.0
all runs: crashed: general protection fault in refcount_sub_and_test
testing release v4.11
testing commit a351e9b9fc24e982ec2f0e76379a49826036da12 with gcc (GCC) 7.3.0
all runs: crashed: BUG: unable to handle kernel NULL pointer dereference in vb2_vmalloc_put
And since the original bug is in the vb2 subsystem
(https://syzkaller.appspot.com/bug?id=17535f4bf5b322437f7c639b59161ce343fc55a9),
it's actually not clear even to me whether we should treat it as the same
bug or not. It may be a different manifestation of the same root cause, or
a different bug nearby.
> That's a different crash title, unless somebody explicitly code this case.
>
> Or, what crash is this?
>
> testing commit 52358cb5a310990ea5069f986bdab3620e01181f with gcc (GCC) 8.1.0
> run #1: crashed: general protection fault in cpuacct_charge
> run #2: crashed: WARNING: suspicious RCU usage in corrupted
> run #3: crashed: general protection fault in cpuacct_charge
> run #4: crashed: BUG: unable to handle kernel paging request in ipt_do_table
> run #5: crashed: KASAN: stack-out-of-bounds Read in cpuacct_charge
> run #6: crashed: WARNING: suspicious RCU usage
> run #7: crashed: no output from test machine
> run #8: crashed: no output from test machine
>
>
> Or, that "INFO: trying to register non-static key in can_notifier"
> does not do any testing, but is "WARNING in dma_buf_vunmap" still
> there or not?
>
> testing commit 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c with gcc (GCC) 8.1.0
> all runs: crashed: WARNING in dma_buf_vunmap
> testing release v4.11
> testing commit a351e9b9fc24e982ec2f0e76379a49826036da12 with gcc (GCC) 7.3.0
> all runs: OK
> # git bisect start v4.12 v4.11
> Bisecting: 7831 revisions left to test after this (roughly 13 steps)
> [2bd80401743568ced7d303b008ae5298ce77e695] Merge tag 'gpio-v4.12-1' of
> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
> testing commit 2bd80401743568ced7d303b008ae5298ce77e695 with gcc (GCC) 7.3.0
> all runs: crashed: INFO: trying to register non-static key in can_notifier
> # git bisect bad 2bd80401743568ced7d303b008ae5298ce77e695
> Bisecting: 3853 revisions left to test after this (roughly 12 steps)
> [8d65b08debc7e62b2c6032d7fe7389d895b92cbc] Merge
> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
> testing commit 8d65b08debc7e62b2c6032d7fe7389d895b92cbc with gcc (GCC) 7.3.0
> all runs: crashed: INFO: trying to register non-static key in can_notifier
> # git bisect bad 8d65b08debc7e62b2c6032d7fe7389d895b92cbc
> Bisecting: 2022 revisions left to test after this (roughly 11 steps)
> [cec381919818a9a0cb85600b3c82404bdd38cf36] Merge tag
> 'mac80211-next-for-davem-2017-04-28' of
> git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
> testing commit cec381919818a9a0cb85600b3c82404bdd38cf36 with gcc (GCC) 5.5.0
> all runs: crashed: INFO: trying to register non-static key in can_notifier
>
> > > syzbot has some knowledge about duplicates with different crash titles when people use "syz dup" command.
> >
> > This is very limited set of info. And in the end I think we've seen
> > all bug types being duped on all other bugs types pair-wise, and at
> > the same time we've seen all bug types being not dups to all other bug
> > types. So I don't see where this gets us.
> > And again as we go back in history all these titles change.
> >
> > > Also it might be worth to experiment with using neural networks to identify duplicates.
> > >
> > >
> > > target_crash = 'kernel panic: corrupted stack end in wb_workfn'
> > > test commit:
> > >     bad = false;
> > >     skip = true;
> > >     foreach run:
> > >         run_started, crashed, crash := run_repro();
> > >
> > >         //kernel built, booted, reproducer launched successfully
> > >         if (run_started)
> > >             skip = false;
> > >         if (crashed && is_duplicates(crash, target_crash))
> > >             bad = true;
> > >
> > >     if (skip)
> > >         git bisect skip;
> > >     else if (bad)
> > >         git bisect bad;
> > >     else
> > >         git bisect good;
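The counter-example in this message can be replayed in a few lines of
Python. This is only a sketch: the any_crash() and exact_title() predicates
are hypothetical stand-ins (the first mirrors the "any crash means bad"
behaviour described earlier in the thread, the second the plain title
comparison suggested for is_duplicates()), and the target title is assumed
to be the one seen at the starting commit. The (revision, title) pairs are
copied from the log above.

```python
from typing import Callable, List, Optional, Tuple

# (revision, crash title), newest first, as reported in the log.
LOG: List[Tuple[str, str]] = [
    ("ccda4af0f4b9", "KASAN: null-ptr-deref Read in refcount_sub_and_test_checked"),
    ("v4.19", "KASAN: null-ptr-deref Read in refcount_sub_and_test_checked"),
    ("v4.18", "KASAN: null-ptr-deref Read in refcount_sub_and_test"),
    ("v4.17", "KASAN: null-ptr-deref Read in refcount_sub_and_test"),
    ("v4.12", "general protection fault in refcount_sub_and_test"),
    ("v4.11", "BUG: unable to handle kernel NULL pointer dereference in vb2_vmalloc_put"),
]

# Assumption: the bug being bisected carries the title of the starting commit.
TARGET = LOG[0][1]


def any_crash(title: Optional[str]) -> bool:
    """Treat every crash as a sign that the commit is bad."""
    return title is not None


def exact_title(title: Optional[str]) -> bool:
    """Treat a commit as bad only if the crash title equals the target."""
    return title == TARGET


def newest_good(is_bad: Callable[[Optional[str]], bool]) -> Optional[str]:
    """Walk back in history and return the first revision judged good."""
    for rev, title in LOG:
        if not is_bad(title):
            return rev
    return None


for pred in (any_crash, exact_title):
    print(pred.__name__, "->", newest_good(pred))
```

With exact title matching the first "good" release is v4.18, so bisection
would run between v4.18 and v4.19 and chase the title change rather than the
bug. With the any-crash rule no good release is found in this excerpt.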
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: kernel panic: corrupted stack end in wb_workfn
@ 2019-03-21 9:51 ` Dmitry Vyukov
0 siblings, 0 replies; 42+ messages in thread
From: Dmitry Vyukov @ 2019-03-21 9:51 UTC (permalink / raw)
To: Andrey Ryabinin
Cc: Tetsuo Handa, syzbot, Andrew Morton, Qian Cai, David Miller,
guro, Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM,
linux-sctp, Mel Gorman, Michal Hocko, netdev, Neil Horman,
Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich,
Matthew Wilcox, Xin Long
On Thu, Mar 21, 2019 at 10:45 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Wed, Mar 20, 2019 at 2:57 PM Dmitry Vyukov <dvyukov@google.com> wrote:
> >
> > On Wed, Mar 20, 2019 at 2:33 PM Andrey Ryabinin <aryabinin@virtuozzo.com> wrote:
> > >
> > >
> > >
> > > On 3/20/19 1:38 PM, Dmitry Vyukov wrote:
> > > > On Wed, Mar 20, 2019 at 11:24 AM Tetsuo Handa
> > > > <penguin-kernel@i-love.sakura.ne.jp> wrote:
> > > >>
> > > >> On 2019/03/20 18:59, Dmitry Vyukov wrote:
> > > >>>> From bisection log:
> > > >>>>
> > > >>>> testing release v4.17
> > > >>>> testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
> > > >>>> run #0: crashed: kernel panic: corrupted stack end in wb_workfn
> > > >>>> run #1: crashed: kernel panic: corrupted stack end in worker_thread
> > > >>>> run #2: crashed: kernel panic: Out of memory and no killable processes...
> > > >>>> run #3: crashed: kernel panic: corrupted stack end in wb_workfn
> > > >>>> run #4: crashed: kernel panic: corrupted stack end in wb_workfn
> > > >>>> run #5: crashed: kernel panic: corrupted stack end in wb_workfn
> > > >>>> run #6: crashed: kernel panic: corrupted stack end in wb_workfn
> > > >>>> run #7: crashed: kernel panic: corrupted stack end in wb_workfn
> > > >>>> run #8: crashed: kernel panic: Out of memory and no killable processes...
> > > >>>> run #9: crashed: kernel panic: corrupted stack end in wb_workfn
> > > >>>> testing release v4.16
> > > >>>> testing commit 0adb32858b0bddf4ada5f364a84ed60b196dbcda with gcc (GCC) 8.1.0
> > > >>>> run #0: OK
> > > >>>> run #1: OK
> > > >>>> run #2: OK
> > > >>>> run #3: OK
> > > >>>> run #4: OK
> > > >>>> run #5: crashed: kernel panic: Out of memory and no killable processes...
> > > >>>> run #6: OK
> > > >>>> run #7: crashed: kernel panic: Out of memory and no killable processes...
> > > >>>> run #8: OK
> > > >>>> run #9: OK
> > > >>>> testing release v4.15
> > > >>>> testing commit d8a5b80568a9cb66810e75b182018e9edb68e8ff with gcc (GCC) 8.1.0
> > > >>>> all runs: OK
> > > >>>> # git bisect start v4.16 v4.15
> > > >>>>
> > > >>>> Why bisect started between 4.16 4.15 instead of 4.17 4.16?
> > > >>>
> > > >>> Because 4.16 was still crashing and 4.15 was not crashing. 4.15..4.16
> > > >>> looks like the right range, no?
> > > >>
> > > >> No, syzbot should bisect between 4.16 and 4.17 regarding this bug, for
> > > >> "Stack corruption" can't manifest as "Out of memory and no killable processes".
> > > >>
> > > >> "kernel panic: Out of memory and no killable processes..." is completely
> > > >> unrelated to "kernel panic: corrupted stack end in wb_workfn".
> > > >
> > > >
> > > > Do you think this predicate is possible to code?
> > >
> > > Something like bellow probably would work better than current behavior.
> > >
> > > For starters, is_duplicates() might just compare 'crash' title with 'target_crash' title and its duplicates titles.
> >
> > Lots of bugs (half?) manifest differently. On top of this, titles
> > change as we go back in history. On top of this, if we see a different
> > bug, it does not mean that the original bug is also not there.
> > This will sure solve some subset of cases better then the current
> > logic. But I feel that that subset is smaller then what the current
> > logic solves.
>
> Counter-examples come up in basically every other bisection.
> For example:
>
> bisecting cause commit starting from ccda4af0f4b92f7b4c308d3acc262f4a7e3affad
> building syzkaller on 5f5f6d14e80b8bd6b42db961118e902387716bcb
> testing commit ccda4af0f4b92f7b4c308d3acc262f4a7e3affad with gcc (GCC) 8.1.0
> all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test_checked
> testing release v4.19
> testing commit 84df9525b0c27f3ebc2ebb1864fa62a97fdedb7d with gcc (GCC) 8.1.0
> all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test_checked
> testing release v4.18
> testing commit 94710cac0ef4ee177a63b5227664b38c95bbf703 with gcc (GCC) 8.1.0
> all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test
> testing release v4.17
> testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
> all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test
And to make things even more interesting, this later changes to "BUG:
unable to handle kernel NULL pointer dereference in vb2_vmalloc_put":
testing release v4.12
testing commit 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c with gcc (GCC) 8.1.0
all runs: crashed: general protection fault in refcount_sub_and_test
testing release v4.11
testing commit a351e9b9fc24e982ec2f0e76379a49826036da12 with gcc (GCC) 7.3.0
all runs: crashed: BUG: unable to handle kernel NULL pointer dereference in vb2_vmalloc_put
And since the original bug is in the vb2 subsystem
(https://syzkaller.appspot.com/bug?id=17535f4bf5b322437f7c639b59161ce343fc55a9),
it's actually not clear even to me whether we should treat it as the same
bug or not. It may be a different manifestation of the same root cause, or
a different bug nearby.
> That's a different crash title, unless somebody explicitly code this case.
>
> Or, what crash is this?
>
> testing commit 52358cb5a310990ea5069f986bdab3620e01181f with gcc (GCC) 8.1.0
> run #1: crashed: general protection fault in cpuacct_charge
> run #2: crashed: WARNING: suspicious RCU usage in corrupted
> run #3: crashed: general protection fault in cpuacct_charge
> run #4: crashed: BUG: unable to handle kernel paging request in ipt_do_table
> run #5: crashed: KASAN: stack-out-of-bounds Read in cpuacct_charge
> run #6: crashed: WARNING: suspicious RCU usage
> run #7: crashed: no output from test machine
> run #8: crashed: no output from test machine
>
>
> Or, that "INFO: trying to register non-static key in can_notifier"
> does not do any testing, but is "WARNING in dma_buf_vunmap" still
> there or not?
>
> testing commit 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c with gcc (GCC) 8.1.0
> all runs: crashed: WARNING in dma_buf_vunmap
> testing release v4.11
> testing commit a351e9b9fc24e982ec2f0e76379a49826036da12 with gcc (GCC) 7.3.0
> all runs: OK
> # git bisect start v4.12 v4.11
> Bisecting: 7831 revisions left to test after this (roughly 13 steps)
> [2bd80401743568ced7d303b008ae5298ce77e695] Merge tag 'gpio-v4.12-1' of
> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
> testing commit 2bd80401743568ced7d303b008ae5298ce77e695 with gcc (GCC) 7.3.0
> all runs: crashed: INFO: trying to register non-static key in can_notifier
> # git bisect bad 2bd80401743568ced7d303b008ae5298ce77e695
> Bisecting: 3853 revisions left to test after this (roughly 12 steps)
> [8d65b08debc7e62b2c6032d7fe7389d895b92cbc] Merge
> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
> testing commit 8d65b08debc7e62b2c6032d7fe7389d895b92cbc with gcc (GCC) 7.3.0
> all runs: crashed: INFO: trying to register non-static key in can_notifier
> # git bisect bad 8d65b08debc7e62b2c6032d7fe7389d895b92cbc
> Bisecting: 2022 revisions left to test after this (roughly 11 steps)
> [cec381919818a9a0cb85600b3c82404bdd38cf36] Merge tag
> 'mac80211-next-for-davem-2017-04-28' of
> git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
> testing commit cec381919818a9a0cb85600b3c82404bdd38cf36 with gcc (GCC) 5.5.0
> all runs: crashed: INFO: trying to register non-static key in can_notifier
>
> > > syzbot has some knowledge about duplicates with different crash titles when people use "syz dup" command.
> >
> > This is very limited set of info. And in the end I think we've seen
> > all bug types being duped on all other bugs types pair-wise, and at
> > the same time we've seen all bug types being not dups to all other bug
> > types. So I don't see where this gets us.
> > And again as we go back in history all these titles change.
> >
> > > Also it might be worth to experiment with using neural networks to identify duplicates.
> > >
> > >
> > > target_crash = 'kernel panic: corrupted stack end in wb_workfn'
> > > test commit:
> > >   bad = false;
> > >   skip = true;
> > >   foreach run:
> > >     run_started, crashed, crash := run_repro();
> > >
> > >     //kernel built, booted, reproducer launched successfully
> > >     if (run_started)
> > >       skip = false;
> > >     if (crashed && is_duplicates(crash, target_crash))
> > >       bad = true;
> > >
> > >   if (skip)
> > >     git bisect skip;
> > >   else if (bad)
> > >     git bisect bad;
> > >   else
> > >     git bisect good;
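
For reference, the quoted per-commit decision can be written as a small runnable sketch. This is only an illustration of the pseudocode above, not syzbot's actual implementation: the `RunResult` type, the `bisect_verdict` name and the `is_duplicate` callback are hypothetical, and how duplicates should really be detected is exactly the open question in this thread.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class RunResult:
    """Outcome of one reproducer run on one commit (hypothetical type)."""
    started: bool               # kernel built, booted, reproducer launched
    crash_title: Optional[str]  # None if the run did not crash


def bisect_verdict(
    runs: Iterable[RunResult],
    target_crash: str,
    is_duplicate: Callable[[str, str], bool],
) -> str:
    """Return "skip", "bad" or "good" for one commit, as in the pseudocode."""
    skip = True
    bad = False
    for run in runs:
        if run.started:
            skip = False
        if run.crash_title is not None and is_duplicate(run.crash_title, target_crash):
            bad = True
    # "skip" is checked first, exactly as in the quoted pseudocode.
    if skip:
        return "skip"
    return "bad" if bad else "good"
```

With a strict `is_duplicate` (for example exact title equality), a commit where every run crashes with an unrelated title is reported as "good", which is the behaviour the counter-examples above argue about.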
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: kernel panic: corrupted stack end in wb_workfn
2019-03-21 9:51 ` Dmitry Vyukov
@ 2019-03-21 11:41 ` Tetsuo Handa
-1 siblings, 0 replies; 42+ messages in thread
From: Tetsuo Handa @ 2019-03-21 11:41 UTC (permalink / raw)
To: Dmitry Vyukov, Andrey Ryabinin
Cc: syzbot, Andrew Morton, Qian Cai, David Miller, guro,
Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM,
linux-sctp, Mel Gorman, Michal Hocko, netdev, Neil Horman,
Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich,
Matthew Wilcox, Xin Long
On 2019/03/21 18:51, Dmitry Vyukov wrote:
>>> Lots of bugs (half?) manifest differently. On top of this, titles
>>> change as we go back in history. On top of this, if we see a different
>>> bug, it does not mean that the original bug is also not there.
>>> This will sure solve some subset of cases better than the current
>>> logic. But I feel that that subset is smaller than what the current
>>> logic solves.
>>
>> Counter-examples come up in basically every other bisection.
>> For example:
>>
>> bisecting cause commit starting from ccda4af0f4b92f7b4c308d3acc262f4a7e3affad
>> building syzkaller on 5f5f6d14e80b8bd6b42db961118e902387716bcb
>> testing commit ccda4af0f4b92f7b4c308d3acc262f4a7e3affad with gcc (GCC) 8.1.0
>> all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test_checked
>> testing release v4.19
>> testing commit 84df9525b0c27f3ebc2ebb1864fa62a97fdedb7d with gcc (GCC) 8.1.0
>> all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test_checked
>> testing release v4.18
>> testing commit 94710cac0ef4ee177a63b5227664b38c95bbf703 with gcc (GCC) 8.1.0
>> all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test
>> testing release v4.17
>> testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
>> all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test
>
>
> And to make things even more interesting, this later changes to "BUG:
> unable to handle kernel NULL pointer dereference in vb2_vmalloc_put":
>
> testing release v4.12
> testing commit 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c with gcc (GCC) 8.1.0
> all runs: crashed: general protection fault in refcount_sub_and_test
> testing release v4.11
> testing commit a351e9b9fc24e982ec2f0e76379a49826036da12 with gcc (GCC) 7.3.0
> all runs: crashed: BUG: unable to handle kernel NULL pointer
> dereference in vb2_vmalloc_put
>
> And since the original bug is in vb2 subsystem
> (https://syzkaller.appspot.com/bug?id=17535f4bf5b322437f7c639b59161ce343fc55a9),
> it's actually not clear even for me, if we should treat it as the same
> bug or not. May be different manifestation of the same root cause, or
> a different bug around.
>
Well, maybe we should use reproducers to check whether each not-yet-fixed
problem is reproducible on old kernels, rather than to find the specific commit
that causes a specific problem?

I think there are two patterns in which syzbot starts reporting a problem.

(a) A commit which causes one or more problems is merged into a codebase that
    syzbot was already testing, because syzbot already knew what to test in
    that codebase and how.

(b) A commit which causes one or more problems was already present in a
    codebase for which syzbot did not know, until now, what to test or how.

(a) tends to require testing new kernels (i.e. the bisection range is narrow), whereas
(b) tends to require testing old kernels (i.e. the bisection range is wide).

Regarding case (b), it is difficult for developers to guess when the problem
started, and I think that (b) tends to confuse automatic bisection attempts.

Therefore, instead of trying to find the specific commit for a specific problem
using the "git bisect" approach, try running all reproducers (gathered from all
problems) on each release (e.g. each git tag) and append the reproduced crashes
to the

  Manager Time Kernel Commit Syzkaller Config Log Report Syz repro C repro Maintainers

table for each not-yet-fixed problem in the dashboard interface. That is, if
running repro1 from problem1 on some old kernel reproduces a crash for problem2,
append the crash to problem2's table. Maybe we want to use a new table with only

  Kernel Commit Syzkaller Config Log Report Syz repro C repro

entries, because what we want to know is the oldest kernel release on which each
problem reproduces, which helps in guessing when the problem started.
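
A minimal sketch of that idea, under stated assumptions: `run_repro` and `classify` are hypothetical callbacks (the first runs one reproducer on one release and returns a crash title or None, the second maps a crash title to the problem it belongs to), and releases are passed oldest first. None of these names exist in syzbot; they are only here to make the bookkeeping concrete.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# One table row: (release, problem the reproducer came from, crash title)
Row = Tuple[str, str, str]


def collect_crash_tables(
    releases: Sequence[str],                         # ordered oldest first
    repros: Dict[str, str],                          # problem -> reproducer
    run_repro: Callable[[str, str], Optional[str]],  # hypothetical runner
    classify: Callable[[str], Optional[str]],        # crash title -> problem
) -> Dict[str, List[Row]]:
    """Run every reproducer on every release and file each crash under
    the problem it matches, which may differ from the reproducer's origin."""
    tables: Dict[str, List[Row]] = defaultdict(list)
    for release in releases:
        for origin, repro in repros.items():
            title = run_repro(release, repro)
            if title is None:
                continue  # no crash on this release
            problem = classify(title)
            if problem is None:
                continue  # crash does not match any known problem
            tables[problem].append((release, origin, title))
    return dict(tables)


def oldest_release(tables: Dict[str, List[Row]]) -> Dict[str, str]:
    """Oldest release per problem; rows are appended oldest first."""
    return {problem: rows[0][0] for problem, rows in tables.items()}
```

The sketch still depends on `classify` being able to tell which problem a crash belongs to, so it moves the title-matching question rather than removing it.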
^ permalink raw reply [flat|nested] 42+ messages in thread