* general protection fault in __schedule (2)
@ 2018-08-10 13:39 syzbot
  2019-11-22  7:19 ` syzbot
  0 siblings, 1 reply; 7+ messages in thread
From: syzbot @ 2018-08-10 13:39 UTC (permalink / raw)
  To: gregkh, hpa, kstewart, linux-kernel, mingo, pasha.tatashin,
	pombredanne, syzkaller-bugs, tglx, x86

Hello,

syzbot found the following crash on:

HEAD commit:    8c8399e0a3fb Add linux-next specific files for 20180806
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=101fe2ac400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=1b6bc1781e49e93e
dashboard link: https://syzkaller.appspot.com/bug?extid=7e2ab84953e4084a638d
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=1628c2ac400000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=12847864400000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+7e2ab84953e4084a638d@syzkaller.appspotmail.com

random: sshd: uninitialized urandom read (32 bytes read)
urandom_read: 1 callbacks suppressed
random: sshd: uninitialized urandom read (32 bytes read)
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
CPU: 1 PID: 4345 Comm: syz-executor659 Not tainted 4.18.0-rc8-next-20180806+ #32
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__fire_sched_out_preempt_notifiers kernel/sched/core.c:2497 [inline]
RIP: 0010:fire_sched_out_preempt_notifiers kernel/sched/core.c:2505 [inline]
RIP: 0010:prepare_task_switch kernel/sched/core.c:2611 [inline]
RIP: 0010:context_switch kernel/sched/core.c:2788 [inline]
RIP: 0010:__schedule+0x1061/0x1ec0 kernel/sched/core.c:3471
Code: 4c 89 e8 48 c1 e8 03 42 80 3c 30 00 0f 85 ea 08 00 00 4d 8b 6d 00 4d 85 ed 0f 84 6b f6 ff ff 49 8d 7d 10 48 89 f8 48 c1 e8 03 <42> 80 3c 30 00 74 a6 e8 93 26 04 fb eb 9f 4c 89 e6 48 89 df e8 46
RSP: 0018:ffff8801ac05ea80 EFLAGS: 00010806
RAX: 1bd5a00000000022 RBX: ffff8801af7f83c0 RCX: 1ffff1003731888f
RDX: 0000000040000000 RSI: 0000000000000000 RDI: dead000000000110
RBP: ffff8801ac05ec50 R08: ffff8801af7f83c0 R09: fffff520003c4a47
R10: fffff520003c4a47 R11: ffffc90001e2523b R12: ffff8801d9ee8280
R13: dead000000000100 R14: dffffc0000000000 R15: ffff8801db12ca40
FS:  00000000023af880(0000) GS:ffff8801db100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f6c7cdd3000 CR3: 0000000007e6a000 CR4: 00000000001426e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  preempt_schedule_common+0x22/0x60 kernel/sched/core.c:3595
  _cond_resched+0x1d/0x30 kernel/sched/core.c:4961
  __mutex_lock_common kernel/locking/mutex.c:908 [inline]
  __mutex_lock+0x13d/0x1700 kernel/locking/mutex.c:1073
  mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1088
  arch_jump_label_transform+0x1b/0x40 arch/x86/kernel/jump_label.c:112
  __jump_label_update+0x16e/0x1a0 kernel/jump_label.c:375
  jump_label_update+0x151/0x2e0 kernel/jump_label.c:760
  __static_key_slow_dec_cpuslocked+0xb8/0x210 kernel/jump_label.c:205
  __static_key_slow_dec kernel/jump_label.c:215 [inline]
  static_key_slow_dec+0x63/0xa0 kernel/jump_label.c:229
  kvm_arch_vcpu_uninit+0x18e/0x1d0 arch/x86/kvm/x86.c:8768
  kvm_vcpu_uninit+0x44/0x90 arch/x86/kvm/../../../virt/kvm/kvm_main.c:337
  vmx_free_vcpu+0x23a/0x300 arch/x86/kvm/vmx.c:10649
  kvm_arch_vcpu_free arch/x86/kvm/x86.c:8384 [inline]
  kvm_free_vcpus arch/x86/kvm/x86.c:8833 [inline]
  kvm_arch_destroy_vm+0x365/0x7c0 arch/x86/kvm/x86.c:8930
  kvm_destroy_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:752 [inline]
  kvm_put_kvm+0x73f/0x1060 arch/x86/kvm/../../../virt/kvm/kvm_main.c:773
  kvm_vm_release+0x42/0x50 arch/x86/kvm/../../../virt/kvm/kvm_main.c:784
  __fput+0x376/0x8a0 fs/file_table.c:279
  ____fput+0x15/0x20 fs/file_table.c:312
  task_work_run+0x1e8/0x2a0 kernel/task_work.c:113
  exit_task_work include/linux/task_work.h:22 [inline]
  do_exit+0x1b25/0x2760 kernel/exit.c:869
  do_group_exit+0x177/0x440 kernel/exit.c:972
  __do_sys_exit_group kernel/exit.c:983 [inline]
  __se_sys_exit_group kernel/exit.c:981 [inline]
  __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:981
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x43ed58
Code: Bad RIP value.
RSP: 002b:00007ffcc3a21518 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000043ed58
RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
RBP: 00000000004be608 R08: 00000000000000e7 R09: ffffffffffffffd0
R10: 00000000004002c8 R11: 0000000000000246 R12: 0000000000000001
R13: 00000000006d0180 R14: 0000000000000000 R15: 0000000000000000
Modules linked in:
Dumping ftrace buffer:
    (ftrace buffer empty)
---[ end trace 23164517240acd85 ]---
RIP: 0010:__fire_sched_out_preempt_notifiers kernel/sched/core.c:2497 [inline]
RIP: 0010:fire_sched_out_preempt_notifiers kernel/sched/core.c:2505 [inline]
RIP: 0010:prepare_task_switch kernel/sched/core.c:2611 [inline]
RIP: 0010:context_switch kernel/sched/core.c:2788 [inline]
RIP: 0010:__schedule+0x1061/0x1ec0 kernel/sched/core.c:3471
Code: 4c 89 e8 48 c1 e8 03 42 80 3c 30 00 0f 85 ea 08 00 00 4d 8b 6d 00 4d 85 ed 0f 84 6b f6 ff ff 49 8d 7d 10 48 89 f8 48 c1 e8 03 <42> 80 3c 30 00 74 a6 e8 93 26 04 fb eb 9f 4c 89 e6 48 89 df e8 46
RSP: 0018:ffff8801ac05ea80 EFLAGS: 00010806
RAX: 1bd5a00000000022 RBX: ffff8801af7f83c0 RCX: 1ffff1003731888f
RDX: 0000000040000000 RSI: 0000000000000000 RDI: dead000000000110
RBP: ffff8801ac05ec50 R08: ffff8801af7f83c0 R09: fffff520003c4a47
R10: fffff520003c4a47 R11: ffffc90001e2523b R12: ffff8801d9ee8280
R13: dead000000000100 R14: dffffc0000000000 R15: ffff8801db12ca40
FS:  00000000023af880(0000) GS:ffff8801db100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000043ed2e CR3: 0000000007e6a000 CR4: 00000000001426e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
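
The register dump above can be decoded by hand. The following is a minimal
sketch, not part of the original report, and it rests on assumptions about
x86-64 kernels of this era: that KASAN uses a shadow offset of
0xdffffc0000000000 (the value seen in R14) with one shadow byte per 8 bytes
of memory, and that the list poison stored in an unlinked node's next pointer
is 0xdead000000000100 (the value seen in R13). Under those assumptions, the
faulting instruction is KASAN's inline shadow check for the address R13+0x10,
and the shadow address it computes is non-canonical, which would explain the
general protection fault. One plausible reading is that the preempt notifier
list walk reached a node that had already been unlinked.

```python
# Assumed constants for x86-64 (not stated in the report itself).
KASAN_SHADOW_OFFSET = 0xDFFFFC0000000000   # matches R14 in the dump
KASAN_SHADOW_SCALE_SHIFT = 3               # one shadow byte covers 8 bytes
LIST_POISON1 = 0xDEAD000000000000 + 0x100  # poison left in an unlinked node

MASK64 = (1 << 64) - 1


def kasan_shadow_addr(addr):
    """Address of the shadow byte KASAN would read before touching addr."""
    return ((addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET) & MASK64


def is_canonical(addr):
    """True if bits 63..47 are all equal (48-bit x86-64 virtual addresses)."""
    top = addr >> 47
    return top == 0 or top == 0x1FFFF


r13 = 0xDEAD000000000100  # R13 from the dump, equal to LIST_POISON1
rdi = r13 + 0x10          # "49 8d 7d 10" is lea 0x10(%r13),%rdi
shadow = kasan_shadow_addr(rdi)

print(hex(rdi >> KASAN_SHADOW_SCALE_SHIFT), hex(shadow), is_canonical(shadow))
```

The shifted value reproduces RAX from the dump, so the computation at least
agrees with the registers that were captured.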


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches


* Re: general protection fault in __schedule (2)
  2018-08-10 13:39 general protection fault in __schedule (2) syzbot
@ 2019-11-22  7:19 ` syzbot
  2019-11-22 20:54   ` Sean Christopherson
  0 siblings, 1 reply; 7+ messages in thread
From: syzbot @ 2019-11-22  7:19 UTC (permalink / raw)
  To: casey, frederic, gregkh, hpa, jmattson, jmorris, karahmed,
	kstewart, kvm, linux-kernel, linux-security-module, mingo, mingo,
	pasha.tatashin, pbonzini, pombredanne, rkrcmar, serge,
	syzkaller-bugs, tglx, x86

syzbot has bisected this bug to:

commit 8fcc4b5923af5de58b80b53a069453b135693304
Author: Jim Mattson <jmattson@google.com>
Date:   Tue Jul 10 09:27:20 2018 +0000

     kvm: nVMX: Introduce KVM_CAP_NESTED_STATE

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=124cdbace00000
start commit:   234b69e3 ocfs2: fix ocfs2 read block panic
git tree:       upstream
final crash:    https://syzkaller.appspot.com/x/report.txt?x=114cdbace00000
console output: https://syzkaller.appspot.com/x/log.txt?x=164cdbace00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=5fa12be50bca08d8
dashboard link: https://syzkaller.appspot.com/bug?extid=7e2ab84953e4084a638d
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=150f0a4e400000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17f67111400000

Reported-by: syzbot+7e2ab84953e4084a638d@syzkaller.appspotmail.com
Fixes: 8fcc4b5923af ("kvm: nVMX: Introduce KVM_CAP_NESTED_STATE")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection


* Re: general protection fault in __schedule (2)
  2019-11-22  7:19 ` syzbot
@ 2019-11-22 20:54   ` Sean Christopherson
  2019-11-23  5:15     ` Dmitry Vyukov
  0 siblings, 1 reply; 7+ messages in thread
From: Sean Christopherson @ 2019-11-22 20:54 UTC (permalink / raw)
  To: syzbot
  Cc: casey, frederic, gregkh, hpa, jmattson, jmorris, karahmed,
	kstewart, kvm, linux-kernel, linux-security-module, mingo, mingo,
	pasha.tatashin, pbonzini, pombredanne, rkrcmar, serge,
	syzkaller-bugs, tglx, x86

On Thu, Nov 21, 2019 at 11:19:00PM -0800, syzbot wrote:
> syzbot has bisected this bug to:
> 
> commit 8fcc4b5923af5de58b80b53a069453b135693304
> Author: Jim Mattson <jmattson@google.com>
> Date:   Tue Jul 10 09:27:20 2018 +0000
> 
>     kvm: nVMX: Introduce KVM_CAP_NESTED_STATE
> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=124cdbace00000
> start commit:   234b69e3 ocfs2: fix ocfs2 read block panic
> git tree:       upstream
> final crash:    https://syzkaller.appspot.com/x/report.txt?x=114cdbace00000
> console output: https://syzkaller.appspot.com/x/log.txt?x=164cdbace00000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=5fa12be50bca08d8
> dashboard link: https://syzkaller.appspot.com/bug?extid=7e2ab84953e4084a638d
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=150f0a4e400000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17f67111400000
> 
> Reported-by: syzbot+7e2ab84953e4084a638d@syzkaller.appspotmail.com
> Fixes: 8fcc4b5923af ("kvm: nVMX: Introduce KVM_CAP_NESTED_STATE")
> 
> For information about bisection process see: https://goo.gl/tpsmEJ#bisection

Is there a way to have syzbot stop processing/bisecting these things
after a reasonable amount of time?  The original crash is from August of
last year...

Note, the original crash is actually due to KVM's put_kvm() fd race, but
whatever we want to blame, it's a duplicate.

#syz dup: general protection fault in kvm_lapic_hv_timer_in_use


* Re: general protection fault in __schedule (2)
  2019-11-22 20:54   ` Sean Christopherson
@ 2019-11-23  5:15     ` Dmitry Vyukov
  2019-11-25 17:54       ` Sean Christopherson
  0 siblings, 1 reply; 7+ messages in thread
From: Dmitry Vyukov @ 2019-11-23  5:15 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: syzbot, Casey Schaufler, Frederic Weisbecker, Greg Kroah-Hartman,
	H. Peter Anvin, Jim Mattson, James Morris, Raslan, KarimAllah,
	Kate Stewart, KVM list, LKML, linux-security-module, Ingo Molnar,
	Ingo Molnar, Pavel Tatashin, Paolo Bonzini, Philippe Ombredanne,
	Radim Krčmář,
	Serge E. Hallyn, syzkaller-bugs, Thomas Gleixner,
	the arch/x86 maintainers

On Fri, Nov 22, 2019 at 9:54 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Thu, Nov 21, 2019 at 11:19:00PM -0800, syzbot wrote:
> > syzbot has bisected this bug to:
> >
> > commit 8fcc4b5923af5de58b80b53a069453b135693304
> > Author: Jim Mattson <jmattson@google.com>
> > Date:   Tue Jul 10 09:27:20 2018 +0000
> >
> >     kvm: nVMX: Introduce KVM_CAP_NESTED_STATE
> >
> > bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=124cdbace00000
> > start commit:   234b69e3 ocfs2: fix ocfs2 read block panic
> > git tree:       upstream
> > final crash:    https://syzkaller.appspot.com/x/report.txt?x=114cdbace00000
> > console output: https://syzkaller.appspot.com/x/log.txt?x=164cdbace00000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=5fa12be50bca08d8
> > dashboard link: https://syzkaller.appspot.com/bug?extid=7e2ab84953e4084a638d
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=150f0a4e400000
> > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17f67111400000
> >
> > Reported-by: syzbot+7e2ab84953e4084a638d@syzkaller.appspotmail.com
> > Fixes: 8fcc4b5923af ("kvm: nVMX: Introduce KVM_CAP_NESTED_STATE")
> >
> > For information about bisection process see: https://goo.gl/tpsmEJ#bisection
>
> Is there a way to have syzbot stop processing/bisecting these things
> after a reasonable amount of time?  The original crash is from August of
> last year...
>
> Note, the original crash is actually due to KVM's put_kvm() fd race, but
> whatever we want to blame, it's a duplicate.
>
> #syz dup: general protection fault in kvm_lapic_hv_timer_in_use

Hi Sean,

syzbot only sends bisection results to open bugs with no known fixes.
So what you did (marking the bug as invalid/dup, or attaching a fix)
would stop it from doing/sending bisection.

"Original crash happened a long time ago" is not necessarily a good
signal. On the syzbot dashboard
(https://syzkaller.appspot.com/upstream), you can see bugs with the
original crash 2+ years ago, but they are still pretty much relevant.
The default kernel development process strategy for invalidating bug
reports by burying them in oblivion has advantages, but also
downsides. FWIW syzbot prefers explicit status tracking.

Besides implications on the mainline development, consider the
following. We regularly discover the same bugs (missed backports) on
LTS kernels:
https://syzkaller.appspot.com/linux-4.14
https://syzkaller.appspot.com/linux-4.19
The dashboard also shows similar crash signatures in other tested
kernels. So say you see a crash in your product kernel, and you notice
that a similar crash happened on mainline some time ago, but
presumably it was fixed, but then you look at the bug report thread
and there is no info whatsoever as to what happened.
Now this bug report:
https://syzkaller.appspot.com/bug?extid=7e2ab84953e4084a638d
is linked to "general protection fault in kvm_lapic_hv_timer_in_use":
https://syzkaller.appspot.com/bug?id=0c330c4e475223a40d95f1d94c761357dd0f011f
which has a recorded fix "KVM: nVMX: Fix bad cleanup on error of
get/set nested state IOCTLs":
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=26b471c7e2f7befd0f59c35b257749ca57e0ed70


* Re: general protection fault in __schedule (2)
  2019-11-23  5:15     ` Dmitry Vyukov
@ 2019-11-25 17:54       ` Sean Christopherson
  2019-11-28  9:53         ` Dmitry Vyukov
  0 siblings, 1 reply; 7+ messages in thread
From: Sean Christopherson @ 2019-11-25 17:54 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: syzbot, Casey Schaufler, Frederic Weisbecker, Greg Kroah-Hartman,
	H. Peter Anvin, Jim Mattson, James Morris, Raslan, KarimAllah,
	Kate Stewart, KVM list, LKML, linux-security-module, Ingo Molnar,
	Ingo Molnar, Pavel Tatashin, Paolo Bonzini, Philippe Ombredanne,
	Radim Krčmář,
	Serge E. Hallyn, syzkaller-bugs, Thomas Gleixner,
	the arch/x86 maintainers

On Sat, Nov 23, 2019 at 06:15:15AM +0100, Dmitry Vyukov wrote:
> On Fri, Nov 22, 2019 at 9:54 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > On Thu, Nov 21, 2019 at 11:19:00PM -0800, syzbot wrote:
> > > syzbot has bisected this bug to:
> > >
> > > commit 8fcc4b5923af5de58b80b53a069453b135693304
> > > Author: Jim Mattson <jmattson@google.com>
> > > Date:   Tue Jul 10 09:27:20 2018 +0000
> > >
> > >     kvm: nVMX: Introduce KVM_CAP_NESTED_STATE
> > >
> > > bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=124cdbace00000
> > > start commit:   234b69e3 ocfs2: fix ocfs2 read block panic
> > > git tree:       upstream
> > > final crash:    https://syzkaller.appspot.com/x/report.txt?x=114cdbace00000
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=164cdbace00000
> > > kernel config:  https://syzkaller.appspot.com/x/.config?x=5fa12be50bca08d8
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=7e2ab84953e4084a638d
> > > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=150f0a4e400000
> > > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17f67111400000
> > >
> > > Reported-by: syzbot+7e2ab84953e4084a638d@syzkaller.appspotmail.com
> > > Fixes: 8fcc4b5923af ("kvm: nVMX: Introduce KVM_CAP_NESTED_STATE")
> > >
> > > For information about bisection process see: https://goo.gl/tpsmEJ#bisection
> >
> > Is there a way to have syzbot stop processing/bisecting these things
> > after a reasonable amount of time?  The original crash is from August of
> > last year...
> >
> > Note, the original crash is actually due to KVM's put_kvm() fd race, but
> > whatever we want to blame, it's a duplicate.
> >
> > #syz dup: general protection fault in kvm_lapic_hv_timer_in_use
> 
> Hi Sean,
> 
> syzbot only sends bisection results to open bugs with no known fixes.
> So what you did (marking the bug as invalid/dup, or attaching a fix)
> would stop it from doing/sending bisection.
> 
> "Original crash happened a long time ago" is not necessarily a good
> signal. On the syzbot dashboard
> (https://syzkaller.appspot.com/upstream), you can see bugs with the
> original crash 2+ years ago, but they are still pretty much relevant.
> The default kernel development process strategy for invalidating bug
> reports by burying them in oblivion has advantages, but also
> downsides. FWIW syzbot prefers explicit status tracking.

I have no objection to explicit status tracking or getting pinged on old
open bugs.  I suppose I don't even mind the belated bisection, I'd probably
whine if syzbot didn't do the bisection :-).

What's annoying is the report doesn't provide any information about when it
originally occurred or on what kernel it originally failed.  It didn't occur
to me that the original bug might be a year old and I only realized it was
from an old kernel when I saw "4.19.0-rc4+" in the dashboard's sample crash
log.  Knowing that the original crash was a year old would have saved me
5-10 minutes of getting myself oriented.

Could syzbot provide the date and reported kernel version (assuming the
kernel version won't be misleading) of the original failure in its reports?


* Re: general protection fault in __schedule (2)
  2019-11-25 17:54       ` Sean Christopherson
@ 2019-11-28  9:53         ` Dmitry Vyukov
  2019-12-02 16:56           ` Sean Christopherson
  0 siblings, 1 reply; 7+ messages in thread
From: Dmitry Vyukov @ 2019-11-28  9:53 UTC (permalink / raw)
  To: Sean Christopherson, syzkaller
  Cc: syzbot, Casey Schaufler, Frederic Weisbecker, Greg Kroah-Hartman,
	H. Peter Anvin, Jim Mattson, James Morris, Raslan, KarimAllah,
	Kate Stewart, KVM list, LKML, linux-security-module, Ingo Molnar,
	Ingo Molnar, Pavel Tatashin, Paolo Bonzini, Philippe Ombredanne,
	Radim Krčmář,
	Serge E. Hallyn, syzkaller-bugs, Thomas Gleixner,
	the arch/x86 maintainers

On Mon, Nov 25, 2019 at 6:54 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Sat, Nov 23, 2019 at 06:15:15AM +0100, Dmitry Vyukov wrote:
> > On Fri, Nov 22, 2019 at 9:54 PM Sean Christopherson
> > <sean.j.christopherson@intel.com> wrote:
> > >
> > > On Thu, Nov 21, 2019 at 11:19:00PM -0800, syzbot wrote:
> > > > syzbot has bisected this bug to:
> > > >
> > > > commit 8fcc4b5923af5de58b80b53a069453b135693304
> > > > Author: Jim Mattson <jmattson@google.com>
> > > > Date:   Tue Jul 10 09:27:20 2018 +0000
> > > >
> > > >     kvm: nVMX: Introduce KVM_CAP_NESTED_STATE
> > > >
> > > > bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=124cdbace00000
> > > > start commit:   234b69e3 ocfs2: fix ocfs2 read block panic
> > > > git tree:       upstream
> > > > final crash:    https://syzkaller.appspot.com/x/report.txt?x=114cdbace00000
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=164cdbace00000
> > > > kernel config:  https://syzkaller.appspot.com/x/.config?x=5fa12be50bca08d8
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=7e2ab84953e4084a638d
> > > > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=150f0a4e400000
> > > > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17f67111400000
> > > >
> > > > Reported-by: syzbot+7e2ab84953e4084a638d@syzkaller.appspotmail.com
> > > > Fixes: 8fcc4b5923af ("kvm: nVMX: Introduce KVM_CAP_NESTED_STATE")
> > > >
> > > > For information about bisection process see: https://goo.gl/tpsmEJ#bisection
> > >
> > > Is there a way to have syzbot stop processing/bisecting these things
> > > after a reasonable amount of time?  The original crash is from August of
> > > last year...
> > >
> > > Note, the original crash is actually due to KVM's put_kvm() fd race, but
> > > whatever we want to blame, it's a duplicate.
> > >
> > > #syz dup: general protection fault in kvm_lapic_hv_timer_in_use
> >
> > Hi Sean,
> >
> > syzbot only sends bisection results to open bugs with no known fixes.
> > So what you did (marking the bug as invalid/dup, or attaching a fix)
> > would stop it from doing/sending bisection.
> >
> > "Original crash happened a long time ago" is not necessarily a good
> > signal. On the syzbot dashboard
> > (https://syzkaller.appspot.com/upstream), you can see bugs with the
> > original crash 2+ years ago, but they are still pretty much relevant.
> > The default kernel development process strategy for invalidating bug
> > reports by burying them in oblivion has advantages, but also
> > downsides. FWIW syzbot prefers explicit status tracking.
>
> I have no objection to explicit status tracking or getting pinged on old
> open bugs.  I suppose I don't even mind the belated bisection, I'd probably
> whine if syzbot didn't do the bisection :-).
>
> What's annoying is the report doesn't provide any information about when it
> originally occurred or on what kernel it originally failed.  It didn't occur
> to me that the original bug might be a year old and I only realized it was
> from an old kernel when I saw "4.19.0-rc4+" in the dashboard's sample crash
> log.  Knowing that the original crash was a year old would have saved me
> 5-10 minutes of getting myself oriented.
>
> Could syzbot provide the date and reported kernel version (assuming the
> kernel version won't be misleading) of the original failure in its reports?

+syzkaller mailing list for syzbot discussion

We tried to provide some aggregate info in email reports a long time ago
(like trees where it occurred, number of crashes). The problem was
that any such info captured in emails becomes stale very quickly. E.g.
later somebody looks at the report and thinks "oh, linux-next only"
or "it happened only once", but that may not have been true for a long
time. E.g. if we say "it last happened 3 months ago", maybe it has just
happened again by the time we send it... While this "emails always
provide latest updates" model works for the kernel in other contexts
because updates are provided by humans and there is no other source of
truth, it does not play well with automated systems, or syzbot would
need to send several emails per second, because that's really the rate
at which things change.

If we add some info, which one should it be? The original crash, the
one used for bisection, or the latest one? All these are different...
syzbot does not know "4.19.0-rc4+" strings for commits, it generally
identifies commits by hashes. There are dates, but then again which
one? Author or commit? Author is what is generally shown, but I remember
a number of patches where the Author date is 1.5 years old for
just-merged commits :)
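
To make the author-versus-commit distinction concrete, here is a minimal
sketch. It is not something syzbot does; the helper names `read_dates` and
`date_gap_days` are made up for illustration, and the example timestamps are
hypothetical rather than taken from any commit in this thread. It assumes a
local git checkout and relies on the `%aI` and `%cI` placeholders of
`git log`, which print the author and committer dates in strict ISO 8601 form.

```python
import subprocess
from datetime import datetime


def read_dates(rev, repo="."):
    """Return (author_date, commit_date) ISO strings for one revision."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "-1", "--format=%aI%n%cI", rev],
        check=True, capture_output=True, text=True,
    ).stdout
    author_iso, commit_iso = out.split()
    return author_iso, commit_iso


def _parse(iso):
    # Some git versions print "Z" for UTC; older Pythons only accept "+00:00".
    if iso.endswith("Z"):
        iso = iso[:-1] + "+00:00"
    return datetime.fromisoformat(iso)


def date_gap_days(author_iso, commit_iso):
    """Whole days between when a patch was authored and when it was committed."""
    return (_parse(commit_iso) - _parse(author_iso)).days


# Hypothetical timestamps, chosen only to illustrate a ~1.5 year gap.
print(date_gap_days("2018-01-15T10:00:00+00:00", "2019-07-15T10:00:00Z"))
```

With a real tree one would call `date_gap_days(*read_dates("<hash>"))`; a
large result is the situation described above, where the author date says
little about when the change actually landed.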

There is another problem: if we stuff too much info into emails,
people will stop reading them. This is a very serious and real concern.
If you have a 1000-page manual, it's well documented, but it's
equivalent to no docs at all: nobody is reading 1000 pages to find 1
bit of info. Especially if you don't know that there is an important
bit that you need to find in the first place...

What would be undoubtedly positive is presenting information on the
dashboard better (if we find a way).
Currently the page says near the top:

First crash: 478d, last: 430d

The idea was that "last: 430d" is supposed to communicate the bit of
info that confused you. Is it what you were looking for? Is there a
better way to present it?

Unfortunately most of such problems are much harder if extended beyond
1 concrete case...
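
One alternative presentation of "First crash: 478d, last: 430d" is an
absolute date. A minimal sketch, assuming the ages are whole days counted
back from the day the dashboard is viewed; the viewing date used here (the
date of this message) is an assumption, and `absolute_date` is a name made up
for illustration, not a syzbot function:

```python
from datetime import date, timedelta


def absolute_date(age, viewed_on):
    """Turn a relative age such as '478d' into a yyyy/mm/dd string."""
    if not age.endswith("d"):
        raise ValueError("expected an age in days, e.g. '478d'")
    days = int(age[:-1])
    return (viewed_on - timedelta(days=days)).strftime("%Y/%m/%d")


viewed_on = date(2019, 11, 28)  # assumption: dashboard read on this date
print("First crash:", absolute_date("478d", viewed_on))
print("last:", absolute_date("430d", viewed_on))
```

An absolute date does not go stale the way a relative age does once it is
copied into an email, although it still says nothing about crashes that
happen after the copy was made.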


* Re: general protection fault in __schedule (2)
  2019-11-28  9:53         ` Dmitry Vyukov
@ 2019-12-02 16:56           ` Sean Christopherson
  0 siblings, 0 replies; 7+ messages in thread
From: Sean Christopherson @ 2019-12-02 16:56 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: syzkaller, syzbot, Casey Schaufler, Frederic Weisbecker,
	Greg Kroah-Hartman, H. Peter Anvin, Jim Mattson, James Morris,
	Raslan, KarimAllah, Kate Stewart, KVM list, LKML,
	linux-security-module, Ingo Molnar, Ingo Molnar, Pavel Tatashin,
	Paolo Bonzini, Philippe Ombredanne, Radim Krčmář,
	Serge E. Hallyn, syzkaller-bugs, Thomas Gleixner,
	the arch/x86 maintainers

On Thu, Nov 28, 2019 at 10:53:10AM +0100, Dmitry Vyukov wrote:
> On Mon, Nov 25, 2019 at 6:54 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> > I have no objection to explicit status tracking or getting pinged on old
> > open bugs.  I suppose I don't even mind the belated bisection, I'd probably
> > whine if syzbot didn't do the bisection :-).
> >
> > What's annoying is the report doesn't provide any information about when it
> > originally occurred or on what kernel it originally failed.  It didn't occur
> > to me that the original bug might be a year old and I only realized it was
> > from an old kernel when I saw "4.19.0-rc4+" in the dashboard's sample crash
> > log.  Knowing that the original crash was a year old would have saved me
> > 5-10 minutes of getting myself oriented.
> >
> > Could syzbot provide the date and reported kernel version (assuming the
> > kernel version won't be misleading) of the original failure in its reports?
> 
> +syzkaller mailing list for syzbot discussion
> 
> We tried to provide some aggregate info in email reports long time ago
> (like trees where it occurred, number of crashes). The problem was
> that any such info captured in emails become stale very quickly. E.g.
> later somebody looks at the report and thinking "oh, linux-next only"
> or "it happened only once", but maybe it's not for a long time. E.g.
> if we say "it last happened 3 months" ago, maybe it's just happened
> again once we send it... While this "emails always provide latest
> updates" works for kernel in other context b/c updates provided by
> humans and there is no other source of truth; it does not play well
> with automated systems, or syzbot will need to send several emails per
> second, because it's really the rate at which things change.
> 
> If we add some info, which one should it be? The original crash, the
> one used for bisection, or the latest one? All these are different...
> syzbot does not know "4.19.0-rc4+" strings for commits, it generally
> identifies commits by hashes. There are dates, but then again which
> one? Author or commit? Author is what generally shown, but I remember
> a number of patches where Author date is 1.5 years old for just merged
> commits :)
> 
> There is another problem: if we stuff too many info into emails,
> people still stop reading them. This is very serious and real concern.
> If you have 1000-page manual, it's well documented, but it's
> equivalent to no docs at all, nobody is reading 1000 pages to find 1
> bit of info. Especially if you don't know that there is an important
> bit that you need to find in the first place...
> 
> What would be undoubtedly positive is presenting information on the
> dashboard better (If we find a way).
> Currently the page says near the top:
> 
> First crash: 478d, last: 430d
> 
> The idea was that "last: 430d" is supposed to communicate the bit of
> info that confused you. Is it what you were looking for? Is there a
> better way to present it?

Ah, yes, that's what I was looking for.  Tweaking the presentation of the
dashboard and/or email reports, e.g. to encourage readers to go to the
dashboard in the first place, would definitely help.  A few ideas:

  - Display the first/last crash dates in yyyy/mm/dd format rather than
    showing the number of days since failure.  I didn't even realize 478d
    and 430d were relative dates until your email, though that's probably
    more my failing than syzbot's :-)

  - On the dashboard page, separate the basic crash info from the bisection
    details, e.g. display the basic crash info using the same table format
    as "Duplicate of" and "similar bugs", and/or move the bisection details
    below the aforementioned tables.  The basic info stands out fairly well
    when there aren't bisection details, but for bugs with bisection info
    the combined info becomes a wall of text that my eyes tend to skip over.

  - Don't rely on the recipients of bisection reports having the original
    crash report, e.g. use the dashboard link to reference the crash and
    always display it at the top, maybe isolated via whitespace.  The other
    auto-generated reports could use a similar format to teach folks that
    the dashboard link is the canonical reference.

    For example, on bisection show:

      syzbot has bisected crash:

        https://syzkaller.appspot.com/bug?extid=7e2ab84953e4084a638d

      first bad commit:
 
        commit 8fcc4b5923af5de58b80b53a069453b135693304
        Author: Jim Mattson <jmattson@google.com>
        Date:   Tue Jul 10 09:27:20 2018 +0000
      
             kvm: nVMX: Introduce KVM_CAP_NESTED_STATE

      bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=124cdbace00000
      start commit:   234b69e3 ocfs2: fix ocfs2 read block panic
      git tree:       upstream
      final crash:    https://syzkaller.appspot.com/x/report.txt?x=114cdbace00000
      console output: https://syzkaller.appspot.com/x/log.txt?x=164cdbace00000
      kernel config:  https://syzkaller.appspot.com/x/.config?x=5fa12be50bca08d8
      syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=150f0a4e400000
      C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17f67111400000

    vs.

      syzbot has bisected this bug to:

      commit 8fcc4b5923af5de58b80b53a069453b135693304
      Author: Jim Mattson <jmattson@google.com>
      Date:   Tue Jul 10 09:27:20 2018 +0000

          kvm: nVMX: Introduce KVM_CAP_NESTED_STATE

      bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=124cdbace00000
      start commit:   234b69e3 ocfs2: fix ocfs2 read block panic
      git tree:       upstream
      final crash:    https://syzkaller.appspot.com/x/report.txt?x=114cdbace00000
      console output: https://syzkaller.appspot.com/x/log.txt?x=164cdbace00000
      kernel config:  https://syzkaller.appspot.com/x/.config?x=5fa12be50bca08d8
      dashboard link: https://syzkaller.appspot.com/bug?extid=7e2ab84953e4084a638d
      syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=150f0a4e400000
      C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17f67111400000


  And a similar format for the initial crash report:

      syzbot found the following crash:

        https://syzkaller.appspot.com/bug?extid=00be5da1d75f1cc95f6b

      HEAD commit:    ad062195 Merge tag 'platform-drivers-x86-v5.4-1' of git://..
      git tree:       upstream
      console output: https://syzkaller.appspot.com/x/log.txt?x=154910ad600000
      kernel config:  https://syzkaller.appspot.com/x/.config?x=f9fc16a6374d5fd0
      compiler:       gcc (GCC) 9.0.0 20181231 (experimental)

      Unfortunately, I don't have any reproducer for this crash yet.

  vs.

      syzbot found the following crash on:

      HEAD commit:    ad062195 Merge tag 'platform-drivers-x86-v5.4-1' of git://..
      git tree:       upstream
      console output: https://syzkaller.appspot.com/x/log.txt?x=154910ad600000
      kernel config:  https://syzkaller.appspot.com/x/.config?x=f9fc16a6374d5fd0
      dashboard link: https://syzkaller.appspot.com/bug?extid=00be5da1d75f1cc95f6b
      compiler:       gcc (GCC) 9.0.0 20181231 (experimental)

      Unfortunately, I don't have any reproducer for this crash yet.
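
The proposed bisection layout above can be sketched as a small formatter.
This is only an illustration of the ordering idea (dashboard link first,
isolated by blank lines, and no separate "dashboard link:" field); the
function name `format_bisection_report` and its parameters are made up here
and are not part of syzbot:

```python
def format_bisection_report(dashboard_url, commit_lines, fields):
    """Render a bisection report with the dashboard link as the first item."""
    out = ["syzbot has bisected crash:", "", "  " + dashboard_url, "",
           "first bad commit:", ""]
    out += ["  " + line for line in commit_lines]
    out.append("")
    width = max(len(label) for label in fields) + 2  # label, colon, one space
    for label, value in fields.items():
        out.append((label + ":").ljust(width) + value)
    return "\n".join(out)


report = format_bisection_report(
    "https://syzkaller.appspot.com/bug?extid=7e2ab84953e4084a638d",
    ["commit 8fcc4b5923af5de58b80b53a069453b135693304",
     "Author: Jim Mattson <jmattson@google.com>",
     "Date:   Tue Jul 10 09:27:20 2018 +0000",
     "",
     "    kvm: nVMX: Introduce KVM_CAP_NESTED_STATE"],
    {"bisection log": "https://syzkaller.appspot.com/x/bisect.txt?x=124cdbace00000",
     "start commit": "234b69e3 ocfs2: fix ocfs2 read block panic",
     "git tree": "upstream"},
)
print(report)
```

Keeping the link out of the aligned field list is the point of the layout:
it is the one item a reader needs even when the rest is skimmed.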
