* general protection fault in perf_misc_flags
@ 2020-09-19  8:32 syzbot
  2020-09-19 11:08 ` Borislav Petkov
  2020-09-27 14:57 ` Borislav Petkov
  0 siblings, 2 replies; 41+ messages in thread
From: syzbot @ 2020-09-19  8:32 UTC (permalink / raw)
  To: acme, alexander.shishkin, bp, hpa, jolsa, linux-kernel,
      mark.rutland, mingo, namhyung, peterz, syzkaller-bugs, tglx, x86

Hello,

syzbot found the following issue on:

HEAD commit:    92ab97ad Merge tag 'sh-for-5.9-part2' of git://git.libc.or..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1069669b900000
kernel config:  https://syzkaller.appspot.com/x/.config?x=cd992d74d6c7e62
dashboard link: https://syzkaller.appspot.com/bug?extid=ce179bc99e64377c24bc
compiler:       clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81)

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+ce179bc99e64377c24bc@syzkaller.appspotmail.com

general protection fault, probably for non-canonical address 0xffff518084501e28: 0000 [#1] PREEMPT SMP KASAN
KASAN: maybe wild-memory-access in range [0xfffaac042280f140-0xfffaac042280f147]
CPU: 0 PID: 17449 Comm: syz-executor.5 Not tainted 5.9.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:perf_misc_flags+0x125/0x150 arch/x86/events/core.c:2638
Code: e4 48 83 e6 03 41 0f 94 c4 31 ff e8 95 fa 73 00 bb 02 00 00 00 4c 29 e3 49 81 c6 90 00 00 00 4c 89 f0 48 c1 e8 00 00 00 00 38 <00> 74 08 4c 89 f7 e8 40 c0 b3 00 41 8b 06 83 e0 08 48 c1 e0 0b 48
RSP: 0018:ffffc90000007a08 EFLAGS: 00010002
RAX: ffffc90017db7aa8 RBX: 0000000000000001 RCX: ffff88806c74a380
RDX: ffff88806c74a380 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffc90000007ac0 R08: ffffffff810117eb R09: fffffbfff167da99
R10: fffffbfff167da99 R11: 0000000000000000 R12: 0000000000000001
R13: dffffc0000000000 R14: ffffc90017db7aa8 R15: dffffc0000000000
FS:  00007f1165303700(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004d8750 CR3: 0000000095a35000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <IRQ>
 perf_prepare_sample+0xea/0x19f0 kernel/events/core.c:7001
 __perf_event_output kernel/events/core.c:7170 [inline]
 perf_event_output_forward+0xa7/0x1c0 kernel/events/core.c:7190
 __perf_event_overflow+0x1b9/0x340 kernel/events/core.c:8845
 perf_swevent_hrtimer+0x43c/0x4d0 kernel/events/core.c:10247
 __run_hrtimer kernel/time/hrtimer.c:1524 [inline]
 __hrtimer_run_queues+0x42d/0x930 kernel/time/hrtimer.c:1588
 hrtimer_interrupt+0x373/0xd60 kernel/time/hrtimer.c:1650
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1080 [inline]
 __sysvec_apic_timer_interrupt+0xf0/0x260 arch/x86/kernel/apic/apic.c:1097
 asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:706
 </IRQ>
 __run_on_irqstack arch/x86/include/asm/irq_stack.h:22 [inline]
 run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:48 [inline]
 sysvec_apic_timer_interrupt+0x94/0xf0 arch/x86/kernel/apic/apic.c:1091
 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:581
RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:770 [inline]
RIP: 0010:lock_acquire+0x195/0x6f0 kernel/locking/lockdep.c:5009
Code: c1 e8 03 80 3c 18 00 74 0c 48 c7 c7 b8 17 4d 89 e8 20 df 5a 00 48 83 3d 80 1e f3 07 00 0f 84 fc 04 00 00 48 8b 7c 24 08 57 9d <0f> 1f 44 00 00 65 48 8b 04 25 28 00 00 00 48 3b 44 24 48 0f 85 d3
RSP: 0018:ffffc90017db7ac8 EFLAGS: 0000022a
RAX: 1ffffffff129a2f7 RBX: dffffc0000000000 RCX: ffffffff815adc44
RDX: dffffc0000000000 RSI: 0000000000000008 RDI: 0000000000000282
RBP: 0000000000000001 R08: dffffc0000000000 R09: fffffbfff167da9c
R10: fffffbfff167da9c R11: 0000000000000000 R12: ffff88806c74ac64
R13: 0000000000000000 R14: ffff888094349228 R15: 1ffff1100d8e958c
 __might_fault+0xf1/0x150 mm/memory.c:4870
 _copy_from_user+0x28/0x170 lib/usercopy.c:12
 copy_from_user include/linux/uaccess.h:160 [inline]
 __copy_msghdr_from_user+0x45/0x700 net/socket.c:2235
 copy_msghdr_from_user net/socket.c:2286 [inline]
 sendmsg_copy_msghdr net/socket.c:2384 [inline]
 ___sys_sendmsg net/socket.c:2403 [inline]
 __sys_sendmmsg+0x2a1/0x680 net/socket.c:2497
 __do_sys_sendmmsg net/socket.c:2526 [inline]
 __se_sys_sendmmsg net/socket.c:2523 [inline]
 __x64_sys_sendmmsg+0x9c/0xb0 net/socket.c:2523
 do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45d5f9
Code: 5d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 2b b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f1165302c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 0000000000027c00 RCX: 000000000045d5f9
RDX: 04924924924925c6 RSI: 0000000020000680 RDI: 0000000000000005
RBP: 000000000118cf88 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000118cf4c
R13: 00007fffb49ba86f R14: 00007f11653039c0 R15: 000000000118cf4c
Modules linked in:
---[ end trace 2cb388c0ff8c4c87 ]---
RIP: 0010:perf_misc_flags+0x125/0x150 arch/x86/events/core.c:2638
Code: e4 48 83 e6 03 41 0f 94 c4 31 ff e8 95 fa 73 00 bb 02 00 00 00 4c 29 e3 49 81 c6 90 00 00 00 4c 89 f0 48 c1 e8 00 00 00 00 38 <00> 74 08 4c 89 f7 e8 40 c0 b3 00 41 8b 06 83 e0 08 48 c1 e0 0b 48
RSP: 0018:ffffc90000007a08 EFLAGS: 00010002
RAX: ffffc90017db7aa8 RBX: 0000000000000001 RCX: ffff88806c74a380
RDX: ffff88806c74a380 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffc90000007ac0 R08: ffffffff810117eb R09: fffffbfff167da99
R10: fffffbfff167da99 R11: 0000000000000000 R12: 0000000000000001
R13: dffffc0000000000 R14: ffffc90017db7aa8 R15: dffffc0000000000
FS:  00007f1165303700(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004d8750 CR3: 0000000095a35000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: general protection fault in perf_misc_flags
  2020-09-19  8:32 general protection fault in perf_misc_flags syzbot
@ 2020-09-19 11:08 ` Borislav Petkov
  2020-09-21  5:54   ` Dmitry Vyukov
  2020-09-27 14:57 ` Borislav Petkov
  1 sibling, 1 reply; 41+ messages in thread
From: Borislav Petkov @ 2020-09-19 11:08 UTC (permalink / raw)
  To: syzbot
  Cc: acme, alexander.shishkin, hpa, jolsa, linux-kernel, mark.rutland,
      mingo, namhyung, peterz, syzkaller-bugs, tglx, x86

On Sat, Sep 19, 2020 at 01:32:14AM -0700, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:    92ab97ad Merge tag 'sh-for-5.9-part2' of git://git.libc.or..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1069669b900000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=cd992d74d6c7e62
> dashboard link: https://syzkaller.appspot.com/bug?extid=ce179bc99e64377c24bc
> compiler:       clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+ce179bc99e64377c24bc@syzkaller.appspotmail.com
>
> general protection fault, probably for non-canonical address 0xffff518084501e28: 0000 [#1] PREEMPT SMP KASAN
> KASAN: maybe wild-memory-access in range [0xfffaac042280f140-0xfffaac042280f147]
> CPU: 0 PID: 17449 Comm: syz-executor.5 Not tainted 5.9.0-rc5-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> RIP: 0010:perf_misc_flags+0x125/0x150 arch/x86/events/core.c:2638
> Code: e4 48 83 e6 03 41 0f 94 c4 31 ff e8 95 fa 73 00 bb 02 00 00 00 4c 29 e3 49 81 c6 90 00 00 00 4c 89 f0 48 c1 e8 00 00 00 00 38 <00> 74 08 4c 89 f7 e8 40 c0 b3 00 41 8b 06 83 e0 08 48 c1 e0 0b 48

Hmm, so converting this back to opcodes with decodecode gives:

Code: e4 48 83 e6 03 41 0f 94 c4 31 ff e8 95 fa 73 00 bb 02 00 00 00 4c 29 e3 49 81 c6 90 00 00 00 4c 89 f0 48 c1 e8 00 00 00 00 38 <00> 74 08 4c 89 f7 e8 40 c0 b3 00 41 8b 06 83 e0 08 48 c1 e0 0b 48
All code
========
   0:   e4 48                   in     $0x48,%al
   2:   83 e6 03                and    $0x3,%esi
   5:   41 0f 94 c4             sete   %r12b
   9:   31 ff                   xor    %edi,%edi
   b:   e8 95 fa 73 00          callq  0x73faa5
  10:   bb 02 00 00 00          mov    $0x2,%ebx
  15:   4c 29 e3                sub    %r12,%rbx
  18:   49 81 c6 90 00 00 00    add    $0x90,%r14
  1f:   4c 89 f0                mov    %r14,%rax
  22:   48 c1 e8 00             shr    $0x0,%rax
  26:   00 00                   add    %al,(%rax)
  28:   00 38                   add    %bh,(%rax)
  2a:*  00 74 08 4c             add    %dh,0x4c(%rax,%rcx,1)    <-- trapping instruction
  2e:   89 f7                   mov    %esi,%edi
  30:   e8 40 c0 b3 00          callq  0xb3c075
  35:   41 8b 06                mov    (%r14),%eax
  38:   83 e0 08                and    $0x8,%eax
  3b:   48 c1 e0 0b             shl    $0xb,%rax
  3f:   48                      rex.W

and those ADDs before the rIP look real strange. Just as if something
wrote 4 bytes of 0s there. And building your config with clang-10 gives
around that area:

ffffffff8101177c:       48 83 e6 03             and    $0x3,%rsi
ffffffff81011780:       41 0f 94 c4             sete   %r12b
ffffffff81011784:       31 ff                   xor    %edi,%edi
ffffffff81011786:       e8 05 c9 73 00          callq  ffffffff8174e090 <__sanitizer_cov_trace_const_cmp8>
ffffffff8101178b:       bb 02 00 00 00          mov    $0x2,%ebx
ffffffff81011790:       4c 29 e3                sub    %r12,%rbx
ffffffff81011793:       49 81 c6 90 00 00 00    add    $0x90,%r14
ffffffff8101179a:       4c 89 f0                mov    %r14,%rax
ffffffff8101179d:       48 c1 e8 03             shr    $0x3,%rax
ffffffff810117a1:       42 80 3c 38 00          cmpb   $0x0,(%rax,%r15,1)
ffffffff810117a6:       74 08                   je     ffffffff810117b0 <perf_misc_flags+0x130>
ffffffff810117a8:       4c 89 f7                mov    %r14,%rdi
ffffffff810117ab:       e8 20 75 b3 00          callq  ffffffff81b48cd0 <__asan_report_load8_noabort>
ffffffff810117b0:       41 8b 06                mov    (%r14),%eax
ffffffff810117b3:       83 e0 08                and    $0x8,%eax
ffffffff810117b6:       48 c1 e0 0b             shl    $0xb,%rax

and I can pretty much follow it instruction by instruction until I reach
that SHR. Your SHR is doing a shift by 0 bytes and that already looks
suspicious.

After it, your output has a bunch of suspicious ADDs and mine has a CMP;
JE instead. And that looks really strange too.

Could it be that something has scribbled in guest memory and corrupted
that area, leading to that strange discrepancy in the opcodes?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply [flat|nested] 41+ messages in thread
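The byte-level comparison in the message above can be reproduced with a few lines of Python. This is only a sketch: it takes the window from `mov %r14,%rax` through the `je` opcode, copied by hand from the oops "Code:" line and from the clang-10 rebuild, and reports which byte runs differ. The helper name `differing_runs` is made up for this illustration; the window is chosen so that the call displacements, which differ between the two builds anyway, stay out of the comparison.

```python
# Bytes from the oops "Code:" line, from `mov %r14,%rax` up to and including `je +8`.
RUNTIME = bytes.fromhex("4c 89 f0 48 c1 e8 00 00 00 00 38 00 74 08")
# The same window from the clang-10 rebuild (0xffffffff8101179a onwards).
COMPILED = bytes.fromhex("4c 89 f0 48 c1 e8 03 42 80 3c 38 00 74 08")


def differing_runs(a, b):
    """Return (start, end) pairs, end exclusive, for each run of differing bytes."""
    if len(a) != len(b):
        raise ValueError("windows must have the same length")
    runs = []
    start = None
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            if start is None:
                start = i          # a new run of mismatches begins here
        elif start is not None:
            runs.append((start, i))
            start = None
    if start is not None:          # mismatch run extends to the end of the window
        runs.append((start, len(a)))
    return runs


for start, end in differing_runs(RUNTIME, COMPILED):
    print(f"window bytes {start}..{end - 1}: "
          f"runtime={RUNTIME[start:end].hex(' ')} "
          f"compiled={COMPILED[start:end].hex(' ')}")
```

The single differing run covers the SHR immediate and the first three bytes of the CMPB, which is the "4 bytes of 0s" observation.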
* Re: general protection fault in perf_misc_flags 2020-09-19 11:08 ` Borislav Petkov @ 2020-09-21 5:54 ` Dmitry Vyukov 2020-09-21 8:08 ` Dmitry Vyukov 0 siblings, 1 reply; 41+ messages in thread From: Dmitry Vyukov @ 2020-09-21 5:54 UTC (permalink / raw) To: Borislav Petkov Cc: syzbot, Arnaldo Carvalho de Melo, Alexander Shishkin, H. Peter Anvin, jolsa, LKML, Mark Rutland, Ingo Molnar, Namhyung Kim, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, clang-built-linux On Sat, Sep 19, 2020 at 1:08 PM Borislav Petkov <bp@alien8.de> wrote: > > On Sat, Sep 19, 2020 at 01:32:14AM -0700, syzbot wrote: > > Hello, > > > > syzbot found the following issue on: > > > > HEAD commit: 92ab97ad Merge tag 'sh-for-5.9-part2' of git://git.libc.or.. > > git tree: upstream > > console output: https://syzkaller.appspot.com/x/log.txt?x=1069669b900000 > > kernel config: https://syzkaller.appspot.com/x/.config?x=cd992d74d6c7e62 > > dashboard link: https://syzkaller.appspot.com/bug?extid=ce179bc99e64377c24bc > > compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81) > > > > Unfortunately, I don't have any reproducer for this issue yet. 
> > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > Reported-by: syzbot+ce179bc99e64377c24bc@syzkaller.appspotmail.com > > > > general protection fault, probably for non-canonical address 0xffff518084501e28: 0000 [#1] PREEMPT SMP KASAN > > KASAN: maybe wild-memory-access in range [0xfffaac042280f140-0xfffaac042280f147] > > CPU: 0 PID: 17449 Comm: syz-executor.5 Not tainted 5.9.0-rc5-syzkaller #0 > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > > RIP: 0010:perf_misc_flags+0x125/0x150 arch/x86/events/core.c:2638 > > Code: e4 48 83 e6 03 41 0f 94 c4 31 ff e8 95 fa 73 00 bb 02 00 00 00 4c 29 e3 49 81 c6 90 00 00 00 4c 89 f0 48 c1 e8 00 00 00 00 38 <00> 74 08 4c 89 f7 e8 40 c0 b3 00 41 8b 06 83 e0 08 48 c1 e0 0b 48 > > Hmm, so converting this back to opcodes with decodecode gives: > > Code: e4 48 83 e6 03 41 0f 94 c4 31 ff e8 95 fa 73 00 bb 02 00 00 00 4c 29 e3 49 81 c6 90 00 00 00 4c 89 f0 48 c1 e8 00 00 00 00 38 <00> 74 08 4c 89 f7 e8 40 c0 b3 00 41 8b 06 83 e0 08 48 c1 e0 0b 48 > All code > ======== > 0: e4 48 in $0x48,%al > 2: 83 e6 03 and $0x3,%esi > 5: 41 0f 94 c4 sete %r12b > 9: 31 ff xor %edi,%edi > b: e8 95 fa 73 00 callq 0x73faa5 > 10: bb 02 00 00 00 mov $0x2,%ebx > 15: 4c 29 e3 sub %r12,%rbx > 18: 49 81 c6 90 00 00 00 add $0x90,%r14 > 1f: 4c 89 f0 mov %r14,%rax > 22: 48 c1 e8 00 shr $0x0,%rax > 26: 00 00 add %al,(%rax) > 28: 00 38 add %bh,(%rax) > 2a:* 00 74 08 4c add %dh,0x4c(%rax,%rcx,1) <-- trapping instruction > 2e: 89 f7 mov %esi,%edi > 30: e8 40 c0 b3 00 callq 0xb3c075 > 35: 41 8b 06 mov (%r14),%eax > 38: 83 e0 08 and $0x8,%eax > 3b: 48 c1 e0 0b shl $0xb,%rax > 3f: 48 rex.W > > and those ADDs before the rIP look real strange. Just as if something > wrote 4 bytes of 0s there. 
And building your config with clang-10 gives > around that area: > > ffffffff8101177c: 48 83 e6 03 and $0x3,%rsi > ffffffff81011780: 41 0f 94 c4 sete %r12b > ffffffff81011784: 31 ff xor %edi,%edi > ffffffff81011786: e8 05 c9 73 00 callq ffffffff8174e090 <__sanitizer_cov_trace_const_cmp8> > ffffffff8101178b: bb 02 00 00 00 mov $0x2,%ebx > ffffffff81011790: 4c 29 e3 sub %r12,%rbx > ffffffff81011793: 49 81 c6 90 00 00 00 add $0x90,%r14 > ffffffff8101179a: 4c 89 f0 mov %r14,%rax > ffffffff8101179d: 48 c1 e8 03 shr $0x3,%rax > ffffffff810117a1: 42 80 3c 38 00 cmpb $0x0,(%rax,%r15,1) > ffffffff810117a6: 74 08 je ffffffff810117b0 <perf_misc_flags+0x130> > ffffffff810117a8: 4c 89 f7 mov %r14,%rdi > ffffffff810117ab: e8 20 75 b3 00 callq ffffffff81b48cd0 <__asan_report_load8_noabort> > ffffffff810117b0: 41 8b 06 mov (%r14),%eax > ffffffff810117b3: 83 e0 08 and $0x8,%eax > ffffffff810117b6: 48 c1 e0 0b shl $0xb,%rax > > and I can pretty much follow it instruction by instruction until I reach > that SHR. Your SHR is doing a shift by 0 bytes and that already looks > suspicious. > > After it, your output has a bunch of suspicious ADDs and mine has a CMP; > JE instead. And that looks really strange too. > > Could it be that something has scribbled in guest memory and corrupted > that area, leading to that strange discrepancy in the opcodes? Hi Boris, Memory corruption is definitely possible. There are hundreds of known bugs that can potentially lead to silent memory corruptions, and some observed to lead to silent memory corruptions. However, these tend to produce crash signatures with 1-2 crashes. While this has 6 and they look similar and all happened on the only instance that uses clang. So my bet would be on something-clang-related rather than a silent memory corruption. +clang-built-linux ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: general protection fault in perf_misc_flags 2020-09-21 5:54 ` Dmitry Vyukov @ 2020-09-21 8:08 ` Dmitry Vyukov 2020-09-21 20:59 ` Nick Desaulniers 0 siblings, 1 reply; 41+ messages in thread From: Dmitry Vyukov @ 2020-09-21 8:08 UTC (permalink / raw) To: Borislav Petkov Cc: syzbot, Arnaldo Carvalho de Melo, Alexander Shishkin, H. Peter Anvin, jolsa, LKML, Mark Rutland, Ingo Molnar, Namhyung Kim, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, clang-built-linux On Mon, Sep 21, 2020 at 7:54 AM Dmitry Vyukov <dvyukov@google.com> wrote: > > On Sat, Sep 19, 2020 at 1:08 PM Borislav Petkov <bp@alien8.de> wrote: > > > > On Sat, Sep 19, 2020 at 01:32:14AM -0700, syzbot wrote: > > > Hello, > > > > > > syzbot found the following issue on: > > > > > > HEAD commit: 92ab97ad Merge tag 'sh-for-5.9-part2' of git://git.libc.or.. > > > git tree: upstream > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1069669b900000 > > > kernel config: https://syzkaller.appspot.com/x/.config?x=cd992d74d6c7e62 > > > dashboard link: https://syzkaller.appspot.com/bug?extid=ce179bc99e64377c24bc > > > compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81) > > > > > > Unfortunately, I don't have any reproducer for this issue yet. 
> > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > Reported-by: syzbot+ce179bc99e64377c24bc@syzkaller.appspotmail.com > > > > > > general protection fault, probably for non-canonical address 0xffff518084501e28: 0000 [#1] PREEMPT SMP KASAN > > > KASAN: maybe wild-memory-access in range [0xfffaac042280f140-0xfffaac042280f147] > > > CPU: 0 PID: 17449 Comm: syz-executor.5 Not tainted 5.9.0-rc5-syzkaller #0 > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > > > RIP: 0010:perf_misc_flags+0x125/0x150 arch/x86/events/core.c:2638 > > > Code: e4 48 83 e6 03 41 0f 94 c4 31 ff e8 95 fa 73 00 bb 02 00 00 00 4c 29 e3 49 81 c6 90 00 00 00 4c 89 f0 48 c1 e8 00 00 00 00 38 <00> 74 08 4c 89 f7 e8 40 c0 b3 00 41 8b 06 83 e0 08 48 c1 e0 0b 48 > > > > Hmm, so converting this back to opcodes with decodecode gives: > > > > Code: e4 48 83 e6 03 41 0f 94 c4 31 ff e8 95 fa 73 00 bb 02 00 00 00 4c 29 e3 49 81 c6 90 00 00 00 4c 89 f0 48 c1 e8 00 00 00 00 38 <00> 74 08 4c 89 f7 e8 40 c0 b3 00 41 8b 06 83 e0 08 48 c1 e0 0b 48 > > All code > > ======== > > 0: e4 48 in $0x48,%al > > 2: 83 e6 03 and $0x3,%esi > > 5: 41 0f 94 c4 sete %r12b > > 9: 31 ff xor %edi,%edi > > b: e8 95 fa 73 00 callq 0x73faa5 > > 10: bb 02 00 00 00 mov $0x2,%ebx > > 15: 4c 29 e3 sub %r12,%rbx > > 18: 49 81 c6 90 00 00 00 add $0x90,%r14 > > 1f: 4c 89 f0 mov %r14,%rax > > 22: 48 c1 e8 00 shr $0x0,%rax > > 26: 00 00 add %al,(%rax) > > 28: 00 38 add %bh,(%rax) > > 2a:* 00 74 08 4c add %dh,0x4c(%rax,%rcx,1) <-- trapping instruction > > 2e: 89 f7 mov %esi,%edi > > 30: e8 40 c0 b3 00 callq 0xb3c075 > > 35: 41 8b 06 mov (%r14),%eax > > 38: 83 e0 08 and $0x8,%eax > > 3b: 48 c1 e0 0b shl $0xb,%rax > > 3f: 48 rex.W > > > > and those ADDs before the rIP look real strange. Just as if something > > wrote 4 bytes of 0s there. 
And building your config with clang-10 gives > > around that area: > > > > ffffffff8101177c: 48 83 e6 03 and $0x3,%rsi > > ffffffff81011780: 41 0f 94 c4 sete %r12b > > ffffffff81011784: 31 ff xor %edi,%edi > > ffffffff81011786: e8 05 c9 73 00 callq ffffffff8174e090 <__sanitizer_cov_trace_const_cmp8> > > ffffffff8101178b: bb 02 00 00 00 mov $0x2,%ebx > > ffffffff81011790: 4c 29 e3 sub %r12,%rbx > > ffffffff81011793: 49 81 c6 90 00 00 00 add $0x90,%r14 > > ffffffff8101179a: 4c 89 f0 mov %r14,%rax > > ffffffff8101179d: 48 c1 e8 03 shr $0x3,%rax > > ffffffff810117a1: 42 80 3c 38 00 cmpb $0x0,(%rax,%r15,1) > > ffffffff810117a6: 74 08 je ffffffff810117b0 <perf_misc_flags+0x130> > > ffffffff810117a8: 4c 89 f7 mov %r14,%rdi > > ffffffff810117ab: e8 20 75 b3 00 callq ffffffff81b48cd0 <__asan_report_load8_noabort> > > ffffffff810117b0: 41 8b 06 mov (%r14),%eax > > ffffffff810117b3: 83 e0 08 and $0x8,%eax > > ffffffff810117b6: 48 c1 e0 0b shl $0xb,%rax > > > > and I can pretty much follow it instruction by instruction until I reach > > that SHR. Your SHR is doing a shift by 0 bytes and that already looks > > suspicious. > > > > After it, your output has a bunch of suspicious ADDs and mine has a CMP; > > JE instead. And that looks really strange too. > > > > Could it be that something has scribbled in guest memory and corrupted > > that area, leading to that strange discrepancy in the opcodes? > > Hi Boris, > > Memory corruption is definitely possible. There are hundreds of known > bugs that can potentially lead to silent memory corruptions, and some > observed to lead to silent memory corruptions. > > However, these tend to produce crash signatures with 1-2 crashes. > While this has 6 and they look similar and all happened on the only > instance that uses clang. So my bet would be on > something-clang-related rather than a silent memory corruption. 
> +clang-built-linux

general protection fault in pvclock_gtod_notify (2) looks somewhat similar:
 - only clang
 - gpf in systems code
 - happened few times

https://syzkaller.appspot.com/bug?extid=1dccfcb049726389379c
https://groups.google.com/g/syzkaller-bugs/c/0eUUkjFKrBg/m/nGfTjIfCBAAJ

^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: general protection fault in perf_misc_flags 2020-09-21 8:08 ` Dmitry Vyukov @ 2020-09-21 20:59 ` Nick Desaulniers 2020-09-21 22:13 ` Borislav Petkov ` (2 more replies) 0 siblings, 3 replies; 41+ messages in thread From: Nick Desaulniers @ 2020-09-21 20:59 UTC (permalink / raw) To: Dmitry Vyukov, Borislav Petkov Cc: syzbot, Arnaldo Carvalho de Melo, Alexander Shishkin, H. Peter Anvin, Jiri Olsa, LKML, Mark Rutland, Ingo Molnar, Namhyung Kim, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, clang-built-linux On Mon, Sep 21, 2020 at 1:09 AM 'Dmitry Vyukov' via Clang Built Linux <clang-built-linux@googlegroups.com> wrote: > > On Mon, Sep 21, 2020 at 7:54 AM Dmitry Vyukov <dvyukov@google.com> wrote: > > > > On Sat, Sep 19, 2020 at 1:08 PM Borislav Petkov <bp@alien8.de> wrote: > > > > > > On Sat, Sep 19, 2020 at 01:32:14AM -0700, syzbot wrote: > > > > Hello, > > > > > > > > syzbot found the following issue on: > > > > > > > > HEAD commit: 92ab97ad Merge tag 'sh-for-5.9-part2' of git://git.libc.or.. > > > > git tree: upstream > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1069669b900000 > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=cd992d74d6c7e62 > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=ce179bc99e64377c24bc > > > > compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81) > > > > > > > > Unfortunately, I don't have any reproducer for this issue yet. 
> > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > > Reported-by: syzbot+ce179bc99e64377c24bc@syzkaller.appspotmail.com > > > > > > > > general protection fault, probably for non-canonical address 0xffff518084501e28: 0000 [#1] PREEMPT SMP KASAN > > > > KASAN: maybe wild-memory-access in range [0xfffaac042280f140-0xfffaac042280f147] > > > > CPU: 0 PID: 17449 Comm: syz-executor.5 Not tainted 5.9.0-rc5-syzkaller #0 > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > > > > RIP: 0010:perf_misc_flags+0x125/0x150 arch/x86/events/core.c:2638 > > > > Code: e4 48 83 e6 03 41 0f 94 c4 31 ff e8 95 fa 73 00 bb 02 00 00 00 4c 29 e3 49 81 c6 90 00 00 00 4c 89 f0 48 c1 e8 00 00 00 00 38 <00> 74 08 4c 89 f7 e8 40 c0 b3 00 41 8b 06 83 e0 08 48 c1 e0 0b 48 > > > > > > Hmm, so converting this back to opcodes with decodecode gives: > > > > > > Code: e4 48 83 e6 03 41 0f 94 c4 31 ff e8 95 fa 73 00 bb 02 00 00 00 4c 29 e3 49 81 c6 90 00 00 00 4c 89 f0 48 c1 e8 00 00 00 00 38 <00> 74 08 4c 89 f7 e8 40 c0 b3 00 41 8b 06 83 e0 08 48 c1 e0 0b 48 > > > All code > > > ======== > > > 0: e4 48 in $0x48,%al > > > 2: 83 e6 03 and $0x3,%esi > > > 5: 41 0f 94 c4 sete %r12b > > > 9: 31 ff xor %edi,%edi > > > b: e8 95 fa 73 00 callq 0x73faa5 > > > 10: bb 02 00 00 00 mov $0x2,%ebx > > > 15: 4c 29 e3 sub %r12,%rbx > > > 18: 49 81 c6 90 00 00 00 add $0x90,%r14 > > > 1f: 4c 89 f0 mov %r14,%rax > > > 22: 48 c1 e8 00 shr $0x0,%rax > > > 26: 00 00 add %al,(%rax) > > > 28: 00 38 add %bh,(%rax) > > > 2a:* 00 74 08 4c add %dh,0x4c(%rax,%rcx,1) <-- trapping instruction > > > 2e: 89 f7 mov %esi,%edi > > > 30: e8 40 c0 b3 00 callq 0xb3c075 > > > 35: 41 8b 06 mov (%r14),%eax > > > 38: 83 e0 08 and $0x8,%eax > > > 3b: 48 c1 e0 0b shl $0xb,%rax > > > 3f: 48 rex.W > > > > > > and those ADDs before the rIP look real strange. Just as if something > > > wrote 4 bytes of 0s there. 
And building your config with clang-10 gives > > > around that area: > > > > > > ffffffff8101177c: 48 83 e6 03 and $0x3,%rsi > > > ffffffff81011780: 41 0f 94 c4 sete %r12b > > > ffffffff81011784: 31 ff xor %edi,%edi > > > ffffffff81011786: e8 05 c9 73 00 callq ffffffff8174e090 <__sanitizer_cov_trace_const_cmp8> > > > ffffffff8101178b: bb 02 00 00 00 mov $0x2,%ebx > > > ffffffff81011790: 4c 29 e3 sub %r12,%rbx > > > ffffffff81011793: 49 81 c6 90 00 00 00 add $0x90,%r14 > > > ffffffff8101179a: 4c 89 f0 mov %r14,%rax > > > ffffffff8101179d: 48 c1 e8 03 shr $0x3,%rax > > > ffffffff810117a1: 42 80 3c 38 00 cmpb $0x0,(%rax,%r15,1) > > > ffffffff810117a6: 74 08 je ffffffff810117b0 <perf_misc_flags+0x130> > > > ffffffff810117a8: 4c 89 f7 mov %r14,%rdi > > > ffffffff810117ab: e8 20 75 b3 00 callq ffffffff81b48cd0 <__asan_report_load8_noabort> > > > ffffffff810117b0: 41 8b 06 mov (%r14),%eax > > > ffffffff810117b3: 83 e0 08 and $0x8,%eax > > > ffffffff810117b6: 48 c1 e0 0b shl $0xb,%rax > > > > > > and I can pretty much follow it instruction by instruction until I reach > > > that SHR. Your SHR is doing a shift by 0 bytes and that already looks > > > suspicious. > > > > > > After it, your output has a bunch of suspicious ADDs and mine has a CMP; > > > JE instead. And that looks really strange too. > > > > > > Could it be that something has scribbled in guest memory and corrupted > > > that area, leading to that strange discrepancy in the opcodes? Right, the two sequences above look almost the same, except those 4 bytes of zeros (the disassembler gets confused about the rest, but it's the same byte sequence otherwise). Are the two disassemblies a comparison of the code at runtime vs. compile-time? If so, how did you disassemble the runtime code? If runtime and compile time differ, I suspect some kind of runtime patching. I wonder if we calculated the address of a static_key wrong (asm goto). What function am I looking at the disassembly of? 
perf_misc_flags() in arch/x86/events/core.c?  With this config?
https://syzkaller.appspot.com/x/.config?x=cd992d74d6c7e62 (though I
don't see _any_ asm goto in the IR for this file built with this
config).  If this is deterministically reproducible, I suppose we
could set a watchpoint on the address being overwritten?

(Un-interestingly, I do get a panic trying to boot that config in
qemu, unless I bump the VM's RAM up.)

> >
> > Hi Boris,
> >
> > Memory corruption is definitely possible. There are hundreds of known
> > bugs that can potentially lead to silent memory corruptions, and some
> > observed to lead to silent memory corruptions.
> >
> > However, these tend to produce crash signatures with 1-2 crashes.
> > While this has 6 and they look similar and all happened on the only
> > instance that uses clang. So my bet would be on
> > something-clang-related rather than a silent memory corruption.
> > +clang-built-linux
>
>
> general protection fault in pvclock_gtod_notify (2) looks somewhat similar:
>  - only clang
>  - gpf in systems code
>  - happened few times
>
> https://syzkaller.appspot.com/bug?extid=1dccfcb049726389379c
> https://groups.google.com/g/syzkaller-bugs/c/0eUUkjFKrBg/m/nGfTjIfCBAAJ

Dmitry,
Is there an easy way for me to get from
https://syzkaller.appspot.com/upstream to <list of clang specific
failures>?  ctrl+f, `clang`, returns nothing on that first link; it
seems the compiler version is only included in the email?
-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: general protection fault in perf_misc_flags
  2020-09-21 20:59 ` Nick Desaulniers
@ 2020-09-21 22:13   ` Borislav Petkov
  2020-09-22 18:56     ` Nick Desaulniers
  2020-09-22  5:15   ` Dmitry Vyukov
  2020-09-22  5:16   ` Dmitry Vyukov
  2 siblings, 1 reply; 41+ messages in thread
From: Borislav Petkov @ 2020-09-21 22:13 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Dmitry Vyukov, syzbot, Arnaldo Carvalho de Melo,
      Alexander Shishkin, H. Peter Anvin, Jiri Olsa, LKML, Mark Rutland,
      Ingo Molnar, Namhyung Kim, Peter Zijlstra, syzkaller-bugs,
      Thomas Gleixner, the arch/x86 maintainers, clang-built-linux

On Mon, Sep 21, 2020 at 01:59:43PM -0700, Nick Desaulniers wrote:
> Right, the two sequences above look almost the same, except those 4
> bytes of zeros (the disassembler gets confused about the rest, but
> it's the same byte sequence otherwise). Are the two disassemblies a
> comparison of the code at runtime vs. compile-time?

Yes.

> If so, how did you disassemble the runtime code?

./scripts/decodecode < /tmp/splat

where /tmp/splat contains the line starting with "Code:". Make sure you
have only one "Code:"-line, otherwise you'll see the code of the *last*
Code: line only.

> If runtime and compile time differ, I suspect some kind of runtime
> patching.

If it is, it ain't patching at the right place. :)

But no, that function is pretty simple and looking at its asm, there's
no asm goto() or alternatives in there. But that .config might add them.
It adds a lot of calls to *ASAN helpers and whatnot.

> I wonder if we calculated the address of a static_key wrong
> (asm goto). What function am I looking at the disassembly of?
> perf_misc_flags() in arch/x86/events/core.c?

Yes.

> With this config?
> https://syzkaller.appspot.com/x/.config?x=cd992d74d6c7e62 (though I
> don't see _any_ asm goto in the IR for this file built with this
> config).

Right, there should be none.

> If this is deterministically reproducible, I suppose we
> could set a watchpoint on the address being overwritten?

Sounds like worth a try. I'll go sleep instead, tho. :)

Gnight and good luck.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply [flat|nested] 41+ messages in thread
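For readers who want to pull the bytes out of a splat before feeding them to a disassembler, here is a rough Python sketch of the parsing step. It is not how scripts/decodecode is implemented (that is a shell script); the function name `parse_code_line` is made up for this example, and unlike the behaviour described above it deliberately takes the *first* "Code:" line it can decode, which in this report is the faulting kernel context.

```python
import re

# One token is two hex digits, optionally wrapped in <> to mark the trapping byte.
_TOKEN = re.compile(r"<?[0-9a-fA-F]{2}>?")

SAMPLE_OOPS = """\
RIP: 0010:perf_misc_flags+0x125/0x150 arch/x86/events/core.c:2638
Code: e4 48 83 e6 03 41 0f 94 c4 31 ff e8 95 fa 73 00 bb 02 00 00 00 4c 29 e3 49 81 c6 90 00 00 00 4c 89 f0 48 c1 e8 00 00 00 00 38 <00> 74 08 4c 89 f7 e8 40 c0 b3 00 41 8b 06 83 e0 08 48 c1 e0 0b 48
RSP: 0018:ffffc90000007a08 EFLAGS: 00010002
"""


def parse_code_line(oops_text):
    """Return (code_bytes, trap_offset) for the first decodable "Code:" line."""
    for line in oops_text.splitlines():
        _, sep, rest = line.partition("Code:")
        if not sep:
            continue
        tokens = rest.split()
        if not tokens or not all(_TOKEN.fullmatch(t) for t in tokens):
            continue  # skip lines such as "Code: Bad RIP value."
        code = bytearray()
        trap = None
        for tok in tokens:
            if tok.startswith("<"):
                trap = len(code)   # offset of the byte the kernel marked with <>
            code.append(int(tok.strip("<>"), 16))
        return bytes(code), trap
    raise ValueError("no decodable Code: line found")


code, trap = parse_code_line(SAMPLE_OOPS)
print(f"{len(code)} bytes, trapping byte at offset {trap:#x}")
```

The reported offset lines up with the `2a:*` marker in the decodecode listing earlier in the thread.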
* Re: general protection fault in perf_misc_flags 2020-09-21 22:13 ` Borislav Petkov @ 2020-09-22 18:56 ` Nick Desaulniers 2020-09-22 19:29 ` Borislav Petkov 2020-09-23 9:03 ` Borislav Petkov 0 siblings, 2 replies; 41+ messages in thread From: Nick Desaulniers @ 2020-09-22 18:56 UTC (permalink / raw) To: Borislav Petkov, Josh Poimboeuf Cc: Dmitry Vyukov, syzbot, Arnaldo Carvalho de Melo, Alexander Shishkin, H. Peter Anvin, Jiri Olsa, LKML, Mark Rutland, Ingo Molnar, Namhyung Kim, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, clang-built-linux On Mon, Sep 21, 2020 at 3:13 PM Borislav Petkov <bp@alien8.de> wrote: > > On Mon, Sep 21, 2020 at 01:59:43PM -0700, Nick Desaulniers wrote: > > Right, the two sequences above look almost the same, except those 4 > > bytes of zeros (the disassembler gets confused about the rest, but > > it's the same byte sequence otherwise). Are the two disassemblies a > > comparison of the code at runtime vs. compile-time? > > Yes. > > > If so, how did you disassemble the runtime code? > > ./scripts/decodecode < /tmp/splat > > where /tmp/splat contains the line starting with "Code:". Make sure you > have only one "Code:"-line, otherwise you'll see the code of the *last* > Code: line only. Thanks. > > If runtime and compile time differ, I suspect some kind of runtime > > patching. > > If it is, it ain't patching at the right place. :) Yeah, but we've had this kind of bug before: https://nickdesaulniers.github.io/blog/2020/04/06/off-by-two/ I'm sure it's not the last. > But no, that function is pretty simple and looking at its asm, there's > no asm goto() or alternatives in there. But that .config might add them. > It adds a lot of calls to *ASAN helpers and whatnot. Maybe not in this translation unit, but it's possible another TU does have one and it miscalculates the offset; overwriting code in another TU. > > I wonder if we calculated the address of a static_key wrong > > (asm goto). 
What function am I looking at the disassembly of? > > perf_misc_flags() in arch/x86/events/core.c? > > Yes. > > > With this config? > > https://syzkaller.appspot.com/x/.config?x=cd992d74d6c7e62 (though I > > don't see _any_ asm goto in the IR for this file built with this > > config). > > Right, there should be none. > > > If this is deterministically reproducible, I suppose we > > could set a watchpoint on the address being overwritten? > > Sounds like worth a try. I'll go sleep instead, tho. :) So I think there's an issue with "deterministically reproducible." The syzkaller report has: > > Unfortunately, I don't have any reproducer for this issue yet. Following my hypothesis about having a bad address calculation; the tricky part is I'd need to look through the relocations and try to see if any could resolve to the address that was accidentally modified. I suspect objtool could be leveraged for that; maybe it could check whether each `struct jump_entry`'s `target` member referred to either a NOP or a CMP, and error otherwise? (Do we have other non-NOP or CMP targets? IDK) This hypothesis might also be incorrect, and thus would be chasing a red herring...not really sure how else to pursue debugging this. > Gnight and good luck. Ah, that's a famous quote from journalist Edward R Murrow, who helped defeat Senator Joseph McCarthy (Murrow's show See It Now dedicated a segment to addressing McCarthy). Sometimes I find uncanny parallels between claims of what a compiler can do on LKML "without proper regard for evidence" and McCarthyism. Falsifiability is an interesting trait. That's why I try to advocate for sharing links from godbolt.org as much as possible. -- Thanks, ~Nick Desaulniers
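For anyone following along without a kernel tree at hand, the layout of the "Code:" line that gets fed to ./scripts/decodecode can be sketched in a few lines of Python. This is only an illustration of the line format, not a reimplementation of decodecode (which produces the actual disassembly); `parse_code_line` is a hypothetical helper name, and the byte string is the one from the original syzbot report in this thread.

```python
import re

# "Code:" line from the syzbot report for perf_misc_flags in this thread.
CODE_LINE = (
    "Code: e4 48 83 e6 03 41 0f 94 c4 31 ff e8 95 fa 73 00 bb 02 00 00 00 "
    "4c 29 e3 49 81 c6 90 00 00 00 4c 89 f0 48 c1 e8 00 00 00 00 38 <00> "
    "74 08 4c 89 f7 e8 40 c0 b3 00 41 8b 06 83 e0 08 48 c1 e0 0b 48"
)


def parse_code_line(line):
    """Return (code_bytes, trap_offset) for an oops "Code:" line.

    The byte wrapped in <..> is the first byte at the faulting rIP;
    trap_offset is its index within code_bytes (None if no marker).
    """
    tokens = line.split(":", 1)[1].split()
    code = bytearray()
    trap_offset = None
    for tok in tokens:
        marked = re.fullmatch(r"<([0-9a-fA-F]{2})>", tok)
        if marked:
            trap_offset = len(code)
            tok = marked.group(1)
        code.append(int(tok, 16))
    return bytes(code), trap_offset


code, trap = parse_code_line(CODE_LINE)
print(f"{len(code)} bytes, trapping byte at offset {trap:#x}")
print("bytes just before rIP:", code[trap - 5:trap].hex(" "))
```

The offset it reports is the same one decodecode marks with `*` in its listing, and the five bytes before rIP are where the run of zeros shows up.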
* Re: general protection fault in perf_misc_flags 2020-09-22 18:56 ` Nick Desaulniers @ 2020-09-22 19:29 ` Borislav Petkov 2020-09-23 9:03 ` Borislav Petkov 1 sibling, 0 replies; 41+ messages in thread From: Borislav Petkov @ 2020-09-22 19:29 UTC (permalink / raw) To: Nick Desaulniers Cc: Josh Poimboeuf, Dmitry Vyukov, syzbot, Arnaldo Carvalho de Melo, Alexander Shishkin, H. Peter Anvin, Jiri Olsa, LKML, Mark Rutland, Ingo Molnar, Namhyung Kim, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, clang-built-linux On Tue, Sep 22, 2020 at 11:56:04AM -0700, Nick Desaulniers wrote: > > Gnight and good luck. > > Ah, that's a famous quote from journalist Edward R Murrow, who helped > defeat Senator Joseph McCarthy (Murrow's show See It Now dedicated a > segment to addressing McCarthy). Good. Finally someone has recognized this - I use it from time to time in mails but no one would pick up on it! And how relevant it is again... https://www.youtube.com/watch?v=vEvEmkMNYHY > Sometimes I find uncanny parallels between claims of what a compiler > can do on LKML "without proper regard for evidence" and McCarthyism. > Falsifiability is an interesting trait. That's why I try to advocate > for sharing links from godbolt.org as much as possible. LOL. Good one. :-) I'll take a look at the rest of your reply tomorrow because brain is fried for today. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette
* Re: general protection fault in perf_misc_flags 2020-09-22 18:56 ` Nick Desaulniers 2020-09-22 19:29 ` Borislav Petkov @ 2020-09-23 9:03 ` Borislav Petkov 2020-09-23 9:24 ` Dmitry Vyukov 1 sibling, 1 reply; 41+ messages in thread
From: Borislav Petkov @ 2020-09-23 9:03 UTC (permalink / raw)
To: Nick Desaulniers
Cc: Josh Poimboeuf, Dmitry Vyukov, syzbot, Arnaldo Carvalho de Melo, Alexander Shishkin, H. Peter Anvin, Jiri Olsa, LKML, Mark Rutland, Ingo Molnar, Namhyung Kim, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, clang-built-linux

On Tue, Sep 22, 2020 at 11:56:04AM -0700, Nick Desaulniers wrote:
> So I think there's an issue with "deterministically reproducible."
> The syzkaller report has:
>
> > Unfortunately, I don't have any reproducer for this issue yet.

Yeah, Dmitry gave two other links of similar reports, the first one works for me:

https://syzkaller.appspot.com/bug?extid=1dccfcb049726389379c

and that one doesn't have a reproducer either. The bytes look familiar though:

Code: c1 e8 03 42 80 3c 20 00 74 05 e8 79 7a a7 00 49 8b 47 10 48 89 05 f6 d8 ef 09 49 8d 7f 08 48 89 f8 48 c1 e8 03 42 80 3c 00 00 <00> 00 e8 57 7a a7 00 49 8b 47 08 48 89 05 dc d8 ef 09 49 8d 7f 18
All code
========
   0:   c1 e8 03                shr    $0x3,%eax
   3:   42 80 3c 20 00          cmpb   $0x0,(%rax,%r12,1)
   8:   74 05                   je     0xf
   a:   e8 79 7a a7 00          callq  0xa77a88
   f:   49 8b 47 10             mov    0x10(%r15),%rax
  13:   48 89 05 f6 d8 ef 09    mov    %rax,0x9efd8f6(%rip)        # 0x9efd910
  1a:   49 8d 7f 08             lea    0x8(%r15),%rdi
  1e:   48 89 f8                mov    %rdi,%rax
  21:   48 c1 e8 03             shr    $0x3,%rax
  25:   42 80 3c 00 00          cmpb   $0x0,(%rax,%r8,1)
  2a:*  00 00                   add    %al,(%rax)          <-- trapping instruction
  2c:   e8 57 7a a7 00          callq  0xa77a88
  31:   49 8b 47 08             mov    0x8(%r15),%rax
  35:   48 89 05 dc d8 ef 09    mov    %rax,0x9efd8dc(%rip)        # 0x9efd918
  3c:   49 8d 7f 18             lea    0x18(%r15),%rdi

4 zero bytes again. And that .config has kasan stuff enabled too so could the failure be related to having kasan stuff enabled and it messing up offsets?
That is, provided this is the mechanism by which it would happen. We still don't know what wrote those zeroes in there, or when. Not having a reproducer is nasty but looking at those reports above and if I'm reading this correctly, rIP points to

RIP: 0010:update_pvclock_gtod arch/x86/kvm/x86.c:1743 [inline]

each time and the URL says there are 9 crashes total. And each has happened at that rIP. So all we'd need is to set a watchpoint on writes to that address and dump stuff.

Dmitry, can syzkaller do debugging stuff like that?

> Following my hypothesis about having a bad address calculation; the
> tricky part is I'd need to look through the relocations and try to see
> if any could resolve to the address that was accidentally modified. I
> suspect objtool could be leveraged for that;

If you can find this at compile time...

> maybe it could check whether each `struct jump_entry`'s `target`
> member referred to either a NOP or a CMP, and error otherwise? (Do we
> have other non-NOP or CMP targets? IDK)

Follow jump_label_transform() - it does verify what it is going to patch. And while I'm looking at this, I realize that the jump labels patch 5 bytes but the above zeroes are 4 bytes. In the other opcode bytes I decoded it is 4 bytes too. So this might not be caused by the jump labels patching...

> This hypothesis might also be incorrect, and thus would be chasing a
> red herring...not really sure how else to pursue debugging this.

Yeah, this one is tricky to debug.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette
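The 4-versus-5-byte observation can be checked mechanically. A minimal sketch, using the runtime bytes from the original perf_misc_flags report and the compile-time bytes from the clang-10 build quoted elsewhere in this thread (both cover the same stretch of the function, starting at the SHR). `mismatch_runs` is a hypothetical helper; the 5-byte figure is the jump-label patch size mentioned in the mail above, and the +0x125 trap offset comes from the RIP line of the report.

```python
# Same 11-byte stretch of perf_misc_flags(), starting at the SHR instruction.
runtime = bytes.fromhex("48 c1 e8 00 00 00 00 38 00 74 08")   # from the crash
expected = bytes.fromhex("48 c1 e8 03 42 80 3c 38 00 74 08")  # from the build

TRAP_INDEX = 8             # index of the <00> byte within `runtime`
TRAP_FUNC_OFFSET = 0x125   # perf_misc_flags+0x125 from the RIP line
JUMP_LABEL_PATCH_LEN = 5   # patch size quoted in the mail above


def mismatch_runs(a, b):
    """Yield (offset, length) for each maximal run of differing bytes."""
    if len(a) != len(b):
        raise ValueError("byte strings must have equal length")
    start = None
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y and start is None:
            start = i
        elif x == y and start is not None:
            yield start, i - start
            start = None
    if start is not None:
        yield start, len(a) - start


runs = list(mismatch_runs(runtime, expected))
for off, length in runs:
    # Translate the slice index into an offset from the function start.
    func_off = TRAP_FUNC_OFFSET - TRAP_INDEX + off
    print(f"perf_misc_flags+{func_off:#x}: {length} bytes differ "
          f"(runtime {runtime[off:off + length].hex(' ')}, "
          f"expected {expected[off:off + length].hex(' ')})")
    print("matches jump-label patch size:", length == JUMP_LABEL_PATCH_LEN)
```

The single mismatching run covers the immediate of the SHR plus the first three bytes of the CMPB, so it straddles an instruction boundary. Its start offset from the function entry is a multiple of 4, which would fit a single aligned 32-bit store if the function start itself is at least 4-byte aligned (an assumption, not something the reports confirm).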
* Re: general protection fault in perf_misc_flags 2020-09-23 9:03 ` Borislav Petkov @ 2020-09-23 9:24 ` Dmitry Vyukov 2020-09-23 10:34 ` Borislav Petkov 0 siblings, 1 reply; 41+ messages in thread From: Dmitry Vyukov @ 2020-09-23 9:24 UTC (permalink / raw) To: Borislav Petkov Cc: Nick Desaulniers, Josh Poimboeuf, syzbot, Arnaldo Carvalho de Melo, Alexander Shishkin, H. Peter Anvin, Jiri Olsa, LKML, Mark Rutland, Ingo Molnar, Namhyung Kim, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, clang-built-linux On Wed, Sep 23, 2020 at 11:03 AM Borislav Petkov <bp@alien8.de> wrote: > > On Tue, Sep 22, 2020 at 11:56:04AM -0700, Nick Desaulniers wrote: > > So I think there's an issue with "deterministically reproducible." > > The syzcaller report has: > > > > Unfortunately, I don't have any reproducer for this issue yet. > > Yeah, Dmitry gave two other links of similar reports, the first one > works for me: > > https://syzkaller.appspot.com/bug?extid=1dccfcb049726389379c > > and that one doesn't have a reproducer either. The bytes look familiar > though: > > Code: c1 e8 03 42 80 3c 20 00 74 05 e8 79 7a a7 00 49 8b 47 10 48 89 05 f6 d8 ef 09 49 8d 7f 08 48 89 f8 48 c1 e8 03 42 80 3c 00 00 <00> 00 e8 57 7a a7 00 49 8b 47 08 48 89 05 dc d8 ef 09 49 8d 7f 18 > All code > ======== > 0: c1 e8 03 shr $0x3,%eax > 3: 42 80 3c 20 00 cmpb $0x0,(%rax,%r12,1) > 8: 74 05 je 0xf > a: e8 79 7a a7 00 callq 0xa77a88 > f: 49 8b 47 10 mov 0x10(%r15),%rax > 13: 48 89 05 f6 d8 ef 09 mov %rax,0x9efd8f6(%rip) # 0x9efd910 > 1a: 49 8d 7f 08 lea 0x8(%r15),%rdi > 1e: 48 89 f8 mov %rdi,%rax > 21: 48 c1 e8 03 shr $0x3,%rax > 25: 42 80 3c 00 00 cmpb $0x0,(%rax,%r8,1) > 2a:* 00 00 add %al,(%rax) <-- trapping instruction > 2c: e8 57 7a a7 00 callq 0xa77a88 > 31: 49 8b 47 08 mov 0x8(%r15),%rax > 35: 48 89 05 dc d8 ef 09 mov %rax,0x9efd8dc(%rip) # 0x9efd918 > 3c: 49 8d 7f 18 lea 0x18(%r15),%rdi > > 4 zero bytes again. 
And that .config has kasan stuff enabled too so > could the failure be related to having kasan stuff enabled and it > messing up offsets? > > That is, provided this is the mechanism how it would happen. We still > don't know what and when wrote those zeroes in there. Not having a > reproducer is nasty but looking at those reports above and if I'm > reading this correctly, rIP points to > > RIP: 0010:update_pvclock_gtod arch/x86/kvm/x86.c:1743 [inline] > > each time and the URL says they're 9 crashes total. And each have > happened at that rIP. So all we'd need is set a watchpoint when that > address is being written and dump stuff. > > Dmitry, can the syzkaller do debugging stuff like that? syzbot does not have direct support for such things. It uses CONFIG_DEBUG_AID_FOR_SYZBOT=y: https://github.com/google/syzkaller/blob/master/docs/syzbot.md#no-custom-patches But that's generally useful for linux-next only and the clang build is on the upstream tree... Options I see: 1. Add stricter debug checks for code that overwrites code. Then maybe we can catch it red handed. 2. Setup clang instance on linux-next 3. Run syzkaller locally with custom patches. > > Following my hypothesis about having a bad address calculation; the > > tricky part is I'd need to look through the relocations and try to see > > if any could resolve to the address that was accidentally modified. I > > suspect objtool could be leveraged for that; > > If you can find this at compile time... > > > maybe it could check whether each `struct jump_entry`'s `target` > > member referred to either a NOP or a CMP, and error otherwise? (Do we > > have other non-NOP or CMP targets? IDK) > > Follow jump_label_transform() - it does verify what it is going to > patch. And while I'm looking at this, I realize that the jump labels > patch 5 bytes but the above zeroes are 4 bytes. In the other opcode > bytes I decoded it is 4 bytes too. So this might not be caused by the > jump labels patching... 
> > > This hypothesis might also be incorrect, and thus would be chasing a > > red herring...not really sure how else to pursue debugging this. > > Yeah, this one is tricky to debug. > > -- > Regards/Gruss, > Boris. > > https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: general protection fault in perf_misc_flags 2020-09-23 9:24 ` Dmitry Vyukov @ 2020-09-23 10:34 ` Borislav Petkov 2020-09-23 15:20 ` Dmitry Vyukov 0 siblings, 1 reply; 41+ messages in thread From: Borislav Petkov @ 2020-09-23 10:34 UTC (permalink / raw) To: Dmitry Vyukov Cc: Nick Desaulniers, Josh Poimboeuf, syzbot, Arnaldo Carvalho de Melo, Alexander Shishkin, H. Peter Anvin, Jiri Olsa, LKML, Mark Rutland, Ingo Molnar, Namhyung Kim, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, clang-built-linux On Wed, Sep 23, 2020 at 11:24:48AM +0200, Dmitry Vyukov wrote: > 3. Run syzkaller locally with custom patches. Let's say I wanna build the kernel with clang-10 using your .config and run it in a vm locally. What are the steps in order to reproduce the same workload syzkaller runs in the guest on the GCE so that I can at least try get as close as possible to reproducing locally? Thx. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: general protection fault in perf_misc_flags 2020-09-23 10:34 ` Borislav Petkov @ 2020-09-23 15:20 ` Dmitry Vyukov 2020-09-25 12:22 ` Dmitry Vyukov 2020-09-26 11:21 ` Borislav Petkov 0 siblings, 2 replies; 41+ messages in thread From: Dmitry Vyukov @ 2020-09-23 15:20 UTC (permalink / raw) To: Borislav Petkov Cc: Nick Desaulniers, Josh Poimboeuf, syzbot, Arnaldo Carvalho de Melo, Alexander Shishkin, H. Peter Anvin, Jiri Olsa, LKML, Mark Rutland, Ingo Molnar, Namhyung Kim, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, clang-built-linux On Wed, Sep 23, 2020 at 12:34 PM Borislav Petkov <bp@alien8.de> wrote: > > On Wed, Sep 23, 2020 at 11:24:48AM +0200, Dmitry Vyukov wrote: > > 3. Run syzkaller locally with custom patches. > > Let's say I wanna build the kernel with clang-10 using your .config and > run it in a vm locally. What are the steps in order to reproduce the > same workload syzkaller runs in the guest on the GCE so that I can at > least try get as close as possible to reproducing locally? It's a random fuzzing workload. You can get this workload by running syzkaller locally: https://github.com/google/syzkaller/blob/master/docs/linux/setup_ubuntu-host_qemu-vm_x86-64-kernel.md The exact clang compiler syzbot used is available here: https://github.com/google/syzkaller/blob/master/docs/syzbot.md#crash-does-not-reproduce ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: general protection fault in perf_misc_flags 2020-09-23 15:20 ` Dmitry Vyukov @ 2020-09-25 12:22 ` Dmitry Vyukov 2020-09-26 0:32 ` Nick Desaulniers 2020-09-26 11:21 ` Borislav Petkov 1 sibling, 1 reply; 41+ messages in thread From: Dmitry Vyukov @ 2020-09-25 12:22 UTC (permalink / raw) To: Borislav Petkov Cc: Nick Desaulniers, Josh Poimboeuf, syzbot, Arnaldo Carvalho de Melo, Alexander Shishkin, H. Peter Anvin, Jiri Olsa, LKML, Mark Rutland, Ingo Molnar, Namhyung Kim, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, clang-built-linux On Wed, Sep 23, 2020 at 5:20 PM Dmitry Vyukov <dvyukov@google.com> wrote: > > On Wed, Sep 23, 2020 at 12:34 PM Borislav Petkov <bp@alien8.de> wrote: > > > > On Wed, Sep 23, 2020 at 11:24:48AM +0200, Dmitry Vyukov wrote: > > > 3. Run syzkaller locally with custom patches. > > > > Let's say I wanna build the kernel with clang-10 using your .config and > > run it in a vm locally. What are the steps in order to reproduce the > > same workload syzkaller runs in the guest on the GCE so that I can at > > least try get as close as possible to reproducing locally? > > It's a random fuzzing workload. You can get this workload by running > syzkaller locally: > https://github.com/google/syzkaller/blob/master/docs/linux/setup_ubuntu-host_qemu-vm_x86-64-kernel.md > > The exact clang compiler syzbot used is available here: > https://github.com/google/syzkaller/blob/master/docs/syzbot.md#crash-does-not-reproduce I've marked all other similar ones a dup of this one. Now you can see all manifestations on the dashboard: https://syzkaller.appspot.com/bug?extid=ce179bc99e64377c24bc Another possible debugging vector on this: The location of crashes does not seem to be completely random and evenly spread across kernel code. I think there are many more static branches (mm, net), but we have 3 crashes in vdso and 9 in paravirt code + these 6 crashes in perf_misc_flags which looks a bit like an outlier (?). 
What's special about paravirt/vdso?..
* Re: general protection fault in perf_misc_flags 2020-09-25 12:22 ` Dmitry Vyukov @ 2020-09-26 0:32 ` Nick Desaulniers 2020-09-26 6:46 ` Dmitry Vyukov 2020-09-26 17:14 ` Borislav Petkov 0 siblings, 2 replies; 41+ messages in thread From: Nick Desaulniers @ 2020-09-26 0:32 UTC (permalink / raw) To: Dmitry Vyukov, Borislav Petkov Cc: Josh Poimboeuf, syzbot, Arnaldo Carvalho de Melo, Alexander Shishkin, H. Peter Anvin, Jiri Olsa, LKML, Mark Rutland, Ingo Molnar, Namhyung Kim, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, clang-built-linux On Fri, Sep 25, 2020 at 5:22 AM Dmitry Vyukov <dvyukov@google.com> wrote: > > On Wed, Sep 23, 2020 at 5:20 PM Dmitry Vyukov <dvyukov@google.com> wrote: > > > > On Wed, Sep 23, 2020 at 12:34 PM Borislav Petkov <bp@alien8.de> wrote: > > > > > > On Wed, Sep 23, 2020 at 11:24:48AM +0200, Dmitry Vyukov wrote: > > > > 3. Run syzkaller locally with custom patches. > > > > > > Let's say I wanna build the kernel with clang-10 using your .config and > > > run it in a vm locally. What are the steps in order to reproduce the > > > same workload syzkaller runs in the guest on the GCE so that I can at > > > least try get as close as possible to reproducing locally? > > > > It's a random fuzzing workload. You can get this workload by running > > syzkaller locally: > > https://github.com/google/syzkaller/blob/master/docs/linux/setup_ubuntu-host_qemu-vm_x86-64-kernel.md These are virtualized guests, right? Has anyone played with getting `rr` working to record traces of guests in QEMU? I had seen the bug that generated this on github: https://julialang.org/blog/2020/09/rr-memory-magic/ That way, even if syzkaller didn't have a reproducer binary, it would at least have a replayable trace. Boris, one question I have. Doesn't the kernel mark pages backing executable code as read only at some point? If that were the case, then I don't see how the instruction stream could be modified. 
I guess static key patching would have to undo that permission mapping before patching. You're right about the length shorter than what I would have expected from static key patching. That could very well be a write through dangling int pointer... > > > > The exact clang compiler syzbot used is available here: > > https://github.com/google/syzkaller/blob/master/docs/syzbot.md#crash-does-not-reproduce > > I've marked all other similar ones a dup of this one. Now you can see > all manifestations on the dashboard: > https://syzkaller.appspot.com/bug?extid=ce179bc99e64377c24bc > > Another possible debugging vector on this: > The location of crashes does not seem to be completely random and > evenly spread across kernel code. I think there are many more static > branches (mm, net), but we have 3 crashes in vdso and 9 in paravirt > code + these 6 crashes in perf_misc_flags which looks a bit like an > outlier (?). What's special about paravirt/vdso?.. -- Thanks, ~Nick Desaulniers ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: general protection fault in perf_misc_flags 2020-09-26 0:32 ` Nick Desaulniers @ 2020-09-26 6:46 ` Dmitry Vyukov 2020-09-26 17:14 ` Borislav Petkov 1 sibling, 0 replies; 41+ messages in thread From: Dmitry Vyukov @ 2020-09-26 6:46 UTC (permalink / raw) To: Nick Desaulniers Cc: Borislav Petkov, Josh Poimboeuf, syzbot, Arnaldo Carvalho de Melo, Alexander Shishkin, H. Peter Anvin, Jiri Olsa, LKML, Mark Rutland, Ingo Molnar, Namhyung Kim, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, clang-built-linux On Sat, Sep 26, 2020 at 2:32 AM 'Nick Desaulniers' via syzkaller-bugs <syzkaller-bugs@googlegroups.com> wrote: > > > > On Wed, Sep 23, 2020 at 11:24:48AM +0200, Dmitry Vyukov wrote: > > > > > 3. Run syzkaller locally with custom patches. > > > > > > > > Let's say I wanna build the kernel with clang-10 using your .config and > > > > run it in a vm locally. What are the steps in order to reproduce the > > > > same workload syzkaller runs in the guest on the GCE so that I can at > > > > least try get as close as possible to reproducing locally? > > > > > > It's a random fuzzing workload. You can get this workload by running > > > syzkaller locally: > > > https://github.com/google/syzkaller/blob/master/docs/linux/setup_ubuntu-host_qemu-vm_x86-64-kernel.md > > These are virtualized guests, right? Has anyone played with getting > `rr` working to record traces of guests in QEMU? > > I had seen the bug that generated this on github: > https://julialang.org/blog/2020/09/rr-memory-magic/ > > That way, even if syzkaller didn't have a reproducer binary, it would > at least have a replayable trace. These are virtualized guests, but they run on GCE, not in QEMU. > Boris, one question I have. Doesn't the kernel mark pages backing > executable code as read only at some point? If that were the case, > then I don't see how the instruction stream could be modified. I > guess static key patching would have to undo that permission mapping > before patching. 
> > You're right about the length shorter than what I would have expected > from static key patching. That could very well be a write through > dangling int pointer... > > > > > > > The exact clang compiler syzbot used is available here: > > > https://github.com/google/syzkaller/blob/master/docs/syzbot.md#crash-does-not-reproduce > > > > I've marked all other similar ones a dup of this one. Now you can see > > all manifestations on the dashboard: > > https://syzkaller.appspot.com/bug?extid=ce179bc99e64377c24bc > > > > Another possible debugging vector on this: > > The location of crashes does not seem to be completely random and > > evenly spread across kernel code. I think there are many more static > > branches (mm, net), but we have 3 crashes in vdso and 9 in paravirt > > code + these 6 crashes in perf_misc_flags which looks a bit like an > > outlier (?). What's special about paravirt/vdso?.. > > > > -- > Thanks, > ~Nick Desaulniers > > -- > You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group. > To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bugs+unsubscribe@googlegroups.com. > To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/CAKwvOdkYEP%3DoRtEu_89JBq2g41PL9_FuFyfeB94XwBKuSz4XLg%40mail.gmail.com. ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: general protection fault in perf_misc_flags 2020-09-26 0:32 ` Nick Desaulniers 2020-09-26 6:46 ` Dmitry Vyukov @ 2020-09-26 17:14 ` Borislav Petkov 1 sibling, 0 replies; 41+ messages in thread From: Borislav Petkov @ 2020-09-26 17:14 UTC (permalink / raw) To: Nick Desaulniers Cc: Dmitry Vyukov, Josh Poimboeuf, syzbot, Arnaldo Carvalho de Melo, Alexander Shishkin, H. Peter Anvin, Jiri Olsa, LKML, Mark Rutland, Ingo Molnar, Namhyung Kim, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, clang-built-linux On Fri, Sep 25, 2020 at 05:32:14PM -0700, Nick Desaulniers wrote: > Boris, one question I have. Doesn't the kernel mark pages backing > executable code as read only at some point? Yes, I added some debug output: [ 562.959995][ T1] Freeing unused kernel image (initmem) memory: 2548K [ 563.672645][ T1] Write protecting the kernel read-only data: 137216k [0xffffffff81000000:0xffffffff89600000] and perf_misc_flags() is well within that range: ffffffff810118e0 <perf_misc_flags>: [ 566.076923][ T1] unused kernel image (text/rodata gap): [0xffffffff88608000:0xffffffff88800000] [ 567.039076][ T1] unused kernel image (rodata/data gap): [0xffffffff8941d000:0xffffffff89600000] [ 568.205550][ T1] Freeing unused kernel image (text/rodata gap) memory: 2016K [ 569.277742][ T1] Freeing unused kernel image (rodata/data gap) memory: 1932K We also have this debug option which I enabled: [ 570.598533][ T1] x86/mm: Checked W+X mappings: passed, no W+X pages found. so that looks ok too. > If that were the case, then I don't see how the instruction stream > could be modified. I guess static key patching would have to undo that > permission mapping before patching. Yap, and I still have no clue about the mechanism which would lead to this corruption. > You're right about the length shorter than what I would have expected > from static key patching. That could very well be a write through > dangling int pointer... Right. 
Lemme try to set up one of my test boxes to run syzkaller and see how far I can get. But don't hold your breath... -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette
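The numbers in that debug output can be sanity-checked with a little arithmetic. A small sketch, assuming each bracketed pair in the added debug output is a [start:end) range (the exact semantics of that custom output are an assumption); the addresses are copied from the log lines above.

```python
# Ranges copied from the debug output in the mail above.
RO_START, RO_END = 0xffffffff81000000, 0xffffffff89600000
TEXT_RODATA_GAP = (0xffffffff88608000, 0xffffffff88800000)
RODATA_DATA_GAP = (0xffffffff8941d000, 0xffffffff89600000)
PERF_MISC_FLAGS = 0xffffffff810118e0  # symbol address from the disassembly


def size_kib(start, end):
    """Size of the half-open range [start, end) in KiB."""
    return (end - start) // 1024


ro_kib = size_kib(RO_START, RO_END)
text_gap_kib = size_kib(*TEXT_RODATA_GAP)
data_gap_kib = size_kib(*RODATA_DATA_GAP)
inside = RO_START <= PERF_MISC_FLAGS < RO_END

print(f"write-protected: {ro_kib}k")
print(f"text/rodata gap: {text_gap_kib}K, rodata/data gap: {data_gap_kib}K")
print("perf_misc_flags inside write-protected range:", inside)
```

The computed sizes agree with the sizes the kernel printed, and the function address falls inside the write-protected range, which is consistent with the conclusion in the mail.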
* Re: general protection fault in perf_misc_flags 2020-09-23 15:20 ` Dmitry Vyukov 2020-09-25 12:22 ` Dmitry Vyukov @ 2020-09-26 11:21 ` Borislav Petkov 2020-09-26 12:08 ` Dmitry Vyukov 1 sibling, 1 reply; 41+ messages in thread From: Borislav Petkov @ 2020-09-26 11:21 UTC (permalink / raw) To: Dmitry Vyukov Cc: Nick Desaulniers, Josh Poimboeuf, syzbot, Arnaldo Carvalho de Melo, Alexander Shishkin, H. Peter Anvin, Jiri Olsa, LKML, Mark Rutland, Ingo Molnar, Namhyung Kim, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, clang-built-linux On Wed, Sep 23, 2020 at 05:20:06PM +0200, Dmitry Vyukov wrote: > It's a random fuzzing workload. You can get this workload by running > syzkaller locally: > https://github.com/google/syzkaller/blob/master/docs/linux/setup_ubuntu-host_qemu-vm_x86-64-kernel.md Yeah, the my.cfg example suggests that the syz-manager starts the guest and supplies the kernel, etc. Is there a possibility to run the workload in an already existing guest which I've booted prior? I'm asking because I have all the infra for testing kernels in guests already setup here and it would be easier for me to simply run the workload directly in the guest and then poke at it. Thx. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: general protection fault in perf_misc_flags 2020-09-26 11:21 ` Borislav Petkov @ 2020-09-26 12:08 ` Dmitry Vyukov 0 siblings, 0 replies; 41+ messages in thread
From: Dmitry Vyukov @ 2020-09-26 12:08 UTC (permalink / raw)
To: Borislav Petkov
Cc: Nick Desaulniers, Josh Poimboeuf, syzbot, Arnaldo Carvalho de Melo, Alexander Shishkin, H. Peter Anvin, Jiri Olsa, LKML, Mark Rutland, Ingo Molnar, Namhyung Kim, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, clang-built-linux, syzkaller

On Sat, Sep 26, 2020 at 1:21 PM Borislav Petkov <bp@alien8.de> wrote:
>
> On Wed, Sep 23, 2020 at 05:20:06PM +0200, Dmitry Vyukov wrote:
> > It's a random fuzzing workload. You can get this workload by running
> > syzkaller locally:
> > https://github.com/google/syzkaller/blob/master/docs/linux/setup_ubuntu-host_qemu-vm_x86-64-kernel.md
>
> Yeah, the my.cfg example suggests that the syz-manager starts the guest
> and supplies the kernel, etc.
>
> Is there a possibility to run the workload in an already existing guest
> which I've booted prior?
>
> I'm asking because I have all the infra for testing kernels in guests
> already setup here and it would be easier for me to simply run the
> workload directly in the guest and then poke at it.

+syzkaller mailing list

There is also an "isolated" VM type, which allows connecting to a set of external machines via ssh:
https://github.com/google/syzkaller/blob/master/vm/isolated/isolated.go#L29-L37

However, it's better to have lots of them, with console cables attached, and even then they may sometimes brick for various reasons.

There is also the syz-stress utility, which can run some workload directly on the underlying kernel:
https://github.com/google/syzkaller/blob/master/tools/syz-stress/stress.go#L29

However, it does not use corpus/coverage, so I don't know if it will be able to reproduce these crashes or not. It will also be up to you then to restart the VM/fuzzing every minute.
* Re: general protection fault in perf_misc_flags 2020-09-21 20:59 ` Nick Desaulniers 2020-09-21 22:13 ` Borislav Petkov @ 2020-09-22 5:15 ` Dmitry Vyukov 2020-09-22 5:16 ` Dmitry Vyukov 2 siblings, 0 replies; 41+ messages in thread From: Dmitry Vyukov @ 2020-09-22 5:15 UTC (permalink / raw) To: Nick Desaulniers Cc: Borislav Petkov, syzbot, Arnaldo Carvalho de Melo, Alexander Shishkin, H. Peter Anvin, Jiri Olsa, LKML, Mark Rutland, Ingo Molnar, Namhyung Kim, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, clang-built-linux On Mon, Sep 21, 2020 at 10:59 PM 'Nick Desaulniers' via syzkaller-bugs <syzkaller-bugs@googlegroups.com> wrote: > > On Mon, Sep 21, 2020 at 1:09 AM 'Dmitry Vyukov' via Clang Built Linux > <clang-built-linux@googlegroups.com> wrote: > > > > On Mon, Sep 21, 2020 at 7:54 AM Dmitry Vyukov <dvyukov@google.com> wrote: > > > > > > On Sat, Sep 19, 2020 at 1:08 PM Borislav Petkov <bp@alien8.de> wrote: > > > > > > > > On Sat, Sep 19, 2020 at 01:32:14AM -0700, syzbot wrote: > > > > > Hello, > > > > > > > > > > syzbot found the following issue on: > > > > > > > > > > HEAD commit: 92ab97ad Merge tag 'sh-for-5.9-part2' of git://git.libc.or.. > > > > > git tree: upstream > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1069669b900000 > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=cd992d74d6c7e62 > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=ce179bc99e64377c24bc > > > > > compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81) > > > > > > > > > > Unfortunately, I don't have any reproducer for this issue yet. 
> > > > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > > > Reported-by: syzbot+ce179bc99e64377c24bc@syzkaller.appspotmail.com > > > > > > > > > > general protection fault, probably for non-canonical address 0xffff518084501e28: 0000 [#1] PREEMPT SMP KASAN > > > > > KASAN: maybe wild-memory-access in range [0xfffaac042280f140-0xfffaac042280f147] > > > > > CPU: 0 PID: 17449 Comm: syz-executor.5 Not tainted 5.9.0-rc5-syzkaller #0 > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > > > > > RIP: 0010:perf_misc_flags+0x125/0x150 arch/x86/events/core.c:2638 > > > > > Code: e4 48 83 e6 03 41 0f 94 c4 31 ff e8 95 fa 73 00 bb 02 00 00 00 4c 29 e3 49 81 c6 90 00 00 00 4c 89 f0 48 c1 e8 00 00 00 00 38 <00> 74 08 4c 89 f7 e8 40 c0 b3 00 41 8b 06 83 e0 08 48 c1 e0 0b 48 > > > > > > > > Hmm, so converting this back to opcodes with decodecode gives: > > > > > > > > Code: e4 48 83 e6 03 41 0f 94 c4 31 ff e8 95 fa 73 00 bb 02 00 00 00 4c 29 e3 49 81 c6 90 00 00 00 4c 89 f0 48 c1 e8 00 00 00 00 38 <00> 74 08 4c 89 f7 e8 40 c0 b3 00 41 8b 06 83 e0 08 48 c1 e0 0b 48 > > > > All code > > > > ======== > > > > 0: e4 48 in $0x48,%al > > > > 2: 83 e6 03 and $0x3,%esi > > > > 5: 41 0f 94 c4 sete %r12b > > > > 9: 31 ff xor %edi,%edi > > > > b: e8 95 fa 73 00 callq 0x73faa5 > > > > 10: bb 02 00 00 00 mov $0x2,%ebx > > > > 15: 4c 29 e3 sub %r12,%rbx > > > > 18: 49 81 c6 90 00 00 00 add $0x90,%r14 > > > > 1f: 4c 89 f0 mov %r14,%rax > > > > 22: 48 c1 e8 00 shr $0x0,%rax > > > > 26: 00 00 add %al,(%rax) > > > > 28: 00 38 add %bh,(%rax) > > > > 2a:* 00 74 08 4c add %dh,0x4c(%rax,%rcx,1) <-- trapping instruction > > > > 2e: 89 f7 mov %esi,%edi > > > > 30: e8 40 c0 b3 00 callq 0xb3c075 > > > > 35: 41 8b 06 mov (%r14),%eax > > > > 38: 83 e0 08 and $0x8,%eax > > > > 3b: 48 c1 e0 0b shl $0xb,%rax > > > > 3f: 48 rex.W > > > > > > > > and those ADDs before the rIP look real strange. 
Just as if something > > > > wrote 4 bytes of 0s there. And building your config with clang-10 gives > > > > around that area: > > > > > > > > ffffffff8101177c: 48 83 e6 03 and $0x3,%rsi > > > > ffffffff81011780: 41 0f 94 c4 sete %r12b > > > > ffffffff81011784: 31 ff xor %edi,%edi > > > > ffffffff81011786: e8 05 c9 73 00 callq ffffffff8174e090 <__sanitizer_cov_trace_const_cmp8> > > > > ffffffff8101178b: bb 02 00 00 00 mov $0x2,%ebx > > > > ffffffff81011790: 4c 29 e3 sub %r12,%rbx > > > > ffffffff81011793: 49 81 c6 90 00 00 00 add $0x90,%r14 > > > > ffffffff8101179a: 4c 89 f0 mov %r14,%rax > > > > ffffffff8101179d: 48 c1 e8 03 shr $0x3,%rax > > > > ffffffff810117a1: 42 80 3c 38 00 cmpb $0x0,(%rax,%r15,1) > > > > ffffffff810117a6: 74 08 je ffffffff810117b0 <perf_misc_flags+0x130> > > > > ffffffff810117a8: 4c 89 f7 mov %r14,%rdi > > > > ffffffff810117ab: e8 20 75 b3 00 callq ffffffff81b48cd0 <__asan_report_load8_noabort> > > > > ffffffff810117b0: 41 8b 06 mov (%r14),%eax > > > > ffffffff810117b3: 83 e0 08 and $0x8,%eax > > > > ffffffff810117b6: 48 c1 e0 0b shl $0xb,%rax > > > > > > > > and I can pretty much follow it instruction by instruction until I reach > > > > that SHR. Your SHR is doing a shift by 0 bytes and that already looks > > > > suspicious. > > > > > > > > After it, your output has a bunch of suspicious ADDs and mine has a CMP; > > > > JE instead. And that looks really strange too. > > > > > > > > Could it be that something has scribbled in guest memory and corrupted > > > > that area, leading to that strange discrepancy in the opcodes? > > Right, the two sequences above look almost the same, except those 4 > bytes of zeros (the disassembler gets confused about the rest, but > it's the same byte sequence otherwise). Are the two disassemblies a > comparison of the code at runtime vs. compile-time? If so, how did > you disassemble the runtime code? If runtime and compile time differ, > I suspect some kind of runtime patching. 
I wonder if we calculated > the address of a static_key wrong (asm goto). What function am I > looking at the disassembly of? perf_misc_flags() in > arch/x86/events/core.c? With this config? > https://syzkaller.appspot.com/x/.config?x=cd992d74d6c7e62 Hi Nick, Yes, it should be this config. Borislav looked at the crash in the first email in the thread and that's the config provided there. If syzbot provides a crash and a config, these always match. And the exact compiler used to produce the build is this: compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81) https://storage.googleapis.com/syzkaller/clang_install_c2443155.tar.gz > (though I > don't see _any_ asm goto in the IR for this file built with this > config). If this is deterministically reproducible, I suppose we > could set a watchpoint on the address being overwritten? > > (Un-interestingly, I do get a panic trying to boot that config in > qemu, unless I bump the VMs RAM up.) > > > > > > > Hi Boris, > > > > > > Memory corruption is definitely possible. There are hundreds of known > > > bugs that can potentially lead to silent memory corruptions, and some > > > observed to lead to silent memory corruptions. > > > > > > However, these tend to produce crash signatures with 1-2 crashes. > > > While this has 6 and they look similar and all happened on the only > > > instance that uses clang. So my bet would be on > > > something-clang-related rather than a silent memory corruption. > > > +clang-built-linux > > > > > > general protection fault in pvclock_gtod_notify (2) looks somewhat similar: > > - only clang > > - gpf in systems code > > - happened few times > > > > https://syzkaller.appspot.com/bug?extid=1dccfcb049726389379c > > https://groups.google.com/g/syzkaller-bugs/c/0eUUkjFKrBg/m/nGfTjIfCBAAJ > > Dmitry, > Is there an easy way for me to get from > https://syzkaller.appspot.com/upstream to <list of clang specific > failures>? 
ctrl+f, `clang`, returns nothing on that first link; it > seems the compiler version is only included in the email? > -- > Thanks, > ~Nick Desaulniers ^ permalink raw reply [flat|nested] 41+ messages in thread
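The byte-level comparison discussed in this message (the `Code:` bytes from the oops against the bytes of a local clang-10 build) can be sketched in a few lines. This is only an illustration, not a tool anyone in the thread used: the helper names `parse_hex` and `diff_runs` are made up here, and the alignment (dropping the leading `e4` and the trailing `48` of the oops bytes so that both sequences start at `and $0x3,%rsi`) is an assumption read off the two listings. It groups differing offsets into runs and flags the runs where the runtime side is all zeros.

```python
import re

# Bytes from the "Code:" line of the oops (runtime); <..> marks the trapping byte.
RUNTIME = ("e4 48 83 e6 03 41 0f 94 c4 31 ff e8 95 fa 73 00 bb 02 00 00 00 "
           "4c 29 e3 49 81 c6 90 00 00 00 4c 89 f0 48 c1 e8 00 00 00 00 38 "
           "<00> 74 08 4c 89 f7 e8 40 c0 b3 00 41 8b 06 83 e0 08 48 c1 e0 0b 48")

# Bytes from the local build, starting at ffffffff8101177c.
COMPILED = ("48 83 e6 03 41 0f 94 c4 31 ff e8 05 c9 73 00 bb 02 00 00 00 "
            "4c 29 e3 49 81 c6 90 00 00 00 4c 89 f0 48 c1 e8 03 42 80 3c 38 00 "
            "74 08 4c 89 f7 e8 20 75 b3 00 41 8b 06 83 e0 08 48 c1 e0 0b")


def parse_hex(text):
    """Turn a space-separated hex dump into bytes, ignoring the <..> marker."""
    return bytes(int(tok, 16) for tok in re.findall(r"[0-9a-f]{2}", text))


def diff_runs(a, b):
    """Return (start, length) for each run of consecutive differing offsets."""
    runs = []
    for i, (x, y) in enumerate(zip(a, b)):
        if x == y:
            continue
        if runs and runs[-1][0] + runs[-1][1] == i:
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((i, 1))                        # start a new run
    return runs


# Assumed alignment: skip the partial first byte and the trailing REX prefix.
runtime = parse_hex(RUNTIME)[1:-1]
compiled = parse_hex(COMPILED)

runs = diff_runs(runtime, compiled)
zeroed = [(s, n) for s, n in runs if not any(runtime[s:s + n])]

for start, length in runs:
    tag = "  <-- all zeros at runtime" if (start, length) in zeroed else ""
    print(f"offset {start:#04x}, {length} byte(s): "
          f"runtime={runtime[start:start + length].hex(' ')} "
          f"compiled={compiled[start:start + length].hex(' ')}{tag}")
```

With these inputs there are three differing runs. Two of them are two bytes long and sit inside the displacement of a `callq`, which plausibly just reflects different call targets in the two builds. The third is the four-byte run of zeros right after `48 c1 e8`, matching the "4 bytes of 0s" observation.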
* Re: general protection fault in perf_misc_flags 2020-09-21 20:59 ` Nick Desaulniers 2020-09-21 22:13 ` Borislav Petkov 2020-09-22 5:15 ` Dmitry Vyukov @ 2020-09-22 5:16 ` Dmitry Vyukov 2 siblings, 0 replies; 41+ messages in thread From: Dmitry Vyukov @ 2020-09-22 5:16 UTC (permalink / raw) To: Nick Desaulniers Cc: Borislav Petkov, syzbot, Arnaldo Carvalho de Melo, Alexander Shishkin, H. Peter Anvin, Jiri Olsa, LKML, Mark Rutland, Ingo Molnar, Namhyung Kim, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, clang-built-linux On Mon, Sep 21, 2020 at 10:59 PM 'Nick Desaulniers' via syzkaller-bugs <syzkaller-bugs@googlegroups.com> wrote: > > On Mon, Sep 21, 2020 at 1:09 AM 'Dmitry Vyukov' via Clang Built Linux > <clang-built-linux@googlegroups.com> wrote: > > > > On Mon, Sep 21, 2020 at 7:54 AM Dmitry Vyukov <dvyukov@google.com> wrote: > > > > > > On Sat, Sep 19, 2020 at 1:08 PM Borislav Petkov <bp@alien8.de> wrote: > > > > > > > > On Sat, Sep 19, 2020 at 01:32:14AM -0700, syzbot wrote: > > > > > Hello, > > > > > > > > > > syzbot found the following issue on: > > > > > > > > > > HEAD commit: 92ab97ad Merge tag 'sh-for-5.9-part2' of git://git.libc.or.. > > > > > git tree: upstream > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1069669b900000 > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=cd992d74d6c7e62 > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=ce179bc99e64377c24bc > > > > > compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81) > > > > > > > > > > Unfortunately, I don't have any reproducer for this issue yet. 
> > > > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > > > Reported-by: syzbot+ce179bc99e64377c24bc@syzkaller.appspotmail.com > > > > > > > > > > general protection fault, probably for non-canonical address 0xffff518084501e28: 0000 [#1] PREEMPT SMP KASAN > > > > > KASAN: maybe wild-memory-access in range [0xfffaac042280f140-0xfffaac042280f147] > > > > > CPU: 0 PID: 17449 Comm: syz-executor.5 Not tainted 5.9.0-rc5-syzkaller #0 > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > > > > > RIP: 0010:perf_misc_flags+0x125/0x150 arch/x86/events/core.c:2638 > > > > > Code: e4 48 83 e6 03 41 0f 94 c4 31 ff e8 95 fa 73 00 bb 02 00 00 00 4c 29 e3 49 81 c6 90 00 00 00 4c 89 f0 48 c1 e8 00 00 00 00 38 <00> 74 08 4c 89 f7 e8 40 c0 b3 00 41 8b 06 83 e0 08 48 c1 e0 0b 48 > > > > > > > > Hmm, so converting this back to opcodes with decodecode gives: > > > > > > > > Code: e4 48 83 e6 03 41 0f 94 c4 31 ff e8 95 fa 73 00 bb 02 00 00 00 4c 29 e3 49 81 c6 90 00 00 00 4c 89 f0 48 c1 e8 00 00 00 00 38 <00> 74 08 4c 89 f7 e8 40 c0 b3 00 41 8b 06 83 e0 08 48 c1 e0 0b 48 > > > > All code > > > > ======== > > > > 0: e4 48 in $0x48,%al > > > > 2: 83 e6 03 and $0x3,%esi > > > > 5: 41 0f 94 c4 sete %r12b > > > > 9: 31 ff xor %edi,%edi > > > > b: e8 95 fa 73 00 callq 0x73faa5 > > > > 10: bb 02 00 00 00 mov $0x2,%ebx > > > > 15: 4c 29 e3 sub %r12,%rbx > > > > 18: 49 81 c6 90 00 00 00 add $0x90,%r14 > > > > 1f: 4c 89 f0 mov %r14,%rax > > > > 22: 48 c1 e8 00 shr $0x0,%rax > > > > 26: 00 00 add %al,(%rax) > > > > 28: 00 38 add %bh,(%rax) > > > > 2a:* 00 74 08 4c add %dh,0x4c(%rax,%rcx,1) <-- trapping instruction > > > > 2e: 89 f7 mov %esi,%edi > > > > 30: e8 40 c0 b3 00 callq 0xb3c075 > > > > 35: 41 8b 06 mov (%r14),%eax > > > > 38: 83 e0 08 and $0x8,%eax > > > > 3b: 48 c1 e0 0b shl $0xb,%rax > > > > 3f: 48 rex.W > > > > > > > > and those ADDs before the rIP look real strange. 
Just as if something > > > > wrote 4 bytes of 0s there. And building your config with clang-10 gives > > > > around that area: > > > > > > > > ffffffff8101177c: 48 83 e6 03 and $0x3,%rsi > > > > ffffffff81011780: 41 0f 94 c4 sete %r12b > > > > ffffffff81011784: 31 ff xor %edi,%edi > > > > ffffffff81011786: e8 05 c9 73 00 callq ffffffff8174e090 <__sanitizer_cov_trace_const_cmp8> > > > > ffffffff8101178b: bb 02 00 00 00 mov $0x2,%ebx > > > > ffffffff81011790: 4c 29 e3 sub %r12,%rbx > > > > ffffffff81011793: 49 81 c6 90 00 00 00 add $0x90,%r14 > > > > ffffffff8101179a: 4c 89 f0 mov %r14,%rax > > > > ffffffff8101179d: 48 c1 e8 03 shr $0x3,%rax > > > > ffffffff810117a1: 42 80 3c 38 00 cmpb $0x0,(%rax,%r15,1) > > > > ffffffff810117a6: 74 08 je ffffffff810117b0 <perf_misc_flags+0x130> > > > > ffffffff810117a8: 4c 89 f7 mov %r14,%rdi > > > > ffffffff810117ab: e8 20 75 b3 00 callq ffffffff81b48cd0 <__asan_report_load8_noabort> > > > > ffffffff810117b0: 41 8b 06 mov (%r14),%eax > > > > ffffffff810117b3: 83 e0 08 and $0x8,%eax > > > > ffffffff810117b6: 48 c1 e0 0b shl $0xb,%rax > > > > > > > > and I can pretty much follow it instruction by instruction until I reach > > > > that SHR. Your SHR is doing a shift by 0 bytes and that already looks > > > > suspicious. > > > > > > > > After it, your output has a bunch of suspicious ADDs and mine has a CMP; > > > > JE instead. And that looks really strange too. > > > > > > > > Could it be that something has scribbled in guest memory and corrupted > > > > that area, leading to that strange discrepancy in the opcodes? > > Right, the two sequences above look almost the same, except those 4 > bytes of zeros (the disassembler gets confused about the rest, but > it's the same byte sequence otherwise). Are the two disassemblies a > comparison of the code at runtime vs. compile-time? If so, how did > you disassemble the runtime code? If runtime and compile time differ, > I suspect some kind of runtime patching. 
I wonder if we calculated > the address of a static_key wrong (asm goto). What function am I > looking at the disassembly of? perf_misc_flags() in > arch/x86/events/core.c? With this config? > https://syzkaller.appspot.com/x/.config?x=cd992d74d6c7e62 (though I > don't see _any_ asm goto in the IR for this file built with this > config). If this is deterministically reproducible, I suppose we > could set a watchpoint on the address being overwritten? > > (Un-interestingly, I do get a panic trying to boot that config in > qemu, unless I bump the VMs RAM up.) > > > > > > > Hi Boris, > > > > > > Memory corruption is definitely possible. There are hundreds of known > > > bugs that can potentially lead to silent memory corruptions, and some > > > observed to lead to silent memory corruptions. > > > > > > However, these tend to produce crash signatures with 1-2 crashes. > > > While this has 6 and they look similar and all happened on the only > > > instance that uses clang. So my bet would be on > > > something-clang-related rather than a silent memory corruption. > > > +clang-built-linux > > > > > > general protection fault in pvclock_gtod_notify (2) looks somewhat similar: > > - only clang > > - gpf in systems code > > - happened few times > > > > https://syzkaller.appspot.com/bug?extid=1dccfcb049726389379c > > https://groups.google.com/g/syzkaller-bugs/c/0eUUkjFKrBg/m/nGfTjIfCBAAJ > > Dmitry, > Is there an easy way for me to get from > https://syzkaller.appspot.com/upstream to <list of clang specific > failures>? ctrl+f, `clang`, returns nothing on that first link; it > seems the compiler version is only included in the email? No such search exists. ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: general protection fault in perf_misc_flags 2020-09-19 8:32 general protection fault in perf_misc_flags syzbot 2020-09-19 11:08 ` Borislav Petkov @ 2020-09-27 14:57 ` Borislav Petkov 2020-09-28 5:18 ` Dmitry Vyukov 1 sibling, 1 reply; 41+ messages in thread From: Borislav Petkov @ 2020-09-27 14:57 UTC (permalink / raw) To: Dmitry Vyukov Cc: syzbot, acme, alexander.shishkin, hpa, jolsa, linux-kernel, mark.rutland, mingo, namhyung, peterz, syzkaller-bugs, tglx, x86 On Sat, Sep 19, 2020 at 01:32:14AM -0700, syzbot wrote: > Hello, > > syzbot found the following issue on: > > HEAD commit: 92ab97ad Merge tag 'sh-for-5.9-part2' of git://git.libc.or.. > git tree: upstream > console output: https://syzkaller.appspot.com/x/log.txt?x=1069669b900000 > kernel config: https://syzkaller.appspot.com/x/.config?x=cd992d74d6c7e62 > dashboard link: https://syzkaller.appspot.com/bug?extid=ce179bc99e64377c24bc > compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81) All below is AFAICT: This compiler you're using is not some official release but some random commit before the v10 release: $ git show c2443155a0fb245c8f17f2c1c72b6ea391e86e81 Author: Hans Wennborg <hans@chromium.org> Date: Sat Nov 30 14:20:11 2019 +0100 Revert 651f07908a1 "[AArch64] Don't combine callee-save and local stack adjustment when optimizing for size" ... $ git describe c2443155a0fb245c8f17f2c1c72b6ea391e86e81 llvmorg-10-init-10900-gc2443155a0fb The v10 release is: $ git show llvmorg-10.0.0 tag llvmorg-10.0.0 Tagger: Hans Wennborg <hans@chromium.org> Date: Tue Mar 24 12:58:58 2020 +0100 Tag 10.0.0 and v10 has reached v10.0.1 in the meantime: $ git log --oneline c2443155a0fb245c8f17f2c1c72b6ea391e86e81~1..llvmorg-10.0.1 | wc -l 7051 so can you please update your compiler and see if you can still reproduce with 10.0.1 so that we don't waste time chasing a bug which has been likely already fixed in one of those >7K commits. Thx. -- Regards/Gruss, Boris. 
https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 41+ messages in thread
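As a small aside on reading the `git describe` output quoted in this message: the string has the form `<tag>-<commits since tag>-g<abbreviated hash>`, so `llvmorg-10-init-10900-gc2443155a0fb` says the compiler commit sits 10900 commits after the `llvmorg-10-init` tag rather than on a release tag. A minimal sketch of that parsing, where `parse_describe` is a hypothetical helper name used only for illustration:

```python
import re

# Values copied from the message above.
DESCRIBE = "llvmorg-10-init-10900-gc2443155a0fb"
FULL_SHA = "c2443155a0fb245c8f17f2c1c72b6ea391e86e81"


def parse_describe(s):
    """Split '<tag>-<count>-g<sha>' into its parts.

    Returns None when the string has no '-<count>-g<sha>' suffix, which is
    what `git describe` prints when the commit is exactly a tag.
    """
    m = re.fullmatch(r"(?P<tag>.+)-(?P<count>\d+)-g(?P<sha>[0-9a-f]+)", s)
    if m is None:
        return None
    return m["tag"], int(m["count"]), m["sha"]


tag, count, sha = parse_describe(DESCRIBE)
print(f"nearest tag: {tag}, commits after it: {count}, abbreviated hash: {sha}")
print("hash matches the syzbot compiler commit:", FULL_SHA.startswith(sha))
print("release tag parses as:", parse_describe("llvmorg-10.0.0"))
```

The last line shows the contrast with a release build: a commit that is exactly `llvmorg-10.0.0` would be described by the bare tag name, with no count or hash suffix.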
* Re: general protection fault in perf_misc_flags 2020-09-27 14:57 ` Borislav Petkov @ 2020-09-28 5:18 ` Dmitry Vyukov 2020-09-28 6:06 ` Dmitry Vyukov ` (2 more replies) 0 siblings, 3 replies; 41+ messages in thread From: Dmitry Vyukov @ 2020-09-28 5:18 UTC (permalink / raw) To: Borislav Petkov, Alexander Potapenko, Marco Elver Cc: syzbot, Arnaldo Carvalho de Melo, Alexander Shishkin, H. Peter Anvin, Jiri Olsa, LKML, Mark Rutland, Ingo Molnar, Namhyung Kim, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, clang-built-linux On Sun, Sep 27, 2020 at 4:57 PM Borislav Petkov <bp@alien8.de> wrote: > > On Sat, Sep 19, 2020 at 01:32:14AM -0700, syzbot wrote: > > Hello, > > > > syzbot found the following issue on: > > > > HEAD commit: 92ab97ad Merge tag 'sh-for-5.9-part2' of git://git.libc.or.. > > git tree: upstream > > console output: https://syzkaller.appspot.com/x/log.txt?x=1069669b900000 > > kernel config: https://syzkaller.appspot.com/x/.config?x=cd992d74d6c7e62 > > dashboard link: https://syzkaller.appspot.com/bug?extid=ce179bc99e64377c24bc > > compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81) > > All below is AFAICT: > > This compiler you're using is not some official release but some random > commit before the v10 release: > > $ git show c2443155a0fb245c8f17f2c1c72b6ea391e86e81 > Author: Hans Wennborg <hans@chromium.org> > Date: Sat Nov 30 14:20:11 2019 +0100 > > Revert 651f07908a1 "[AArch64] Don't combine callee-save and local stack adjustment when optimizing for size" > ... 
> > $ git describe c2443155a0fb245c8f17f2c1c72b6ea391e86e81 > llvmorg-10-init-10900-gc2443155a0fb > > The v10 release is: > > $ git show llvmorg-10.0.0 > tag llvmorg-10.0.0 > Tagger: Hans Wennborg <hans@chromium.org> > Date: Tue Mar 24 12:58:58 2020 +0100 > > Tag 10.0.0 > > and v10 has reached v10.0.1 in the meantime: > > $ git log --oneline c2443155a0fb245c8f17f2c1c72b6ea391e86e81~1..llvmorg-10.0.1 | wc -l > 7051 > > so can you please update your compiler and see if you can still > reproduce with 10.0.1 so that we don't waste time chasing a bug which > has been likely already fixed in one of those >7K commits. +Alex, Marco, There is suspicion that these may be caused by use of unreleased clang. Do we use the same clang as we use for the KMSAN instance? But this is not KMSAN machine, so I am not sure who/when/why updated it last to this revision. I even see we have some clang 11 version: https://github.com/google/syzkaller/blob/master/docs/syzbot.md#crash-does-not-reproduce Is it possible to switch to some released version for both KMSAN and KASAN now? ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: general protection fault in perf_misc_flags 2020-09-28 5:18 ` Dmitry Vyukov @ 2020-09-28 6:06 ` Dmitry Vyukov 2020-09-28 8:38 ` Borislav Petkov 2020-09-28 7:25 ` Marco Elver 2020-09-28 20:32 ` Nick Desaulniers 2 siblings, 1 reply; 41+ messages in thread From: Dmitry Vyukov @ 2020-09-28 6:06 UTC (permalink / raw) To: Borislav Petkov, Alexander Potapenko, Marco Elver Cc: syzbot, Arnaldo Carvalho de Melo, Alexander Shishkin, H. Peter Anvin, Jiri Olsa, LKML, Mark Rutland, Ingo Molnar, Namhyung Kim, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, clang-built-linux On Mon, Sep 28, 2020 at 7:18 AM Dmitry Vyukov <dvyukov@google.com> wrote: > > On Sat, Sep 19, 2020 at 01:32:14AM -0700, syzbot wrote: > > > Hello, > > > > > > syzbot found the following issue on: > > > > > > HEAD commit: 92ab97ad Merge tag 'sh-for-5.9-part2' of git://git.libc.or.. > > > git tree: upstream > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1069669b900000 > > > kernel config: https://syzkaller.appspot.com/x/.config?x=cd992d74d6c7e62 > > > dashboard link: https://syzkaller.appspot.com/bug?extid=ce179bc99e64377c24bc > > > compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81) > > > > All below is AFAICT: > > > > This compiler you're using is not some official release but some random > > commit before the v10 release: > > > > $ git show c2443155a0fb245c8f17f2c1c72b6ea391e86e81 > > Author: Hans Wennborg <hans@chromium.org> > > Date: Sat Nov 30 14:20:11 2019 +0100 > > > > Revert 651f07908a1 "[AArch64] Don't combine callee-save and local stack adjustment when optimizing for size" > > ... 
> > > > $ git describe c2443155a0fb245c8f17f2c1c72b6ea391e86e81 > > llvmorg-10-init-10900-gc2443155a0fb > > > > The v10 release is: > > > > $ git show llvmorg-10.0.0 > > tag llvmorg-10.0.0 > > Tagger: Hans Wennborg <hans@chromium.org> > > Date: Tue Mar 24 12:58:58 2020 +0100 > > > > Tag 10.0.0 > > > > and v10 has reached v10.0.1 in the meantime: > > > > $ git log --oneline c2443155a0fb245c8f17f2c1c72b6ea391e86e81~1..llvmorg-10.0.1 | wc -l > > 7051 > > > > so can you please update your compiler and see if you can still > > reproduce with 10.0.1 so that we don't waste time chasing a bug which > > has been likely already fixed in one of those >7K commits. > > +Alex, Marco, > > There is suspicion that these may be caused by use of unreleased clang. > Do we use the same clang as we use for the KMSAN instance? But this is > not KMSAN machine, so I am not sure who/when/why updated it last to > this revision. > I even see we have some clang 11 version: > https://github.com/google/syzkaller/blob/master/docs/syzbot.md#crash-does-not-reproduce > > Is it possible to switch to some released version for both KMSAN and KASAN now? Interestingly there is a new crash, which looks similar: general protection fault in map_vdso https://syzkaller.appspot.com/bug?extid=c2ae01c2b1b385384a06 The code is also with 4 0's: Code: 00 00 00 48 b8 00 00 00 00 00 fc ff df 41 57 49 89 ff 41 56 41 55 41 54 55 65 48 8b 2c 25 c0 fe 01 00 48 8d bd 28 04 00 00 53 <48> 00 00 00 00 fa 48 83 ec 10 48 c1 ea 03 80 3c 02 00 0f 85 51 02 But it happened with gcc. 
Also I found this older one: general protection fault in map_vdso_randomized https://syzkaller.appspot.com/bug?id=8366fd024559946137b9db23b26fd2235d43b383 which also has code smashed and happened with gcc: Code: 00 fc ff df 48 89 f9 48 c1 e9 03 80 3c 01 00 0f 85 eb 00 00 00 65 48 8b 1c 25 c0 fe 01 00 48 8d bb 28 04 00 00 41 2b 54 24 20 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 I think there may be dozens older ones here: https://syzkaller.appspot.com/upstream#moderation2 e.g. this one where code also looks strange: https://syzkaller.appspot.com/bug?id=651c61721c822bfdcdae8bfb9320e4a9b4bd49c9 Maybe it's just a random silent memory corruption in the end?... ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: general protection fault in perf_misc_flags 2020-09-28 6:06 ` Dmitry Vyukov @ 2020-09-28 8:38 ` Borislav Petkov 2020-09-28 8:40 ` Dmitry Vyukov 0 siblings, 1 reply; 41+ messages in thread From: Borislav Petkov @ 2020-09-28 8:38 UTC (permalink / raw) To: Dmitry Vyukov Cc: Alexander Potapenko, Marco Elver, syzbot, Arnaldo Carvalho de Melo, Alexander Shishkin, H. Peter Anvin, Jiri Olsa, LKML, Mark Rutland, Ingo Molnar, Namhyung Kim, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, clang-built-linux On Mon, Sep 28, 2020 at 08:06:19AM +0200, Dmitry Vyukov wrote: > Maybe it's just a random silent memory corruption in the end?... Oh, the rabbit hole goes deeper. But if it is corruption, what is the common element in all those? All those guests have run on the same physical machine? If so, you probably won't get any logs from it to search for MCEs or so...? Maybe the GCE people can do some grepping :) -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: general protection fault in perf_misc_flags 2020-09-28 8:38 ` Borislav Petkov @ 2020-09-28 8:40 ` Dmitry Vyukov 2020-09-28 8:54 ` Borislav Petkov 0 siblings, 1 reply; 41+ messages in thread From: Dmitry Vyukov @ 2020-09-28 8:40 UTC (permalink / raw) To: Borislav Petkov Cc: Alexander Potapenko, Marco Elver, syzbot, Arnaldo Carvalho de Melo, Alexander Shishkin, H. Peter Anvin, Jiri Olsa, LKML, Mark Rutland, Ingo Molnar, Namhyung Kim, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, clang-built-linux On Mon, Sep 28, 2020 at 10:38 AM Borislav Petkov <bp@alien8.de> wrote: > > On Mon, Sep 28, 2020 at 08:06:19AM +0200, Dmitry Vyukov wrote: > > Maybe it's just a random silent memory corruption in the end?... > > Oh, the rabbit hole goes deeper. But if it is corruption, what is the > common element in all those? All those guests have run on the same > physical machine? > > If so, you probably won't get any logs from it to search for MCEs or > so...? Maybe the GCE people can do some grepping :) I meant the kernel self-corrupts itself, that just wasn't detected by KASAN, page protections, etc. ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: general protection fault in perf_misc_flags 2020-09-28 8:40 ` Dmitry Vyukov @ 2020-09-28 8:54 ` Borislav Petkov 2020-09-28 10:33 ` Dmitry Vyukov 0 siblings, 1 reply; 41+ messages in thread From: Borislav Petkov @ 2020-09-28 8:54 UTC (permalink / raw) To: Dmitry Vyukov Cc: Alexander Potapenko, Marco Elver, syzbot, Arnaldo Carvalho de Melo, Alexander Shishkin, H. Peter Anvin, Jiri Olsa, LKML, Mark Rutland, Ingo Molnar, Namhyung Kim, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, clang-built-linux On Mon, Sep 28, 2020 at 10:40:19AM +0200, Dmitry Vyukov wrote: > I meant the kernel self-corrupts itself, that just wasn't detected by > KASAN, page protections, etc. Well, Nick already asked this but we're marking all kernel text RO early during boot. So it either is happening before that or something else altogether is going on. And if that is a kernel issue, I believe we should've heard by now from others. Or maybe this happens only in VMs. Questions over questions... -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: general protection fault in perf_misc_flags 2020-09-28 8:54 ` Borislav Petkov @ 2020-09-28 10:33 ` Dmitry Vyukov 2020-09-28 20:23 ` Borislav Petkov 2020-09-28 20:51 ` Nick Desaulniers 0 siblings, 2 replies; 41+ messages in thread From: Dmitry Vyukov @ 2020-09-28 10:33 UTC (permalink / raw) To: Borislav Petkov Cc: Alexander Potapenko, Marco Elver, syzbot, Arnaldo Carvalho de Melo, Alexander Shishkin, H. Peter Anvin, Jiri Olsa, LKML, Mark Rutland, Ingo Molnar, Namhyung Kim, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, clang-built-linux On Mon, Sep 28, 2020 at 10:54 AM Borislav Petkov <bp@alien8.de> wrote: > > On Mon, Sep 28, 2020 at 10:40:19AM +0200, Dmitry Vyukov wrote: > > I meant the kernel self-corrupts itself, that just wasn't detected by > > KASAN, page protections, etc. > > Well, Nick already asked this but we're marking all kernel text RO early > during boot. So it either is happening before that or something else > altogether is going on. > > And if that is a kernel issue, I believe we should've heard by now from > others. Or maybe this happens only in VMs. > > Questions over questions... I don't have answers to all of the questions, but syzkaller produces a pretty unique workload. It has found thousands of bugs that you have not heard about from others: https://syzkaller.appspot.com/upstream#open https://syzkaller.appspot.com/upstream/fixed In particular, there are hundreds of known and active potential memory corruption bugs. It may be related to VMs, but it may well not be related to VMs. ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: general protection fault in perf_misc_flags 2020-09-28 10:33 ` Dmitry Vyukov @ 2020-09-28 20:23 ` Borislav Petkov 2020-09-29 8:33 ` Borislav Petkov 2020-09-28 20:51 ` Nick Desaulniers 1 sibling, 1 reply; 41+ messages in thread From: Borislav Petkov @ 2020-09-28 20:23 UTC (permalink / raw) To: Dmitry Vyukov Cc: Alexander Potapenko, Marco Elver, syzbot, Arnaldo Carvalho de Melo, Alexander Shishkin, H. Peter Anvin, Jiri Olsa, LKML, Mark Rutland, Ingo Molnar, Namhyung Kim, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, clang-built-linux On Mon, Sep 28, 2020 at 12:33:57PM +0200, Dmitry Vyukov wrote: > It may be related to VMs, but also may well not be related to VMs. Right, and so I tried to set it up on a test box here, it looks like it worked, see below. I'll let it fuzz in the coming days and see what explodes... 2020/09/28 22:19:51 booting test machines... 2020/09/28 22:19:51 wait for the connection from test machine... 2020/09/28 22:20:27 machine check: 2020/09/28 22:20:27 syscalls : 3389/3739 2020/09/28 22:20:27 code coverage : enabled 2020/09/28 22:20:27 comparison tracing : enabled 2020/09/28 22:20:27 extra coverage : enabled 2020/09/28 22:20:27 setuid sandbox : enabled 2020/09/28 22:20:27 namespace sandbox : enabled 2020/09/28 22:20:27 Android sandbox : enabled 2020/09/28 22:20:27 fault injection : enabled 2020/09/28 22:20:27 leak checking : CONFIG_DEBUG_KMEMLEAK is not enabled 2020/09/28 22:20:27 net packet injection : enabled 2020/09/28 22:20:27 net device setup : enabled 2020/09/28 22:20:27 concurrency sanitizer : /sys/kernel/debug/kcsan does not exist 2020/09/28 22:20:27 devlink PCI setup : PCI device 0000:00:10.0 is not available 2020/09/28 22:20:27 USB emulation : enabled 2020/09/28 22:20:27 hci packet injection : enabled 2020/09/28 22:20:27 wifi device emulation : enabled 2020/09/28 22:20:29 corpus : 458 (deleted 0 broken) 2020/09/28 22:20:31 seeds : 620/667 2020/09/28 22:20:31 VMs 1, executed 0, corpus cover 0, corpus 
signal 0, max signal 0, crashes 0, repro 0 2020/09/28 22:20:41 VMs 2, executed 12, corpus cover 0, corpus signal 0, max signal 0, crashes 0, repro 0 2020/09/28 22:20:51 VMs 2, executed 28, corpus cover 5578, corpus signal 5925, max signal 10155, crashes 0, repro 0 2020/09/28 22:21:01 VMs 3, executed 179, corpus cover 11792, corpus signal 10881, max signal 19337, crashes 0, repro 0 ... -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: general protection fault in perf_misc_flags 2020-09-28 20:23 ` Borislav Petkov @ 2020-09-29 8:33 ` Borislav Petkov 2020-09-29 13:29 ` Dmitry Vyukov 0 siblings, 1 reply; 41+ messages in thread From: Borislav Petkov @ 2020-09-29 8:33 UTC (permalink / raw) To: Dmitry Vyukov Cc: Alexander Potapenko, Marco Elver, syzbot, Arnaldo Carvalho de Melo, Alexander Shishkin, H. Peter Anvin, Jiri Olsa, LKML, Mark Rutland, Ingo Molnar, Namhyung Kim, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, clang-built-linux On Mon, Sep 28, 2020 at 10:23:53PM +0200, Borislav Petkov wrote: > 2020/09/28 22:21:01 VMs 3, executed 179, corpus cover 11792, corpus signal 10881, max signal 19337, crashes 0, repro 0 Ok, so far triggered two things: WARNING in f2fs_is_valid_blkaddr 1 2020/09/29 10:27 reproducing WARNING in reiserfs_put_super 1 2020/09/28 22:42 you've probably seen them already. Anyway, next question. Let's say I trigger the corruption: is there a way to stop the guest VM which has triggered it so that I'm able to examine it with gdb? What about kdump? Can I dump the guest memory either with kdump or through the qemu monitor (I believe there's a command to dump memory) so that it can be poked at? Because as it is, we don't have a reproducer and as I see it, the fuzzing simply gets restarted: 2020/09/29 10:27:03 vm-3: crash: WARNING in f2fs_is_valid_blkaddr ... 2020/09/29 10:27:05 loop: phase=1 shutdown=false instances=1/4 [3] repro: pending=0 reproducing=1 queued=1 2020/09/29 10:27:05 loop: starting instance 3 so it would be good to be able to say, when a vm encounters a crash, it should be stopped immediately so that the guest can be examined through qemu's gdb interface, i.e., -gdb tcp::<portnum> or so? Thx. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: general protection fault in perf_misc_flags 2020-09-29 8:33 ` Borislav Petkov @ 2020-09-29 13:29 ` Dmitry Vyukov 2020-09-30 16:17 ` Borislav Petkov 0 siblings, 1 reply; 41+ messages in thread From: Dmitry Vyukov @ 2020-09-29 13:29 UTC (permalink / raw) To: Borislav Petkov Cc: Alexander Potapenko, Marco Elver, syzbot, Arnaldo Carvalho de Melo, Alexander Shishkin, H. Peter Anvin, Jiri Olsa, LKML, Mark Rutland, Ingo Molnar, Namhyung Kim, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, clang-built-linux, syzkaller On Tue, Sep 29, 2020 at 10:33 AM Borislav Petkov <bp@alien8.de> wrote: > > On Mon, Sep 28, 2020 at 10:23:53PM +0200, Borislav Petkov wrote: > > 2020/09/28 22:21:01 VMs 3, executed 179, corpus cover 11792, corpus signal 10881, max signal 19337, crashes 0, repro 0 > > Ok, so far triggered two things: > > WARNING in f2fs_is_valid_blkaddr 1 2020/09/29 10:27 reproducing > WARNING in reiserfs_put_super 1 2020/09/28 22:42 > > you've probably seen them already. > > Anyway, next question. Let's say I trigger the corruption: is there a > way to stop the guest VM which has triggered it so that I'm able to > examine it with gdb? > > What about kdump? Can I dump the guest memory either with kdump or > through the qemu monitor (I believe there's a command to dump memory) so > that it can be poked at? > > Because as it is, we don't have a reproducer and as I see it, the fuzzing simply > gets restarted: > > 2020/09/29 10:27:03 vm-3: crash: WARNING in f2fs_is_valid_blkaddr > ... > 2020/09/29 10:27:05 loop: phase=1 shutdown=false instances=1/4 [3] repro: pending=0 reproducing=1 queued=1 > 2020/09/29 10:27:05 loop: starting instance 3 > > so it would be good to be able to say, when a vm encounters a crash, it > should be stopped immediately so that the guest can be examined through > qemu's gdb interface, i.e., > > -gdb tcp::<portnum> > > or so? Currently there is no such feature. 
I think some people did it, because something similar was mentioned on the mailing list IIRC, but I don't know how they did it; probably with some local code changes. ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: general protection fault in perf_misc_flags 2020-09-29 13:29 ` Dmitry Vyukov @ 2020-09-30 16:17 ` Borislav Petkov 2020-09-30 16:23 ` Dmitry Vyukov 0 siblings, 1 reply; 41+ messages in thread From: Borislav Petkov @ 2020-09-30 16:17 UTC (permalink / raw) To: Dmitry Vyukov Cc: Alexander Potapenko, Marco Elver, syzbot, Arnaldo Carvalho de Melo, Alexander Shishkin, H. Peter Anvin, Jiri Olsa, LKML, Mark Rutland, Ingo Molnar, Namhyung Kim, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, clang-built-linux, syzkaller Hi, one more thing I just spotted. The default install of syzkaller here runs the guest with this on the kernel command line: 2020/09/30 17:56:18 running command: qemu-system-x86_64 []string{"-m", "2048", "-smp", "2", "-display", ... "-append", "earlyprintk=serial oops=panic ... nmi_watchdog=panic panic_on_warn=1 panic=1 ftrace_dump_on_oops=orig_cpu rodata=n ^^^^^^^^^^ which basically leaves guest kernel's memory RW and it gets caught immediately on vm boot by CONFIG_DEBUG_WX. This pretty much explains why kernel text can get corrupted with a stray pointer write or so. So what's the use case for rodata=n? [ 2.478136] Kernel memory protection disabled. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [ 2.478689] x86/mm: Checking user space page tables [ 2.550163] ------------[ cut here ]------------ [ 2.550736] x86/mm: Found insecure W+X mapping at address entry_SYSCALL_64+0x0/0x29 [ 2.551612] WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:246 note_page+0x81f/0x13a0 [ 2.552577] Kernel panic - not syncing: panic_on_warn set ... [ 2.553240] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc7+ #5 [ 2.553953] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1 04/01/2014 [ 2.554922] Call Trace: [ 2.555233] dump_stack+0x9c/0xcf [ 2.555633] panic+0x250/0x5a0 [ 2.556004] ? __warn_printk+0xf8/0xf8 [ 2.556450] ? console_trylock+0xb0/0xb0 [ 2.556914] ? __warn.cold+0x5/0x44 [ 2.557332] ? 
note_page+0x81f/0x13a0 [ 2.557768] __warn.cold+0x20/0x44 [ 2.558176] ? note_page+0x81f/0x13a0 [ 2.558641] report_bug+0x168/0x1b0 [ 2.559059] handle_bug+0x3c/0x60 [ 2.559458] exc_invalid_op+0x14/0x40 [ 2.559894] asm_exc_invalid_op+0x12/0x20 [ 2.560368] RIP: 0010:note_page+0x81f/0x13a0 [ 2.560870] Code: 26 00 80 3d 9a d0 7f 02 00 0f 85 82 f9 ff ff e8 47 3c 26 00 4c 89 e6 48 c7 c7 40 aff [ 2.562951] RSP: 0000:ffff88800e9f7a90 EFLAGS: 00010282 [ 2.563554] RAX: 0000000000000000 RBX: ffff88800e9f7e00 RCX: 0000000000000000 [ 2.564361] RDX: ffff88800e9edb80 RSI: 0000000000000004 RDI: ffffed1001d3ef44 [ 2.565167] RBP: 0000000000000200 R08: 0000000000000001 R09: 0000000000000003 [ 2.565973] R10: ffffed1001d3eefd R11: 0000000000000001 R12: ffffffff96e00000 [ 2.566780] R13: 00000000000001e3 R14: 0000000000000000 R15: ffff88800e9f7e58 [ 2.567587] ? __kprobes_text_end+0xb3598/0xb3598 [ 2.568122] ? __entry_text_end+0x1fea85/0x1fea85 [ 2.568754] ? __entry_text_end+0x1fea85/0x1fea85 [ 2.569288] ? __entry_text_end+0x1fea85/0x1fea85 [ 2.569821] ptdump_hole+0x61/0x90 [ 2.570212] ? ptdump_pte_entry+0x100/0x100 [ 2.570712] walk_pgd_range+0xdb8/0x15f0 [ 2.571178] walk_page_range_novma+0xd9/0x140 [ 2.571689] ? walk_page_range+0x2b0/0x2b0 [ 2.572171] ? console_unlock+0x58f/0xb10 [ 2.572644] ptdump_walk_pgd+0xcd/0x180 [ 2.573099] ptdump_walk_pgd_level_core+0x13c/0x1b0 [ 2.573663] ? effective_prot+0xb0/0xb0 [ 2.574117] ? vprintk_emit+0x214/0x380 [ 2.574601] ? ptdump_walk_pgd_level_core+0x1b0/0x1b0 [ 2.575186] ? memtype_copy_nth_element+0x1a0/0x1a0 [ 2.575752] ? __kprobes_text_end+0xb3598/0xb3598 [ 2.576300] ? pti_user_pagetable_walk_pmd+0x130/0x460 [ 2.576894] ? __kprobes_text_end+0xb3598/0xb3598 [ 2.577441] ? __kprobes_text_end+0xb3598/0xb3598 [ 2.577988] ? __kprobes_text_end+0xb3598/0xb3598 [ 2.578564] ? 
rest_init+0xdd/0xdd [ 2.578972] ptdump_walk_user_pgd_level_checkwx.cold+0x31/0x36 [ 2.579640] pti_finalize+0x7b/0x170 [ 2.580066] kernel_init+0x5b/0x183 [ 2.580484] ret_from_fork+0x22/0x30 [ 2.581010] Dumping ftrace buffer: [ 2.581456] (ftrace buffer empty) ffffffffbfffffff) [ 2.583137] Rebooting in 1 seconds.. 2020/09/30 17:56:23 failed to create instance: failed to read from qemu: EOF -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 41+ messages in thread
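For anyone who wants to check their own fuzzing setup for the same thing, here is a minimal sketch that looks for a `rodata=` parameter in a kernel command line string. The helper names are made up for illustration, and the set of "off" spellings other than the `n` seen above is an assumption about how the kernel parses boolean parameters.

```python
# Sketch only: rodata_setting() and protection_disabled() are hypothetical
# helpers, not part of syzkaller or the kernel.

# Assumption: besides the "n" shown in the thread, these spellings also
# switch the protection off.
DISABLING_VALUES = {"n", "0", "off"}


def rodata_setting(cmdline: str) -> str:
    """Return the value of the last rodata= token, or "default" if absent."""
    value = "default"
    for token in cmdline.split():
        if token.startswith("rodata="):
            value = token.split("=", 1)[1]
    return value


def protection_disabled(cmdline: str) -> bool:
    """True if the command line asks for kernel memory to stay writable."""
    return rodata_setting(cmdline).lower() in DISABLING_VALUES


# Only the parameters that are visible in the log above; the elided ones
# are left out rather than guessed.
SAMPLE = ("earlyprintk=serial oops=panic nmi_watchdog=panic panic_on_warn=1 "
          "panic=1 ftrace_dump_on_oops=orig_cpu rodata=n")

if __name__ == "__main__":
    print(rodata_setting(SAMPLE), protection_disabled(SAMPLE))
```

On a running guest the same function can be fed the contents of /proc/cmdline.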
* Re: general protection fault in perf_misc_flags 2020-09-30 16:17 ` Borislav Petkov @ 2020-09-30 16:23 ` Dmitry Vyukov 2020-09-30 16:29 ` Dmitry Vyukov 2020-09-30 16:31 ` Borislav Petkov 0 siblings, 2 replies; 41+ messages in thread From: Dmitry Vyukov @ 2020-09-30 16:23 UTC (permalink / raw) To: Borislav Petkov Cc: Alexander Potapenko, Marco Elver, syzbot, Arnaldo Carvalho de Melo, Alexander Shishkin, H. Peter Anvin, Jiri Olsa, LKML, Mark Rutland, Ingo Molnar, Namhyung Kim, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, clang-built-linux, syzkaller On Wed, Sep 30, 2020 at 6:17 PM Borislav Petkov <bp@alien8.de> wrote: > > Hi, > > one more thing I just spotted. The default install of syzkaller here > runs the guest with this on the kernel command line: > > 2020/09/30 17:56:18 running command: qemu-system-x86_64 []string{"-m", "2048", > "-smp", "2", "-display", ... "-append", "earlyprintk=serial oops=panic ... > nmi_watchdog=panic panic_on_warn=1 panic=1 ftrace_dump_on_oops=orig_cpu rodata=n > ^^^^^^^^^^ > > which basically leaves guest kernel's memory RW and it gets caught > immediately on vm boot by CONFIG_DEBUG_WX. > > This pretty much explains why kernel text can get corrupted with a stray > pointer write or so. So what's the use case for rodata=n? > > [ 2.478136] Kernel memory protection disabled. > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Ha! Here is the answer: https://github.com/google/syzkaller/blob/master/tools/create-gce-image.sh#L189 # rodata=n: mark_rodata_ro becomes very slow with KASAN (lots of PGDs) I have some vague memory that there was some debug double checking that pages are indeed read-only and that debug check was slow, but it was always executed without rodata=n. 
> [ 2.478689] x86/mm: Checking user space page tables > [ 2.550163] ------------[ cut here ]------------ > [ 2.550736] x86/mm: Found insecure W+X mapping at address entry_SYSCALL_64+0x0/0x29 > [ 2.551612] WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:246 note_page+0x81f/0x13a0 > [ 2.552577] Kernel panic - not syncing: panic_on_warn set ... > [ 2.553240] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc7+ #5 > [ 2.553953] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1 04/01/2014 > [ 2.554922] Call Trace: > [ 2.555233] dump_stack+0x9c/0xcf > [ 2.555633] panic+0x250/0x5a0 > [ 2.556004] ? __warn_printk+0xf8/0xf8 > [ 2.556450] ? console_trylock+0xb0/0xb0 > [ 2.556914] ? __warn.cold+0x5/0x44 > [ 2.557332] ? note_page+0x81f/0x13a0 > [ 2.557768] __warn.cold+0x20/0x44 > [ 2.558176] ? note_page+0x81f/0x13a0 > [ 2.558641] report_bug+0x168/0x1b0 > [ 2.559059] handle_bug+0x3c/0x60 > [ 2.559458] exc_invalid_op+0x14/0x40 > [ 2.559894] asm_exc_invalid_op+0x12/0x20 > [ 2.560368] RIP: 0010:note_page+0x81f/0x13a0 > [ 2.560870] Code: 26 00 80 3d 9a d0 7f 02 00 0f 85 82 f9 ff ff e8 47 3c 26 00 4c 89 e6 48 c7 c7 40 aff > [ 2.562951] RSP: 0000:ffff88800e9f7a90 EFLAGS: 00010282 > [ 2.563554] RAX: 0000000000000000 RBX: ffff88800e9f7e00 RCX: 0000000000000000 > [ 2.564361] RDX: ffff88800e9edb80 RSI: 0000000000000004 RDI: ffffed1001d3ef44 > [ 2.565167] RBP: 0000000000000200 R08: 0000000000000001 R09: 0000000000000003 > [ 2.565973] R10: ffffed1001d3eefd R11: 0000000000000001 R12: ffffffff96e00000 > [ 2.566780] R13: 00000000000001e3 R14: 0000000000000000 R15: ffff88800e9f7e58 > [ 2.567587] ? __kprobes_text_end+0xb3598/0xb3598 > [ 2.568122] ? __entry_text_end+0x1fea85/0x1fea85 > [ 2.568754] ? __entry_text_end+0x1fea85/0x1fea85 > [ 2.569288] ? __entry_text_end+0x1fea85/0x1fea85 > [ 2.569821] ptdump_hole+0x61/0x90 > [ 2.570212] ? ptdump_pte_entry+0x100/0x100 > [ 2.570712] walk_pgd_range+0xdb8/0x15f0 > [ 2.571178] walk_page_range_novma+0xd9/0x140 > [ 2.571689] ? 
walk_page_range+0x2b0/0x2b0 > [ 2.572171] ? console_unlock+0x58f/0xb10 > [ 2.572644] ptdump_walk_pgd+0xcd/0x180 > [ 2.573099] ptdump_walk_pgd_level_core+0x13c/0x1b0 > [ 2.573663] ? effective_prot+0xb0/0xb0 > [ 2.574117] ? vprintk_emit+0x214/0x380 > [ 2.574601] ? ptdump_walk_pgd_level_core+0x1b0/0x1b0 > [ 2.575186] ? memtype_copy_nth_element+0x1a0/0x1a0 > [ 2.575752] ? __kprobes_text_end+0xb3598/0xb3598 > [ 2.576300] ? pti_user_pagetable_walk_pmd+0x130/0x460 > [ 2.576894] ? __kprobes_text_end+0xb3598/0xb3598 > [ 2.577441] ? __kprobes_text_end+0xb3598/0xb3598 > [ 2.577988] ? __kprobes_text_end+0xb3598/0xb3598 > [ 2.578564] ? rest_init+0xdd/0xdd > [ 2.578972] ptdump_walk_user_pgd_level_checkwx.cold+0x31/0x36 > [ 2.579640] pti_finalize+0x7b/0x170 > [ 2.580066] kernel_init+0x5b/0x183 > [ 2.580484] ret_from_fork+0x22/0x30 > [ 2.581010] Dumping ftrace buffer: > [ 2.581456] (ftrace buffer empty) > ffffffffbfffffff) > [ 2.583137] Rebooting in 1 seconds.. > 2020/09/30 17:56:23 failed to create instance: failed to read from qemu: EOF > > -- > Regards/Gruss, > Boris. > > https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: general protection fault in perf_misc_flags 2020-09-30 16:23 ` Dmitry Vyukov @ 2020-09-30 16:29 ` Dmitry Vyukov 2020-09-30 16:31 ` Borislav Petkov 1 sibling, 0 replies; 41+ messages in thread From: Dmitry Vyukov @ 2020-09-30 16:29 UTC (permalink / raw) To: Borislav Petkov Cc: Alexander Potapenko, Marco Elver, syzbot, Arnaldo Carvalho de Melo, Alexander Shishkin, H. Peter Anvin, Jiri Olsa, LKML, Mark Rutland, Ingo Molnar, Namhyung Kim, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, clang-built-linux, syzkaller On Wed, Sep 30, 2020 at 6:23 PM Dmitry Vyukov <dvyukov@google.com> wrote: > > Hi, > > > > one more thing I just spotted. The default install of syzkaller here > > runs the guest with this on the kernel command line: > > > > 2020/09/30 17:56:18 running command: qemu-system-x86_64 []string{"-m", "2048", > > "-smp", "2", "-display", ... "-append", "earlyprintk=serial oops=panic ... > > nmi_watchdog=panic panic_on_warn=1 panic=1 ftrace_dump_on_oops=orig_cpu rodata=n > > ^^^^^^^^^^ > > > > which basically leaves guest kernel's memory RW and it gets caught > > immediately on vm boot by CONFIG_DEBUG_WX. > > > > This pretty much explains why kernel text can get corrupted with a stray > > pointer write or so. So what's the use case for rodata=n? > > > > [ 2.478136] Kernel memory protection disabled. > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > > Ha! > > Here is the answer: > https://github.com/google/syzkaller/blob/master/tools/create-gce-image.sh#L189 > > # rodata=n: mark_rodata_ro becomes very slow with KASAN (lots of PGDs) > > I have some vague memory that there was some debug double checking > that pages are indeed read-only and that debug check was slow, but it > was always executed without rodata=n. I don't see that this is still the case. 
Diff between 2 boots: [ 11.985152][ T1] Freeing unused kernel image (initmem) memory: 3432K [ 11.986129][ T1] Write protecting the kernel read-only data: 147456k [ 11.990863][ T1] Freeing unused kernel image (text/rodata gap) memory: 2012K [ 11.992797][ T1] Freeing unused kernel image (rodata/data gap) memory: 1324K [ 11.993895][ T1] Run /sbin/init as init process [ 11.910396][ T1] Freeing unused kernel image (initmem) memory: 3432K [ 11.911277][ T1] Kernel memory protection disabled. [ 11.911984][ T1] Run /sbin/init as init process Was it fixed at some point? Was it backported to stable? ^ permalink raw reply [flat|nested] 41+ messages in thread
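The two boot logs above differ in exactly one message, so a console log can be classified mechanically. A rough sketch, assuming the two message strings stay as they appear in this thread (the function name is hypothetical):

```python
# Sketch only: classify a boot log by the two messages quoted in this thread.
# rodata_state() is a hypothetical helper name.

def rodata_state(log: str) -> str:
    """Return "protected", "disabled" or "unknown" for a console log."""
    for line in log.splitlines():
        if "Kernel memory protection disabled" in line:
            return "disabled"
        if "Write protecting the kernel read-only data" in line:
            return "protected"
    return "unknown"


BOOT_WITH_PROTECTION = """\
[ 11.985152][ T1] Freeing unused kernel image (initmem) memory: 3432K
[ 11.986129][ T1] Write protecting the kernel read-only data: 147456k
[ 11.993895][ T1] Run /sbin/init as init process
"""

BOOT_WITH_RODATA_N = """\
[ 11.910396][ T1] Freeing unused kernel image (initmem) memory: 3432K
[ 11.911277][ T1] Kernel memory protection disabled.
[ 11.911984][ T1] Run /sbin/init as init process
"""

if __name__ == "__main__":
    print(rodata_state(BOOT_WITH_PROTECTION))
    print(rodata_state(BOOT_WITH_RODATA_N))
```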
* Re: general protection fault in perf_misc_flags 2020-09-30 16:23 ` Dmitry Vyukov 2020-09-30 16:29 ` Dmitry Vyukov @ 2020-09-30 16:31 ` Borislav Petkov 2020-10-01 10:23 ` Dmitry Vyukov 1 sibling, 1 reply; 41+ messages in thread From: Borislav Petkov @ 2020-09-30 16:31 UTC (permalink / raw) To: Dmitry Vyukov Cc: Alexander Potapenko, Marco Elver, syzbot, Arnaldo Carvalho de Melo, Alexander Shishkin, H. Peter Anvin, Jiri Olsa, LKML, Mark Rutland, Ingo Molnar, Namhyung Kim, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, clang-built-linux, syzkaller On Wed, Sep 30, 2020 at 06:23:44PM +0200, Dmitry Vyukov wrote: > Here is the answer: > https://github.com/google/syzkaller/blob/master/tools/create-gce-image.sh#L189 > > # rodata=n: mark_rodata_ro becomes very slow with KASAN (lots of PGDs) > > I have some vague memory that there was some debug double checking > that pages are indeed read-only and that debug check was slow, but it > was always executed without rodata=n. Sounds like debug_checkwx() which is disabled by turning off CONFIG_DEBUG_WX. You could either disable it in your .configs or, provided there's even such an option, disable KASAN checking around it until that one-time boot test completes and then reenable KASAN. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 41+ messages in thread
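To see which of the two options applies to a given build, one can check whether CONFIG_DEBUG_WX is set in the kernel .config. A small sketch, assuming the usual .config syntax of `CONFIG_FOO=y` for enabled options and `# CONFIG_FOO is not set` for disabled ones; the helper name is made up:

```python
# Sketch only: config_enabled() is a hypothetical helper that scans the
# text of a kernel .config file for a built-in (=y) option.

def config_enabled(config_text: str, option: str) -> bool:
    """True if `option` appears as `option=y` in the given .config text."""
    for line in config_text.splitlines():
        if line.strip() == f"{option}=y":
            return True
    return False


SAMPLE_CONFIG = """\
CONFIG_KASAN=y
# CONFIG_DEBUG_WX is not set
"""

if __name__ == "__main__":
    print(config_enabled(SAMPLE_CONFIG, "CONFIG_DEBUG_WX"))
```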
* Re: general protection fault in perf_misc_flags 2020-09-30 16:31 ` Borislav Petkov @ 2020-10-01 10:23 ` Dmitry Vyukov 2020-10-01 11:05 ` Borislav Petkov 0 siblings, 1 reply; 41+ messages in thread From: Dmitry Vyukov @ 2020-10-01 10:23 UTC (permalink / raw) To: Borislav Petkov Cc: Alexander Potapenko, Marco Elver, syzbot, Arnaldo Carvalho de Melo, Alexander Shishkin, H. Peter Anvin, Jiri Olsa, LKML, Mark Rutland, Ingo Molnar, Namhyung Kim, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, clang-built-linux, syzkaller On Wed, Sep 30, 2020 at 6:31 PM Borislav Petkov <bp@alien8.de> wrote: > > On Wed, Sep 30, 2020 at 06:23:44PM +0200, Dmitry Vyukov wrote: > > Here is the answer: > > https://github.com/google/syzkaller/blob/master/tools/create-gce-image.sh#L189 > > > > # rodata=n: mark_rodata_ro becomes very slow with KASAN (lots of PGDs) > > > > I have some vague memory that there was some debug double checking > > that pages are indeed read-only and that debug check was slow, but it > > was always executed without rodata=n. > > Sounds like debug_checkwx() which is disabled by turning off > CONFIG_DEBUG_WX. > > You could either disable it in your .configs or, provided there's even > such an option, disable KASAN checking around it until that one-time > boot test completes and then reenable KASAN. Thanks! I've prepared a change that removes rodata=n: https://github.com/google/syzkaller/pull/2155 I think we will be able to indirectly evaluate if it helps or not over some period of time based on occurrence of any new similar crashes. ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: general protection fault in perf_misc_flags 2020-10-01 10:23 ` Dmitry Vyukov @ 2020-10-01 11:05 ` Borislav Petkov 0 siblings, 0 replies; 41+ messages in thread From: Borislav Petkov @ 2020-10-01 11:05 UTC (permalink / raw) To: Dmitry Vyukov Cc: Alexander Potapenko, Marco Elver, syzbot, Arnaldo Carvalho de Melo, Alexander Shishkin, H. Peter Anvin, Jiri Olsa, LKML, Mark Rutland, Ingo Molnar, Namhyung Kim, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, clang-built-linux, syzkaller On Thu, Oct 01, 2020 at 12:23:08PM +0200, Dmitry Vyukov wrote: > I've prepared a change that removes rodata=n: > https://github.com/google/syzkaller/pull/2155 Looks good. > I think we will be able to indirectly evaluate if it helps or not over > some period of time based on occurrence of any new similar crashes. Yap, that would be interesting to see whether those corruptions disappear. If they do, you probably should start getting #PFs of writes to RO memory, instead, resulting from those stray writes. We'll see. Thx. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: general protection fault in perf_misc_flags 2020-09-28 10:33 ` Dmitry Vyukov 2020-09-28 20:23 ` Borislav Petkov @ 2020-09-28 20:51 ` Nick Desaulniers 2020-09-28 21:19 ` Andy Lutomirski 1 sibling, 1 reply; 41+ messages in thread From: Nick Desaulniers @ 2020-09-28 20:51 UTC (permalink / raw) To: Dmitry Vyukov Cc: Borislav Petkov, Alexander Potapenko, Marco Elver, syzbot, Arnaldo Carvalho de Melo, Alexander Shishkin, H. Peter Anvin, Jiri Olsa, LKML, Mark Rutland, Ingo Molnar, Namhyung Kim, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, clang-built-linux On Mon, Sep 28, 2020 at 3:34 AM 'Dmitry Vyukov' via Clang Built Linux <clang-built-linux@googlegroups.com> wrote: > > On Mon, Sep 28, 2020 at 10:54 AM Borislav Petkov <bp@alien8.de> wrote: > > > > On Mon, Sep 28, 2020 at 10:40:19AM +0200, Dmitry Vyukov wrote: > > > I meant the kernel self-corrupts itself, that just wasn't detected by > > > KASAN, page protections, etc. > > > > Well, Nick already asked this but we're marking all kernel text RO early > > during boot. So it either is happening before that or something else > > altogether is going on. On Sun, Sep 27, 2020 at 11:06 PM 'Dmitry Vyukov' via Clang Built Linux <clang-built-linux@googlegroups.com> wrote: > > Interestingly there is a new crash, which looks similar: > > general protection fault in map_vdso > https://syzkaller.appspot.com/bug?extid=c2ae01c2b1b385384a06 > > The code is also with 4 0's: > Code: 00 00 00 48 b8 00 00 00 00 00 fc ff df 41 57 49 89 ff 41 56 41 > 55 41 54 55 65 48 8b 2c 25 c0 fe 01 00 48 8d bd 28 04 00 00 53 <48> 00 > 00 00 00 fa 48 83 ec 10 48 c1 ea 03 80 3c 02 00 0f 85 51 02 > > But it happened with gcc. 
> > Also I found this older one: > general protection fault in map_vdso_randomized > https://syzkaller.appspot.com/bug?id=8366fd024559946137b9db23b26fd2235d43b383 > > which also has code smashed and happened with gcc: > Code: 00 fc ff df 48 89 f9 48 c1 e9 03 80 3c 01 00 0f 85 eb 00 00 00 > 65 48 8b 1c 25 c0 fe 01 00 48 8d bb 28 04 00 00 41 2b 54 24 20 <00> 00 > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 If this is related to vdso's, they seem mapped as `r-xp` (not `w`): $ sudo cat /proc/1/maps | grep vdso 7ffc667f5000-7ffc667f7000 r-xp 00000000 00:00 0 [vdso] map_vdso() in arch/x86/entry/vdso/vma.c doesn't map the VMA as writable, but it uses VM_MAYWRITE with a comment about GDB setting breakpoints. So it sounds like the page protections on the vdso can be changed at runtime (via mprotect). Maybe syzkaller is tickling that first? map_vdso_randomized() does call map_vdso(). Maybe if we mprotect the vdso to be writable, it may be easier to spot the write. -- Thanks, ~Nick Desaulniers ^ permalink raw reply [flat|nested] 41+ messages in thread
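The /proc/1/maps check above can also be done programmatically, for example to compare the vdso permissions before and after a test program runs. A minimal sketch, assuming the whitespace-separated maps layout shown in the message (address range, permissions, offset, device, inode, name); the function name is hypothetical:

```python
# Sketch only: vdso_perms() is a hypothetical helper that extracts the
# permission field of the [vdso] mapping from /proc/<pid>/maps text.
from typing import Optional


def vdso_perms(maps_text: str) -> Optional[str]:
    """Return the permission string (e.g. "r-xp") of the [vdso] line, if any."""
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[-1] == "[vdso]":
            return fields[1]
    return None


# The line quoted in the message above.
SAMPLE_MAPS = "7ffc667f5000-7ffc667f7000 r-xp 00000000 00:00 0 [vdso]\n"

if __name__ == "__main__":
    perms = vdso_perms(SAMPLE_MAPS)
    print(perms, "writable" if perms and "w" in perms else "not writable")
```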
* Re: general protection fault in perf_misc_flags 2020-09-28 20:51 ` Nick Desaulniers @ 2020-09-28 21:19 ` Andy Lutomirski 0 siblings, 0 replies; 41+ messages in thread From: Andy Lutomirski @ 2020-09-28 21:19 UTC (permalink / raw) To: Nick Desaulniers Cc: Dmitry Vyukov, Borislav Petkov, Alexander Potapenko, Marco Elver, syzbot, Arnaldo Carvalho de Melo, Alexander Shishkin, H. Peter Anvin, Jiri Olsa, LKML, Mark Rutland, Ingo Molnar, Namhyung Kim, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, clang-built-linux > On Sep 28, 2020, at 1:52 PM, Nick Desaulniers <ndesaulniers@google.com> wrote: > > On Mon, Sep 28, 2020 at 3:34 AM 'Dmitry Vyukov' via Clang Built Linux > <clang-built-linux@googlegroups.com> wrote: >> >>> On Mon, Sep 28, 2020 at 10:54 AM Borislav Petkov <bp@alien8.de> wrote: >>> >>> On Mon, Sep 28, 2020 at 10:40:19AM +0200, Dmitry Vyukov wrote: >>>> I meant the kernel self-corrupts itself, that just wasn't detected by >>>> KASAN, page protections, etc. >>> >>> Well, Nick already asked this but we're marking all kernel text RO early >>> during boot. So it either is happening before that or something else >>> altogether is going on. > >> On Sun, Sep 27, 2020 at 11:06 PM 'Dmitry Vyukov' via Clang Built Linux >> <clang-built-linux@googlegroups.com> wrote: >> >> Interestingly there is a new crash, which looks similar: >> >> general protection fault in map_vdso >> https://syzkaller.appspot.com/bug?extid=c2ae01c2b1b385384a06 >> >> The code is also with 4 0's: >> Code: 00 00 00 48 b8 00 00 00 00 00 fc ff df 41 57 49 89 ff 41 56 41 >> 55 41 54 55 65 48 8b 2c 25 c0 fe 01 00 48 8d bd 28 04 00 00 53 <48> 00 >> 00 00 00 fa 48 83 ec 10 48 c1 ea 03 80 3c 02 00 0f 85 51 02 >> >> But it happened with gcc. 
>> >> Also I found this older one: >> general protection fault in map_vdso_randomized >> https://syzkaller.appspot.com/bug?id=8366fd024559946137b9db23b26fd2235d43b383 >> >> which also has code smashed and happened with gcc: >> Code: 00 fc ff df 48 89 f9 48 c1 e9 03 80 3c 01 00 0f 85 eb 00 00 00 >> 65 48 8b 1c 25 c0 fe 01 00 48 8d bb 28 04 00 00 41 2b 54 24 20 <00> 00 >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > If this is related to vdso's, they seem mapped as `r-xp` (not `w): > $ sudo cat /proc/1/maps | grep vdso > 7ffc667f5000-7ffc667f7000 r-xp 00000000 00:00 0 [vdso] > > map_vdso() in arch/x86/entry/vdso/vma.c doesn't map the VMA as > writable, but it uses VM_MAYWRITE with a comment about GDB setting > breakpoints. > > So it sounds like the page protections on the vdso can be changed at > runtime (via mprotect). Maybe syzkaller is tickling that first? > > map_vdso_randomized() does call map_vdso(). Maybe if we mprotect the > vdso to be writable, it may be easier to spot the write. > > The kernel shouldn’t be executing the vDSO code. Unless I’ve misread it, the crash is that the map_vdso() text itself was corrupted. This isn’t the same thing. The VM_MAYWRITE means that a program may CoW the page and write to the copy, which still won’t allow changing the vDSO text or executing it inside the kernel. ^ permalink raw reply [flat|nested] 41+ messages in thread
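Since the argument here is about corrupted kernel text, it can help to look at the "Code:" bytes mechanically. The sketch below parses such a line, finds the byte marked with `<..>` (the one at the faulting RIP) and lists runs of zero bytes. The helper names and the minimum run length of 4 are assumptions for illustration; zero runs can also be ordinary immediates, so this only narrows down where to look.

```python
# Sketch only: parse_code_line() and zero_runs() are hypothetical helpers
# for inspecting the "Code:" line of an oops.
import re
from typing import List, Optional, Tuple


def parse_code_line(code: str) -> Tuple[List[int], Optional[int]]:
    """Return (byte values, index of the <xx> byte or None)."""
    values: List[int] = []
    fault: Optional[int] = None
    for tok in code.split():
        marked = re.fullmatch(r"<([0-9a-fA-F]{2})>", tok)
        if marked:
            fault = len(values)
            values.append(int(marked.group(1), 16))
        elif re.fullmatch(r"[0-9a-fA-F]{2}", tok):
            values.append(int(tok, 16))
        # Anything else (e.g. the "Code:" label) is ignored.
    return values, fault


def zero_runs(values: List[int], min_len: int = 4) -> List[Tuple[int, int]]:
    """Return (start index, length) for each run of >= min_len zero bytes."""
    runs = []
    start = None
    for i, v in enumerate(values + [1]):  # sentinel closes a trailing run
        if v == 0 and start is None:
            start = i
        elif v != 0 and start is not None:
            if i - start >= min_len:
                runs.append((start, i - start))
            start = None
    return runs


# The map_vdso "Code:" line quoted earlier in the thread.
MAP_VDSO_CODE = (
    "Code: 00 00 00 48 b8 00 00 00 00 00 fc ff df 41 57 49 89 ff 41 56 41 "
    "55 41 54 55 65 48 8b 2c 25 c0 fe 01 00 48 8d bd 28 04 00 00 53 <48> 00 "
    "00 00 00 fa 48 83 ec 10 48 c1 ea 03 80 3c 02 00 0f 85 51 02"
)

if __name__ == "__main__":
    data, fault_index = parse_code_line(MAP_VDSO_CODE)
    print(len(data), fault_index, zero_runs(data))
```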
* Re: general protection fault in perf_misc_flags 2020-09-28 5:18 ` Dmitry Vyukov 2020-09-28 6:06 ` Dmitry Vyukov @ 2020-09-28 7:25 ` Marco Elver 2020-09-28 20:32 ` Nick Desaulniers 2 siblings, 0 replies; 41+ messages in thread From: Marco Elver @ 2020-09-28 7:25 UTC (permalink / raw) To: Dmitry Vyukov Cc: Borislav Petkov, Alexander Potapenko, syzbot, Arnaldo Carvalho de Melo, Alexander Shishkin, H. Peter Anvin, Jiri Olsa, LKML, Mark Rutland, Ingo Molnar, Namhyung Kim, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, clang-built-linux On Mon, 28 Sep 2020 at 07:18, Dmitry Vyukov <dvyukov@google.com> wrote: > > On Sun, Sep 27, 2020 at 4:57 PM Borislav Petkov <bp@alien8.de> wrote: > > > > On Sat, Sep 19, 2020 at 01:32:14AM -0700, syzbot wrote: > > > Hello, > > > > > > syzbot found the following issue on: > > > > > > HEAD commit: 92ab97ad Merge tag 'sh-for-5.9-part2' of git://git.libc.or.. > > > git tree: upstream > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1069669b900000 > > > kernel config: https://syzkaller.appspot.com/x/.config?x=cd992d74d6c7e62 > > > dashboard link: https://syzkaller.appspot.com/bug?extid=ce179bc99e64377c24bc > > > compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81) > > > > All below is AFAICT: > > > > This compiler you're using is not some official release but some random > > commit before the v10 release: > > > > $ git show c2443155a0fb245c8f17f2c1c72b6ea391e86e81 > > Author: Hans Wennborg <hans@chromium.org> > > Date: Sat Nov 30 14:20:11 2019 +0100 > > > > Revert 651f07908a1 "[AArch64] Don't combine callee-save and local stack adjustment when optimizing for size" > > ... 
> > > > $ git describe c2443155a0fb245c8f17f2c1c72b6ea391e86e81 > > llvmorg-10-init-10900-gc2443155a0fb > > > > The v10 release is: > > > > $ git show llvmorg-10.0.0 > > tag llvmorg-10.0.0 > > Tagger: Hans Wennborg <hans@chromium.org> > > Date: Tue Mar 24 12:58:58 2020 +0100 > > > > Tag 10.0.0 > > > > and v10 has reached v10.0.1 in the meantime: > > > > $ git log --oneline c2443155a0fb245c8f17f2c1c72b6ea391e86e81~1..llvmorg-10.0.1 | wc -l > > 7051 > > > > so can you please update your compiler and see if you can still > > reproduce with 10.0.1 so that we don't waste time chasing a bug which > > has been likely already fixed in one of those >7K commits. > > +Alex, Marco, > > There is suspicion that these may be caused by use of unreleased clang. > Do we use the same clang as we use for the KMSAN instance? But this is > not KMSAN machine, so I am not sure who/when/why updated it last to > this revision. > I even see we have some clang 11 version: > https://github.com/google/syzkaller/blob/master/docs/syzbot.md#crash-does-not-reproduce Yeah, we should replace that one as well as it wasn't yet a release-candidate. > Is it possible to switch to some released version for both KMSAN and KASAN now? We should probably just switch to Clang 11-rc3 or so. Then we can use the same compiler for KMSAN and KCSAN at least. I can package up a newer Clang. Thanks, -- Marco ^ permalink raw reply [flat|nested] 41+ messages in thread
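The `git describe` output quoted above already encodes how far the compiler build is from a tag. A small sketch for splitting it apart, assuming the standard `<tag>-<commits since tag>-g<abbreviated hash>` form; the function name is made up:

```python
# Sketch only: parse_describe() is a hypothetical helper for the
# "<tag>-<N>-g<hash>" strings printed by `git describe`.
import re
from typing import Optional, Tuple


def parse_describe(text: str) -> Optional[Tuple[str, int, str]]:
    """Return (tag, commits since tag, abbreviated hash), or None if the
    string has no -N-g<hash> suffix (e.g. the commit is exactly a tag)."""
    m = re.fullmatch(r"(?P<tag>.+)-(?P<count>\d+)-g(?P<sha>[0-9a-f]+)", text.strip())
    if m is None:
        return None
    return m.group("tag"), int(m.group("count")), m.group("sha")


if __name__ == "__main__":
    print(parse_describe("llvmorg-10-init-10900-gc2443155a0fb"))
```

A tag ending in `-init` followed by a large commit count is a hint that the build comes from the development branch rather than from a release tag such as llvmorg-10.0.0.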
* Re: general protection fault in perf_misc_flags 2020-09-28 5:18 ` Dmitry Vyukov 2020-09-28 6:06 ` Dmitry Vyukov 2020-09-28 7:25 ` Marco Elver @ 2020-09-28 20:32 ` Nick Desaulniers 2020-09-29 13:27 ` Dmitry Vyukov 2 siblings, 1 reply; 41+ messages in thread From: Nick Desaulniers @ 2020-09-28 20:32 UTC (permalink / raw) To: Dmitry Vyukov Cc: Borislav Petkov, Alexander Potapenko, Marco Elver, syzbot, Arnaldo Carvalho de Melo, Alexander Shishkin, H. Peter Anvin, Jiri Olsa, LKML, Mark Rutland, Ingo Molnar, Namhyung Kim, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, clang-built-linux On Sun, Sep 27, 2020 at 10:18 PM 'Dmitry Vyukov' via Clang Built Linux <clang-built-linux@googlegroups.com> wrote: > > On Sun, Sep 27, 2020 at 4:57 PM Borislav Petkov <bp@alien8.de> wrote: > > > > On Sat, Sep 19, 2020 at 01:32:14AM -0700, syzbot wrote: > > > Hello, > > > > > > syzbot found the following issue on: > > > > > > HEAD commit: 92ab97ad Merge tag 'sh-for-5.9-part2' of git://git.libc.or.. > > > git tree: upstream > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1069669b900000 > > > kernel config: https://syzkaller.appspot.com/x/.config?x=cd992d74d6c7e62 > > > dashboard link: https://syzkaller.appspot.com/bug?extid=ce179bc99e64377c24bc > > > compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81) > > > > All below is AFAICT: > > > > This compiler you're using is not some official release but some random > > commit before the v10 release: > > > > $ git show c2443155a0fb245c8f17f2c1c72b6ea391e86e81 > > Author: Hans Wennborg <hans@chromium.org> > > Date: Sat Nov 30 14:20:11 2019 +0100 > > > > Revert 651f07908a1 "[AArch64] Don't combine callee-save and local stack adjustment when optimizing for size" > > ... 
> > > > $ git describe c2443155a0fb245c8f17f2c1c72b6ea391e86e81 > > llvmorg-10-init-10900-gc2443155a0fb > > > > The v10 release is: > > > > $ git show llvmorg-10.0.0 > > tag llvmorg-10.0.0 > > Tagger: Hans Wennborg <hans@chromium.org> > > Date: Tue Mar 24 12:58:58 2020 +0100 > > > > Tag 10.0.0 > > > > and v10 has reached v10.0.1 in the meantime: > > > > $ git log --oneline c2443155a0fb245c8f17f2c1c72b6ea391e86e81~1..llvmorg-10.0.1 | wc -l > > 7051 > > > > so can you please update your compiler and see if you can still > > reproduce with 10.0.1 so that we don't waste time chasing a bug which > > has been likely already fixed in one of those >7K commits. Oh, shoot, sorry I didn't catch that. Good find. My next question was going to be if this is reproducible with a newer compiler release or not (later emails make this sound like it's no longer considered clang-specific). Generally we want coverage of unreleased compiler versions to ensure we don't ship a broken release. Once the release exists, it's of questionable value to continue to test a pre-release version of that branch. This isn't the first time where we've had syzkaller reports that were testing old releases of clang. Maybe we can establish a process for upgrading the toolchain under test based on some time-based cadence, or coinciding with the upstream LLVM release events? > > +Alex, Marco, > > There is suspicion that these may be caused by use of unreleased clang. > Do we use the same clang as we use for the KMSAN instance? But this is > not KMSAN machine, so I am not sure who/when/why updated it last to > this revision. > I even see we have some clang 11 version: > https://github.com/google/syzkaller/blob/master/docs/syzbot.md#crash-does-not-reproduce > > Is it possible to switch to some released version for both KMSAN and KASAN now? -- Thanks, ~Nick Desaulniers ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: general protection fault in perf_misc_flags 2020-09-28 20:32 ` Nick Desaulniers @ 2020-09-29 13:27 ` Dmitry Vyukov 0 siblings, 0 replies; 41+ messages in thread From: Dmitry Vyukov @ 2020-09-29 13:27 UTC (permalink / raw) To: Nick Desaulniers Cc: Borislav Petkov, Alexander Potapenko, Marco Elver, syzbot, Arnaldo Carvalho de Melo, Alexander Shishkin, H. Peter Anvin, Jiri Olsa, LKML, Mark Rutland, Ingo Molnar, Namhyung Kim, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers, clang-built-linux On Mon, Sep 28, 2020 at 10:33 PM Nick Desaulniers <ndesaulniers@google.com> wrote: > > On Sun, Sep 27, 2020 at 10:18 PM 'Dmitry Vyukov' via Clang Built Linux > <clang-built-linux@googlegroups.com> wrote: > > > > On Sun, Sep 27, 2020 at 4:57 PM Borislav Petkov <bp@alien8.de> wrote: > > > > > > On Sat, Sep 19, 2020 at 01:32:14AM -0700, syzbot wrote: > > > > Hello, > > > > > > > > syzbot found the following issue on: > > > > > > > > HEAD commit: 92ab97ad Merge tag 'sh-for-5.9-part2' of git://git.libc.or.. > > > > git tree: upstream > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1069669b900000 > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=cd992d74d6c7e62 > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=ce179bc99e64377c24bc > > > > compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81) > > > > > > All below is AFAICT: > > > > > > This compiler you're using is not some official release but some random > > > commit before the v10 release: > > > > > > $ git show c2443155a0fb245c8f17f2c1c72b6ea391e86e81 > > > Author: Hans Wennborg <hans@chromium.org> > > > Date: Sat Nov 30 14:20:11 2019 +0100 > > > > > > Revert 651f07908a1 "[AArch64] Don't combine callee-save and local stack adjustment when optimizing for size" > > > ... 
> > > > > > $ git describe c2443155a0fb245c8f17f2c1c72b6ea391e86e81 > > > llvmorg-10-init-10900-gc2443155a0fb > > > > > > The v10 release is: > > > > > > $ git show llvmorg-10.0.0 > > > tag llvmorg-10.0.0 > > > Tagger: Hans Wennborg <hans@chromium.org> > > > Date: Tue Mar 24 12:58:58 2020 +0100 > > > > > > Tag 10.0.0 > > > > > > and v10 has reached v10.0.1 in the meantime: > > > > > > $ git log --oneline c2443155a0fb245c8f17f2c1c72b6ea391e86e81~1..llvmorg-10.0.1 | wc -l > > > 7051 > > > > > > so can you please update your compiler and see if you can still > > > reproduce with 10.0.1 so that we don't waste time chasing a bug which > > > has been likely already fixed in one of those >7K commits. > > Oh, shoot, sorry I didn't catch that. Good find. My next question was > going to be if this is reproducible with a newer compiler release or > not (later emails make this sound like it's no longer considered clang > specific). > > Generally we want coverage of unreleased compiler versions to ensure > we don't ship a broken release. Once the release exists, it's of > questionable value to continue to test a pre-release version of that > branch. > > This isn't the first time where we've had syzcaller reports that were > testing old releases of clang. Maybe we can establish a process for > upgrading the toolchain under test based on some time based cadence, > or coinciding with the upstream LLVM release events? The current hypothesis is that this bug is not related to clang (there are similar crashes with gcc as well). We use unreleased versions of clang as we frequently need recent fixes/features. And then later nobody usually has time to update, if things work. Based on offline discussion with Marco, we probably need to update KMSAN and KASAN to 11 release when it's released. > > +Alex, Marco, > > > > There is suspicion that these may be caused by use of unreleased clang. > > Do we use the same clang as we use for the KMSAN instance? 
But this is > > not KMSAN machine, so I am not sure who/when/why updated it last to > > this revision. > > I even see we have some clang 11 version: > > https://github.com/google/syzkaller/blob/master/docs/syzbot.md#crash-does-not-reproduce > > > > Is it possible to switch to some released version for both KMSAN and KASAN now? > -- > Thanks, > ~Nick Desaulniers ^ permalink raw reply [flat|nested] 41+ messages in thread
end of thread, other threads:[~2020-10-01 11:05 UTC | newest]
Thread overview: 41+ messages
2020-09-19 8:32 general protection fault in perf_misc_flags syzbot
2020-09-19 11:08 ` Borislav Petkov
2020-09-21 5:54 ` Dmitry Vyukov
2020-09-21 8:08 ` Dmitry Vyukov
2020-09-21 20:59 ` Nick Desaulniers
2020-09-21 22:13 ` Borislav Petkov
2020-09-22 18:56 ` Nick Desaulniers
2020-09-22 19:29 ` Borislav Petkov
2020-09-23 9:03 ` Borislav Petkov
2020-09-23 9:24 ` Dmitry Vyukov
2020-09-23 10:34 ` Borislav Petkov
2020-09-23 15:20 ` Dmitry Vyukov
2020-09-25 12:22 ` Dmitry Vyukov
2020-09-26 0:32 ` Nick Desaulniers
2020-09-26 6:46 ` Dmitry Vyukov
2020-09-26 17:14 ` Borislav Petkov
2020-09-26 11:21 ` Borislav Petkov
2020-09-26 12:08 ` Dmitry Vyukov
2020-09-22 5:15 ` Dmitry Vyukov
2020-09-22 5:16 ` Dmitry Vyukov
2020-09-27 14:57 ` Borislav Petkov
2020-09-28 5:18 ` Dmitry Vyukov
2020-09-28 6:06 ` Dmitry Vyukov
2020-09-28 8:38 ` Borislav Petkov
2020-09-28 8:40 ` Dmitry Vyukov
2020-09-28 8:54 ` Borislav Petkov
2020-09-28 10:33 ` Dmitry Vyukov
2020-09-28 20:23 ` Borislav Petkov
2020-09-29 8:33 ` Borislav Petkov
2020-09-29 13:29 ` Dmitry Vyukov
2020-09-30 16:17 ` Borislav Petkov
2020-09-30 16:23 ` Dmitry Vyukov
2020-09-30 16:29 ` Dmitry Vyukov
2020-09-30 16:31 ` Borislav Petkov
2020-10-01 10:23 ` Dmitry Vyukov
2020-10-01 11:05 ` Borislav Petkov
2020-09-28 20:51 ` Nick Desaulniers
2020-09-28 21:19 ` Andy Lutomirski
2020-09-28 7:25 ` Marco Elver
2020-09-28 20:32 ` Nick Desaulniers
2020-09-29 13:27 ` Dmitry Vyukov