* riscv+KASAN does not boot
From: Dmitry Vyukov @ 2020-12-25 14:55 UTC
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, LKML, nylon7
Cc: Björn Töpel, Tobias Klauser, syzkaller, Palmer Dabbelt

Hello,

I am considering setting up a syzbot instance for the riscv arch (using
qemu emulation) and am testing the kernel config/image/etc. I can boot a
defconfig+kvmconfig riscv kernel, but so far I can't get a
CONFIG_KASAN+CONFIG_KCOV kernel to boot.

But first of all I would like to ask whether the riscv port is stable
enough at this point, and whether there is interest in continuous fuzzing
and receiving bug reports. If there is no interest, then the rest is not
worth spending time on.

Second, what git tree/branch should be used for testing (to find bugs
sooner and get fixes faster)? Currently it seems that riscv/fixes is the
most up-to-date branch with the most fixes; is it the right one?

Re the non-booting kernel problem. If I do:

defconfig+kvm_guest.config+
scripts/config -e KASAN -e KASAN_INLINE

I only see the OpenSBI banner and then nothing happens (qemu consumes 100%
CPU). I've tried v5.10, the current upstream head (71c5f03154ac) and
riscv/fixes (20620d72c31e). The result is the same.
I see this recent patch from Nylon:
https://lore.kernel.org/linux-riscv/1606727599-8598-1-git-send-email-nylon7@andestech.com/
which suggests that KASAN is working for Nylon.

I am using qemu 5.1.0 as:

qemu-system-riscv64 \
  -machine virt -bios default -smp 1 -m 2G \
  -device virtio-blk-device,drive=hd0 \
  -drive file=buildroot-riscv64.ext4,if=none,format=raw,id=hd0 \
  -kernel arch/riscv/boot/Image \
  -nographic \
  -device virtio-rng-device,rng=rng0 -object rng-random,filename=/dev/urandom,id=rng0 \
  -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device virtio-net-device,netdev=net0 \
  -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic panic_on_warn=1 panic=86400"

I've also tried this config (slightly larger than defconfig, but it
includes neither KASAN nor KCOV):
https://gist.githubusercontent.com/dvyukov/b2b62beccf80493781ab03b41430e616/raw/62e673cff08a8a41656d2871b8a37f74b00f509f/gistfile1.txt
and this is the ultimate large config that I would like to use:
https://gist.githubusercontent.com/dvyukov/2b4e621d5252dbc5a2f28802b8d71d95/raw/3ef2b8d8eda60d3acfc4bf7916ffb9e77671ed76/gistfile1.txt

Both of them hang after the OpenSBI banner in the same way.
Is it a known issue? Am I doing something wrong?

TIA

* Re: riscv+KASAN does not boot
From: Andreas Schwab @ 2020-12-25 16:58 UTC
To: Dmitry Vyukov
Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, LKML, nylon7, Palmer Dabbelt, Björn Töpel, Tobias Klauser, syzkaller

On Dec 25 2020, Dmitry Vyukov wrote:

> qemu-system-riscv64 \
> -machine virt -bios default -smp 1 -m 2G \
> -device virtio-blk-device,drive=hd0 \
> -drive file=buildroot-riscv64.ext4,if=none,format=raw,id=hd0 \
> -kernel arch/riscv/boot/Image \
> -nographic \
> -device virtio-rng-device,rng=rng0 -object
> rng-random,filename=/dev/urandom,id=rng0 \
> -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device
> virtio-net-device,netdev=net0 \
> -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic
> panic_on_warn=1 panic=86400"

Do you get more output with earlycon=sbi?

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
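
The suggestion above amounts to adding `earlycon=sbi` to the kernel command line that qemu passes via `-append`. A minimal sketch of doing that when the qemu invocation is assembled by a script; the helper name `with_kernel_param` is hypothetical, and the argument list is a shortened form of the command from the report, not the full one.

```python
import shlex


def with_kernel_param(qemu_args, param):
    """Return a copy of qemu_args with `param` added to the -append string.

    Hypothetical helper for illustration only: it edits the argument
    list and does not launch qemu.
    """
    args = list(qemu_args)
    idx = args.index("-append")      # raises ValueError if -append is absent
    cmdline = args[idx + 1].split()
    if param not in cmdline:         # do not add the same parameter twice
        cmdline.append(param)
    args[idx + 1] = " ".join(cmdline)
    return args


# Shortened form of the command line from the report.
base = [
    "qemu-system-riscv64",
    "-machine", "virt", "-bios", "default", "-smp", "1", "-m", "2G",
    "-kernel", "arch/riscv/boot/Image",
    "-nographic",
    "-append", "root=/dev/vda earlyprintk=serial console=ttyS0 "
               "oops=panic panic_on_warn=1 panic=86400",
]

debug_args = with_kernel_param(base, "earlycon=sbi")
print(shlex.join(debug_args))
```

The original list is left untouched, so the same base command can be reused with and without the extra parameter when comparing boots.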

* Re: riscv+KASAN does not boot
From: Dmitry Vyukov @ 2020-12-25 17:13 UTC
To: Andreas Schwab
Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, LKML, nylon7, Palmer Dabbelt, Björn Töpel, Tobias Klauser, syzkaller

On Fri, Dec 25, 2020 at 5:58 PM Andreas Schwab <schwab@linux-m68k.org> wrote:
>
> On Dec 25 2020, Dmitry Vyukov wrote:
>
> > qemu-system-riscv64 \
> > -machine virt -bios default -smp 1 -m 2G \
> > -device virtio-blk-device,drive=hd0 \
> > -drive file=buildroot-riscv64.ext4,if=none,format=raw,id=hd0 \
> > -kernel arch/riscv/boot/Image \
> > -nographic \
> > -device virtio-rng-device,rng=rng0 -object
> > rng-random,filename=/dev/urandom,id=rng0 \
> > -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device
> > virtio-net-device,netdev=net0 \
> > -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic
> > panic_on_warn=1 panic=86400"
>
> Do you get more output with earlycon=sbi?

Hi Andreas,

For defconfig+kvm_guest.config+ scripts/config -e KASAN -e
KASAN_INLINE it actually gave me more output:


OpenSBI v0.7
   ____                    _____ ____ _____
  / __ \                  / ____|  _ \_   _|
 | |  | |_ __   ___ _ __ | (___ | |_) || |
 | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
 | |__| | |_) |  __/ | | |____) | |_) || |_
  \____/| .__/ \___|_| |_|_____/|____/_____|
        | |
        |_|

Platform Name          : QEMU Virt Machine
Platform HART Features : RV64ACDFIMSU
Current Hart           : 0
Firmware Base          : 0x80000000
Firmware Size          : 132 KB
Runtime SBI Version    : 0.2

MIDELEG : 0x0000000000000222
MEDELEG : 0x000000000000b109
PMP0    : 0x0000000080000000-0x000000008003ffff (A)
PMP1    : 0x0000000000000000-0xffffffffffffffff (A,R,W,X)
[    0.000000] Linux version 5.10.0-01370-g71c5f03154ac (dvyukov@dvyukov-desk.muc.corp.google.com) (riscv64-linux-gnu-gcc (Debian 10.2.0-9) 10.2.0, GNU ld (GNU Binutils for Debian) 2.35.1) #17 SMP Fri Dec 25 18:10:12 CET 2020
[    0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000
[    0.000000] earlycon: sbi0 at I/O port 0x0 (options '')
[    0.000000] printk: bootconsole [sbi0] enabled
[    0.000000] efi: UEFI not found.
[    0.000000] Zone ranges:
[    0.000000] DMA32 [mem 0x0000000080200000-0x00000000ffffffff]
[    0.000000] Normal empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000] node 0: [mem 0x0000000080200000-0x00000000ffffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000080200000-0x00000000ffffffff]
[    0.000000] SBI specification v0.2 detected
[    0.000000] SBI implementation ID=0x1 Version=0x7
[    0.000000] SBI v0.2 TIME extension detected
[    0.000000] SBI v0.2 IPI extension detected
[    0.000000] SBI v0.2 RFENCE extension detected
[    0.000000] software IO TLB: mapped [mem 0x00000000fa3f9000-0x00000000fe3f9000] (64MB)
[    0.000000] Unable to handle kernel paging request at virtual address dfffffc810040000
[    0.000000] Oops [#1]
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0-01370-g71c5f03154ac #17
[    0.000000] epc: ffffffe00042e3e4 ra : ffffffe000c0462c sp : ffffffe001603ea0
[    0.000000] gp : ffffffe0016e3c60 tp : ffffffe00160cd40 t0 : dfffffc810040000
[    0.000000] t1 : ffffffe000e0a838 t2 : 0000000000000000 s0 : ffffffe001603f50
[    0.000000] s1 : ffffffe0016e50a8 a0 : dfffffc810040000 a1 : 0000000000000000
[    0.000000] a2 : 000000000ffc0000 a3 : dfffffc820000000 a4 : 0000000000000000
[    0.000000] a5 : 000000003e8c6001 a6 : ffffffe000e0a820 a7 : 0000000000000900
[    0.000000] s2 : dfffffc820000000 s3 : dfffffc800000000 s4 : 0000000000000001
[    0.000000] s5 : ffffffe0016e5108 s6 : fffffffffffff000 s7 : dfffffc810040000
[    0.000000] s8 : 0000000000000080 s9 : ffffffffffffffff s10: ffffffe07a119000
[    0.000000] s11: 000000000000ffc0 t3 : ffffffe0016eb908 t4 : 0000000000000001
[    0.000000] t5 : ffffffc4001c150a t6 : ffffffe001603be8
[    0.000000] status: 0000000000000100 badaddr: dfffffc810040000 cause: 000000000000000f
[    0.000000] random: get_random_bytes called from oops_exit+0x30/0x58 with crng_init=0
[    0.000000] ---[ end trace 0000000000000000 ]---
[    0.000000] Kernel panic - not syncing: Fatal exception
[    0.000000] ---[ end Kernel panic - not syncing: Fatal exception ]---


But I first tried with the kernel image I had in the dir, I think it
was this config (no KASAN):
https://gist.githubusercontent.com/dvyukov/b2b62beccf80493781ab03b41430e616/raw/62e673cff08a8a41656d2871b8a37f74b00f509f/gistfile1.txt

and earlycon=sbi did not change anything (no output after OpenSBI).
So potentially there are 2 different problems.
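
One way to read the register dump in the oops above, offered as a sketch rather than a confirmed diagnosis: `cause: 0xf` is the store/AMO page fault code in the RISC-V privileged specification, and the faulting address matches what the generic KASAN mapping `shadow = (addr >> shift) + offset` gives for the start of RAM taken as a physical address. The scale shift of 3 and the offset `0xdfffffc800000000` are assumptions here (the offset is simply the value seen in register `s3`); nothing in the thread states them.

```python
# Page-fault exception codes from the RISC-V privileged specification
# (scause values with the interrupt bit clear).
SCAUSE_PAGE_FAULTS = {
    0xC: "instruction page fault",
    0xD: "load page fault",
    0xF: "store/AMO page fault",
}

MASK64 = (1 << 64) - 1
KASAN_SHADOW_SCALE_SHIFT = 3               # assumption: one shadow byte per 8 bytes
KASAN_SHADOW_OFFSET = 0xDFFFFFC800000000   # assumption: value of register s3 in the dump


def mem_to_shadow(addr):
    """Generic KASAN address-to-shadow mapping, wrapped to 64 bits."""
    return ((addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET) & MASK64


# Values copied from the boot log.
ram_start = 0x80200000       # node 0: [mem 0x0000000080200000-0x00000000ffffffff]
ram_end = 0x100000000        # one past 0xffffffff
badaddr = 0xDFFFFFC810040000
cause = 0xF

shadow_start = mem_to_shadow(ram_start)
shadow_end = mem_to_shadow(ram_end)

print(SCAUSE_PAGE_FAULTS[cause])
print(hex(shadow_start), hex(shadow_end), hex(shadow_end - shadow_start))
```

Under these assumptions the computed shadow start equals `badaddr` (also held in `a0`, `t0` and `s7`), the computed end equals the value in `a3` and `s2`, and the length equals the value in `a2`. That would be consistent with a shadow range derived from physical rather than virtual addresses being written during early boot.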

* Re: riscv+KASAN does not boot
From: Palmer Dabbelt @ 2021-01-14 4:57 UTC
To: dvyukov
Cc: schwab, Paul Walmsley, aou, linux-riscv, linux-kernel, nylon7, Bjorn Topel, tklauser, syzkaller

On Fri, 25 Dec 2020 09:13:23 PST (-0800), dvyukov@google.com wrote:
> On Fri, Dec 25, 2020 at 5:58 PM Andreas Schwab <schwab@linux-m68k.org> wrote:
>>
>> On Dec 25 2020, Dmitry Vyukov wrote:
>>
>> > qemu-system-riscv64 \
>> > -machine virt -bios default -smp 1 -m 2G \
>> > -device virtio-blk-device,drive=hd0 \
>> > -drive file=buildroot-riscv64.ext4,if=none,format=raw,id=hd0 \
>> > -kernel arch/riscv/boot/Image \
>> > -nographic \
>> > -device virtio-rng-device,rng=rng0 -object
>> > rng-random,filename=/dev/urandom,id=rng0 \
>> > -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device
>> > virtio-net-device,netdev=net0 \
>> > -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic
>> > panic_on_warn=1 panic=86400"
>>
>> Do you get more output with earlycon=sbi?
>
> Hi Andreas,
>
> For defconfig+kvm_guest.config+ scripts/config -e KASAN -e
> KASAN_INLINE it actually gave me more output:
>
>
> OpenSBI v0.7
>    ____                    _____ ____ _____
>   / __ \                  / ____|  _ \_   _|
>  | |  | |_ __   ___ _ __ | (___ | |_) || |
>  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
>  | |__| | |_) |  __/ | | |____) | |_) || |_
>   \____/| .__/ \___|_| |_|_____/|____/_____|
>         | |
>         |_|
>
> Platform Name          : QEMU Virt Machine
> Platform HART Features : RV64ACDFIMSU
> Current Hart           : 0
> Firmware Base          : 0x80000000
> Firmware Size          : 132 KB
> Runtime SBI Version    : 0.2
>
> MIDELEG : 0x0000000000000222
> MEDELEG : 0x000000000000b109
> PMP0    : 0x0000000080000000-0x000000008003ffff (A)
> PMP1    : 0x0000000000000000-0xffffffffffffffff (A,R,W,X)
> [    0.000000] Linux version 5.10.0-01370-g71c5f03154ac
> (dvyukov@dvyukov-desk.muc.corp.google.com) (riscv64-linux-gnu-gcc
> (Debian 10.2.0-9) 10.2.0, GNU ld (GNU Binutils for Debian) 2.35.1) #17
> SMP Fri Dec 25 18:10:12 CET 2020
> [    0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000
> [    0.000000] earlycon: sbi0 at I/O port 0x0 (options '')
> [    0.000000] printk: bootconsole [sbi0] enabled
> [    0.000000] efi: UEFI not found.
> [    0.000000] Zone ranges:
> [    0.000000] DMA32 [mem 0x0000000080200000-0x00000000ffffffff]
> [    0.000000] Normal empty
> [    0.000000] Movable zone start for each node
> [    0.000000] Early memory node ranges
> [    0.000000] node 0: [mem 0x0000000080200000-0x00000000ffffffff]
> [    0.000000] Initmem setup node 0 [mem 0x0000000080200000-0x00000000ffffffff]
> [    0.000000] SBI specification v0.2 detected
> [    0.000000] SBI implementation ID=0x1 Version=0x7
> [    0.000000] SBI v0.2 TIME extension detected
> [    0.000000] SBI v0.2 IPI extension detected
> [    0.000000] SBI v0.2 RFENCE extension detected
> [    0.000000] software IO TLB: mapped [mem
> 0x00000000fa3f9000-0x00000000fe3f9000] (64MB)
> [    0.000000] Unable to handle kernel paging request at virtual
> address dfffffc810040000
> [    0.000000] Oops [#1]
> [    0.000000] Modules linked in:
> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted
> 5.10.0-01370-g71c5f03154ac #17
> [    0.000000] epc: ffffffe00042e3e4 ra : ffffffe000c0462c sp : ffffffe001603ea0
> [    0.000000] gp : ffffffe0016e3c60 tp : ffffffe00160cd40 t0 :
> dfffffc810040000
> [    0.000000] t1 : ffffffe000e0a838 t2 : 0000000000000000 s0 :
> ffffffe001603f50
> [    0.000000] s1 : ffffffe0016e50a8 a0 : dfffffc810040000 a1 :
> 0000000000000000
> [    0.000000] a2 : 000000000ffc0000 a3 : dfffffc820000000 a4 :
> 0000000000000000
> [    0.000000] a5 : 000000003e8c6001 a6 : ffffffe000e0a820 a7 :
> 0000000000000900
> [    0.000000] s2 : dfffffc820000000 s3 : dfffffc800000000 s4 :
> 0000000000000001
> [    0.000000] s5 : ffffffe0016e5108 s6 : fffffffffffff000 s7 :
> dfffffc810040000
> [    0.000000] s8 : 0000000000000080 s9 : ffffffffffffffff s10:
> ffffffe07a119000
> [    0.000000] s11: 000000000000ffc0 t3 : ffffffe0016eb908 t4 :
> 0000000000000001
> [    0.000000] t5 : ffffffc4001c150a t6 : ffffffe001603be8
> [    0.000000] status: 0000000000000100 badaddr: dfffffc810040000
> cause: 000000000000000f
> [    0.000000] random: get_random_bytes called from
> oops_exit+0x30/0x58 with crng_init=0
> [    0.000000] ---[ end trace 0000000000000000 ]---
> [    0.000000] Kernel panic - not syncing: Fatal exception
> [    0.000000] ---[ end Kernel panic - not syncing: Fatal exception ]---
>
>
> But I first tried with the kernel image I had in the dir, I think it
> was this config (no KASAN):
> https://gist.githubusercontent.com/dvyukov/b2b62beccf80493781ab03b41430e616/raw/62e673cff08a8a41656d2871b8a37f74b00f509f/gistfile1.txt
>
> and earlycon=sbi did not change anything (no output after OpenSBI).
> So potentially there are 2 different problems.

Thanks for reporting this. Looks like I'd forgotten to add a kasan config to
my tests. There's one in there now, and it's passing as of the fix that Nylon
posted.

* Re: riscv+KASAN does not boot
From: Dmitry Vyukov @ 2021-01-14 9:23 UTC
To: Palmer Dabbelt
Cc: Andreas Schwab, Paul Walmsley, Albert Ou, linux-riscv, LKML, nylon7, Bjorn Topel, Tobias Klauser, syzkaller

On Thu, Jan 14, 2021 at 5:57 AM Palmer Dabbelt <palmerdabbelt@google.com> wrote:
>
> On Fri, 25 Dec 2020 09:13:23 PST (-0800), dvyukov@google.com wrote:
> > On Fri, Dec 25, 2020 at 5:58 PM Andreas Schwab <schwab@linux-m68k.org> wrote:
> >>
> >> On Dec 25 2020, Dmitry Vyukov wrote:
> >>
> >> > qemu-system-riscv64 \
> >> > -machine virt -bios default -smp 1 -m 2G \
> >> > -device virtio-blk-device,drive=hd0 \
> >> > -drive file=buildroot-riscv64.ext4,if=none,format=raw,id=hd0 \
> >> > -kernel arch/riscv/boot/Image \
> >> > -nographic \
> >> > -device virtio-rng-device,rng=rng0 -object
> >> > rng-random,filename=/dev/urandom,id=rng0 \
> >> > -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device
> >> > virtio-net-device,netdev=net0 \
> >> > -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic
> >> > panic_on_warn=1 panic=86400"
> >>
> >> Do you get more output with earlycon=sbi?
> >
> > Hi Andreas,
> >
> > For defconfig+kvm_guest.config+ scripts/config -e KASAN -e
> > KASAN_INLINE it actually gave me more output:
> >
> >
> > OpenSBI v0.7
> >    ____                    _____ ____ _____
> >   / __ \                  / ____|  _ \_   _|
> >  | |  | |_ __   ___ _ __ | (___ | |_) || |
> >  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
> >  | |__| | |_) |  __/ | | |____) | |_) || |_
> >   \____/| .__/ \___|_| |_|_____/|____/_____|
> >         | |
> >         |_|
> >
> > Platform Name          : QEMU Virt Machine
> > Platform HART Features : RV64ACDFIMSU
> > Current Hart           : 0
> > Firmware Base          : 0x80000000
> > Firmware Size          : 132 KB
> > Runtime SBI Version    : 0.2
> >
> > MIDELEG : 0x0000000000000222
> > MEDELEG : 0x000000000000b109
> > PMP0    : 0x0000000080000000-0x000000008003ffff (A)
> > PMP1    : 0x0000000000000000-0xffffffffffffffff (A,R,W,X)
> > [    0.000000] Linux version 5.10.0-01370-g71c5f03154ac
> > (dvyukov@dvyukov-desk.muc.corp.google.com) (riscv64-linux-gnu-gcc
> > (Debian 10.2.0-9) 10.2.0, GNU ld (GNU Binutils for Debian) 2.35.1) #17
> > SMP Fri Dec 25 18:10:12 CET 2020
> > [    0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000
> > [    0.000000] earlycon: sbi0 at I/O port 0x0 (options '')
> > [    0.000000] printk: bootconsole [sbi0] enabled
> > [    0.000000] efi: UEFI not found.
> > [    0.000000] Zone ranges:
> > [    0.000000] DMA32 [mem 0x0000000080200000-0x00000000ffffffff]
> > [    0.000000] Normal empty
> > [    0.000000] Movable zone start for each node
> > [    0.000000] Early memory node ranges
> > [    0.000000] node 0: [mem 0x0000000080200000-0x00000000ffffffff]
> > [    0.000000] Initmem setup node 0 [mem 0x0000000080200000-0x00000000ffffffff]
> > [    0.000000] SBI specification v0.2 detected
> > [    0.000000] SBI implementation ID=0x1 Version=0x7
> > [    0.000000] SBI v0.2 TIME extension detected
> > [    0.000000] SBI v0.2 IPI extension detected
> > [    0.000000] SBI v0.2 RFENCE extension detected
> > [    0.000000] software IO TLB: mapped [mem
> > 0x00000000fa3f9000-0x00000000fe3f9000] (64MB)
> > [    0.000000] Unable to handle kernel paging request at virtual
> > address dfffffc810040000
> > [    0.000000] Oops [#1]
> > [    0.000000] Modules linked in:
> > [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted
> > 5.10.0-01370-g71c5f03154ac #17
> > [    0.000000] epc: ffffffe00042e3e4 ra : ffffffe000c0462c sp : ffffffe001603ea0
> > [    0.000000] gp : ffffffe0016e3c60 tp : ffffffe00160cd40 t0 :
> > dfffffc810040000
> > [    0.000000] t1 : ffffffe000e0a838 t2 : 0000000000000000 s0 :
> > ffffffe001603f50
> > [    0.000000] s1 : ffffffe0016e50a8 a0 : dfffffc810040000 a1 :
> > 0000000000000000
> > [    0.000000] a2 : 000000000ffc0000 a3 : dfffffc820000000 a4 :
> > 0000000000000000
> > [    0.000000] a5 : 000000003e8c6001 a6 : ffffffe000e0a820 a7 :
> > 0000000000000900
> > [    0.000000] s2 : dfffffc820000000 s3 : dfffffc800000000 s4 :
> > 0000000000000001
> > [    0.000000] s5 : ffffffe0016e5108 s6 : fffffffffffff000 s7 :
> > dfffffc810040000
> > [    0.000000] s8 : 0000000000000080 s9 : ffffffffffffffff s10:
> > ffffffe07a119000
> > [    0.000000] s11: 000000000000ffc0 t3 : ffffffe0016eb908 t4 :
> > 0000000000000001
> > [    0.000000] t5 : ffffffc4001c150a t6 : ffffffe001603be8
> > [    0.000000] status: 0000000000000100 badaddr: dfffffc810040000
> > cause: 000000000000000f
> > [    0.000000] random: get_random_bytes called from
> > oops_exit+0x30/0x58 with crng_init=0
> > [    0.000000] ---[ end trace 0000000000000000 ]---
> > [    0.000000] Kernel panic - not syncing: Fatal exception
> > [    0.000000] ---[ end Kernel panic - not syncing: Fatal exception ]---
> >
> >
> > But I first tried with the kernel image I had in the dir, I think it
> > was this config (no KASAN):
> > https://gist.githubusercontent.com/dvyukov/b2b62beccf80493781ab03b41430e616/raw/62e673cff08a8a41656d2871b8a37f74b00f509f/gistfile1.txt
> >
> > and earlycon=sbi did not change anything (no output after OpenSBI).
> > So potentially there are 2 different problems.
>
> Thanks for reporting this. Looks like I'd forgotten to add a kasan config to
> my tests. There's one in there now, and it's passing as of the fix that Nylon
> posted.

I can boot the KASAN kernel now on riscv/fixes.
Next problem: I only get as far as:

[   90.498967][    T1] Run /sbin/init as init process
[   91.164353][ T4022] init[4022]: unhandled signal 11 code 0x1 at 0x0000000000000bb0 in busybox[10000+d7000]
[   91.179640][ T4022] CPU: 1 PID: 4022 Comm: init Not tainted 5.11.0-rc2-00012-g0983834a8393 #19
[   91.180853][ T4022] epc: 0000000000000bb0 ra : 0000003fccab09d0 sp : 0000003fffa8c7b0
[   91.181861][ T4022] gp : 00000000000e8d70 tp : 0000003fccaaf820 t0 : 000000000000001e
[   91.182810][ T4022] t1 : 0000003fccab0bfc t2 : 000000000000000a s0 : 0000003fffa8c850
[   91.183749][ T4022] s1 : 0000003fccab1070 a0 : 0000003fccab1070 a1 : 0000003fffa8c8c8
[   91.184689][ T4022] a2 : 0000000000000001 a3 : 0000000000000020 a4 : 0000000000000000
[   91.185620][ T4022] a5 : 0000000000000000 a6 : 0000003fcc9c4260 a7 : fffffffffffffffe
[   91.186566][ T4022] s2 : 0000000000000000 s3 : 0000003fffa8c8c8 s4 : 0000003fccab1000
[   91.187500][ T4022] s5 : 0000003fccab1078 s6 : 0000003fffa8c8d0 s7 : 0000000000000010
[   91.189672][ T4022] s8 : 0000000000000016 s9 : 0000000000000000 s10: 0000003fffa8c8c8
[   91.190637][ T4022] s11: 0000000000000000 t3 : 0000000000000bb0 t4 : 0000000000000000
[   91.191568][ T4022] t5 : 0000003fffa8c360 t6 : 0000000000000000
[   91.192389][ T4022] status: 8000000000004020 badaddr: 0000000000000bb0 cause: 000000000000000c
[   91.201573][    T1] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[   91.202906][    T1] CPU: 0 PID: 1 Comm: init Not tainted 5.11.0-rc2-00012-g0983834a8393 #19
[   91.204139][    T1] Call Trace:
[   91.204849][    T1] [<ffffffe0000095c0>] walk_stackframe+0x0/0x1d0
[   91.206124][    T1] [<ffffffe00458b2d8>] show_stack+0x3a/0x46
[   91.207240][    T1] [<ffffffe0045a5b72>] dump_stack+0x11c/0x180
[   91.208732][    T1] [<ffffffe00458b6a0>] panic+0x20a/0x5cc
[   91.209890][    T1] [<ffffffe00002eea4>] do_exit+0x1846/0x1874
[   91.211052][    T1] [<ffffffe00002efdc>] do_group_exit+0xa0/0x192
[   91.212224][    T1] [<ffffffe000047d30>] get_signal+0x2d6/0x13dc
[   91.213390][    T1] [<ffffffe000007eb0>] do_notify_resume+0xa8/0x912
[   91.214567][    T1] [<ffffffe00000559c>] ret_from_exception+0x0/0x14

The image is buildroot 2020.11.x, built with this script:
https://gist.githubusercontent.com/dvyukov/1a9a01ca2189e35175a021820c95b04d/raw/5c01d755e83f4eab0d56aa7dc84af3b2d5e80423/gistfile1.txt

Readelf for init shows the following (is it that the [10000+d7000] address
is not .text at all?):

$ riscv64-linux-gnu-readelf --sections image/bin/busybox
There are 27 section headers, starting at offset 0xd7f20:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .interp           PROGBITS         0000000000010238  00000238
       0000000000000021  0000000000000000   A       0     0     1
  [ 2] .note.ABI-tag     NOTE             000000000001025c  0000025c
       0000000000000020  0000000000000000   A       0     0     4
  [ 3] .hash             HASH             0000000000010280  00000280
       00000000000009cc  0000000000000004   A       5     0     8
  [ 4] .gnu.hash         GNU_HASH         0000000000010c50  00000c50
       0000000000000ac8  0000000000000000   A       5     0     8
  [ 5] .dynsym           DYNSYM           0000000000011718  00001718
       00000000000021f0  0000000000000018   A       6     1     8
  [ 6] .dynstr           STRTAB           0000000000013908
00003908 0000000000000c66 0000000000000000 A 0 0 1 [ 7] .gnu.version VERSYM 000000000001456e 0000456e 00000000000002d4 0000000000000002 A 5 0 2 [ 8] .gnu.version_r VERNEED 0000000000014848 00004848 0000000000000050 0000000000000000 A 6 2 8 [ 9] .rela.dyn RELA 0000000000014898 00004898 00000000000000c0 0000000000000018 A 5 0 8 [10] .rela.plt RELA 0000000000014958 00004958 00000000000020a0 0000000000000018 AI 5 22 8 [11] .plt PROGBITS 0000000000016a00 00006a00 00000000000015e0 0000000000000010 AX 0 0 16 [12] .text PROGBITS 0000000000017fe0 00007fe0 00000000000a3668 0000000000000000 AX 0 0 4 [13] .rodata PROGBITS 00000000000bb648 000ab648 000000000002b076 0000000000000000 A 0 0 8 [14] .sdata2 PROGBITS 00000000000e66c0 000d66c0 0000000000000163 0000000000000000 A 0 0 8 [15] .eh_frame_hdr PROGBITS 00000000000e6824 000d6824 0000000000000014 0000000000000000 A 0 0 4 [16] .eh_frame PROGBITS 00000000000e6838 000d6838 000000000000002c 0000000000000000 A 0 0 8 [17] .preinit_array PREINIT_ARRAY 00000000000e7df8 000d6df8 0000000000000008 0000000000000008 WA 0 0 1 [18] .init_array INIT_ARRAY 00000000000e7e00 000d6e00 0000000000000008 0000000000000008 WA 0 0 8 [19] .fini_array FINI_ARRAY 00000000000e7e08 000d6e08 0000000000000008 0000000000000008 WA 0 0 8 [20] .dynamic DYNAMIC 00000000000e7e10 000d6e10 00000000000001f0 0000000000000010 WA 6 0 8 [21] .data PROGBITS 00000000000e8000 000d7000 0000000000000240 0000000000000000 WA 0 0 8 [22] .got PROGBITS 00000000000e8240 000d7240 0000000000000af8 0000000000000008 WA 0 0 8 [23] .sdata PROGBITS 00000000000e8d38 000d7d38 0000000000000101 0000000000000000 WA 0 0 8 [24] .sbss NOBITS 00000000000e8e40 000d7e39 000000000000017f 0000000000000000 WA 0 0 8 [25] .bss NOBITS 00000000000e8fc0 000d7e39 00000000000005b0 0000000000000000 WA 0 0 8 [26] .shstrtab STRTAB 0000000000000000 000d7e39 00000000000000e6 0000000000000000 0 0 1 Before I spent more time on this, am I doing anything obviously wrong? Is it a known issue? 
Are there any fresh working recipes?
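The question about the [10000+d7000] range can be checked with a little arithmetic. The sketch below is only an illustration. It assumes the bracketed pair in the kernel message is the base address and size of the busybox mapping, and it copies the faulting address and the two executable sections (.plt and .text) from the log and the readelf listing quoted above. The names FAULT_PC, EXEC_SECTIONS and locate() are made up for this example.

```python
# Values copied from the crash log and the readelf listing in this message.
FAULT_PC = 0xBB0      # epc and badaddr of the init crash
MAP_BASE = 0x10000    # from "busybox[10000+d7000]" (assumed: base of the mapping)
MAP_SIZE = 0xD7000    # from "busybox[10000+d7000]" (assumed: size of the mapping)

# (name, start address, size) of the sections flagged AX in the listing.
EXEC_SECTIONS = [
    (".plt", 0x16A00, 0x15E0),
    (".text", 0x17FE0, 0xA3668),
]

# RISC-V scause exception codes for page faults (privileged specification).
SCAUSE = {
    0xC: "instruction page fault",
    0xD: "load page fault",
    0xF: "store/AMO page fault",
}


def in_range(addr, start, size):
    """True if addr lies in the half-open range [start, start + size)."""
    return start <= addr < start + size


def locate(addr):
    """Describe where addr falls relative to the busybox mapping."""
    if not in_range(addr, MAP_BASE, MAP_SIZE):
        return "outside the busybox mapping"
    for name, start, size in EXEC_SECTIONS:
        if in_range(addr, start, size):
            return f"inside {name}"
    return "inside the mapping, but not in an executable section"


print(hex(FAULT_PC), "is", locate(FAULT_PC))
print("cause 0xc means:", SCAUSE[0xC])
```

Under these assumptions the range 0x10000..0xe7000 does cover .text (0x17fe0..0xbb648), so the mapping itself looks sane. The faulting pc 0xbb0 lies below the mapping altogether, and cause 0xc is an instruction page fault, so the process appears to have jumped to a tiny, unmapped address. Register t3 holds the same value as epc in the dump. Why that value ended up there is not something the log alone answers.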
* Re: riscv+KASAN does not boot 2021-01-14 9:23 ` Dmitry Vyukov @ 2021-01-14 10:24 ` Dmitry Vyukov -1 siblings, 0 replies; 55+ messages in thread From: Dmitry Vyukov @ 2021-01-14 10:24 UTC (permalink / raw) To: Palmer Dabbelt Cc: Andreas Schwab, Paul Walmsley, Albert Ou, linux-riscv, LKML, nylon7, Bjorn Topel, Tobias Klauser, syzkaller On Thu, Jan 14, 2021 at 10:23 AM Dmitry Vyukov <dvyukov@google.com> wrote: > > On Thu, Jan 14, 2021 at 5:57 AM Palmer Dabbelt <palmerdabbelt@google.com> wrote: > > > > On Fri, 25 Dec 2020 09:13:23 PST (-0800), dvyukov@google.com wrote: > > > On Fri, Dec 25, 2020 at 5:58 PM Andreas Schwab <schwab@linux-m68k.org> wrote: > > >> > > >> On Dez 25 2020, Dmitry Vyukov wrote: > > >> > > >> > qemu-system-riscv64 \ > > >> > -machine virt -bios default -smp 1 -m 2G \ > > >> > -device virtio-blk-device,drive=hd0 \ > > >> > -drive file=buildroot-riscv64.ext4,if=none,format=raw,id=hd0 \ > > >> > -kernel arch/riscv/boot/Image \ > > >> > -nographic \ > > >> > -device virtio-rng-device,rng=rng0 -object > > >> > rng-random,filename=/dev/urandom,id=rng0 \ > > >> > -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device > > >> > virtio-net-device,netdev=net0 \ > > >> > -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic > > >> > panic_on_warn=1 panic=86400" > > >> > > >> Do you get more output with earlycon=sbi? 
> > > > > > Hi Andreas, > > > > > > For defconfig+kvm_guest.config+ scripts/config -e KASAN -e > > > KASAN_INLINE it actually gave me more output: > > > > > > > > > OpenSBI v0.7 > > > ____ _____ ____ _____ > > > / __ \ / ____| _ \_ _| > > > | | | |_ __ ___ _ __ | (___ | |_) || | > > > | | | | '_ \ / _ \ '_ \ \___ \| _ < | | > > > | |__| | |_) | __/ | | |____) | |_) || |_ > > > \____/| .__/ \___|_| |_|_____/|____/_____| > > > | | > > > |_| > > > > > > Platform Name : QEMU Virt Machine > > > Platform HART Features : RV64ACDFIMSU > > > Current Hart : 0 > > > Firmware Base : 0x80000000 > > > Firmware Size : 132 KB > > > Runtime SBI Version : 0.2 > > > > > > MIDELEG : 0x0000000000000222 > > > MEDELEG : 0x000000000000b109 > > > PMP0 : 0x0000000080000000-0x000000008003ffff (A) > > > PMP1 : 0x0000000000000000-0xffffffffffffffff (A,R,W,X) > > > [ 0.000000] Linux version 5.10.0-01370-g71c5f03154ac > > > (dvyukov@dvyukov-desk.muc.corp.google.com) (riscv64-linux-gnu-gcc > > > (Debian 10.2.0-9) 10.2.0, GNU ld (GNU Binutils for Debian) 2.35.1) #17 > > > SMP Fri Dec 25 18:10:12 CET 2020 > > > [ 0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000 > > > [ 0.000000] earlycon: sbi0 at I/O port 0x0 (options '') > > > [ 0.000000] printk: bootconsole [sbi0] enabled > > > [ 0.000000] efi: UEFI not found. 
> > > [ 0.000000] Zone ranges: > > > [ 0.000000] DMA32 [mem 0x0000000080200000-0x00000000ffffffff] > > > [ 0.000000] Normal empty > > > [ 0.000000] Movable zone start for each node > > > [ 0.000000] Early memory node ranges > > > [ 0.000000] node 0: [mem 0x0000000080200000-0x00000000ffffffff] > > > [ 0.000000] Initmem setup node 0 [mem 0x0000000080200000-0x00000000ffffffff] > > > [ 0.000000] SBI specification v0.2 detected > > > [ 0.000000] SBI implementation ID=0x1 Version=0x7 > > > [ 0.000000] SBI v0.2 TIME extension detected > > > [ 0.000000] SBI v0.2 IPI extension detected > > > [ 0.000000] SBI v0.2 RFENCE extension detected > > > [ 0.000000] software IO TLB: mapped [mem > > > 0x00000000fa3f9000-0x00000000fe3f9000] (64MB) > > > [ 0.000000] Unable to handle kernel paging request at virtual > > > address dfffffc810040000 > > > [ 0.000000] Oops [#1] > > > [ 0.000000] Modules linked in: > > > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted > > > 5.10.0-01370-g71c5f03154ac #17 > > > [ 0.000000] epc: ffffffe00042e3e4 ra : ffffffe000c0462c sp : ffffffe001603ea0 > > > [ 0.000000] gp : ffffffe0016e3c60 tp : ffffffe00160cd40 t0 : > > > dfffffc810040000 > > > [ 0.000000] t1 : ffffffe000e0a838 t2 : 0000000000000000 s0 : > > > ffffffe001603f50 > > > [ 0.000000] s1 : ffffffe0016e50a8 a0 : dfffffc810040000 a1 : > > > 0000000000000000 > > > [ 0.000000] a2 : 000000000ffc0000 a3 : dfffffc820000000 a4 : > > > 0000000000000000 > > > [ 0.000000] a5 : 000000003e8c6001 a6 : ffffffe000e0a820 a7 : > > > 0000000000000900 > > > [ 0.000000] s2 : dfffffc820000000 s3 : dfffffc800000000 s4 : > > > 0000000000000001 > > > [ 0.000000] s5 : ffffffe0016e5108 s6 : fffffffffffff000 s7 : > > > dfffffc810040000 > > > [ 0.000000] s8 : 0000000000000080 s9 : ffffffffffffffff s10: > > > ffffffe07a119000 > > > [ 0.000000] s11: 000000000000ffc0 t3 : ffffffe0016eb908 t4 : > > > 0000000000000001 > > > [ 0.000000] t5 : ffffffc4001c150a t6 : ffffffe001603be8 > > > [ 0.000000] status: 0000000000000100 
badaddr: dfffffc810040000 > > > cause: 000000000000000f > > > [ 0.000000] random: get_random_bytes called from > > > oops_exit+0x30/0x58 with crng_init=0 > > > [ 0.000000] ---[ end trace 0000000000000000 ]--- > > > [ 0.000000] Kernel panic - not syncing: Fatal exception > > > [ 0.000000] ---[ end Kernel panic - not syncing: Fatal exception ]--- > > > > > > > > > But I first tried with a the kernel image I had in the dir, I think it > > > was this config (no KASAN): > > > https://gist.githubusercontent.com/dvyukov/b2b62beccf80493781ab03b41430e616/raw/62e673cff08a8a41656d2871b8a37f74b00f509f/gistfile1.txt > > > > > > and earlycon=sbi did not change anything (no output after OpenSBI). > > > So potentially there are 2 different problems. > > > > Thanks for reporting this. Looks like I'd forgotten to add a kasan config to > > my tests. There's one in there now, and it's passing as of the fix that Nylon > > posted. > > I can boot the KASAN kernel now on riscv/fixes. > > Next problem: I've got only to: > > [ 90.498967][ T1] Run /sbin/init as init process > [ 91.164353][ T4022] init[4022]: unhandled signal 11 code 0x1 at > 0x0000000000000bb0 in busybox[10000+d7000] > [ 91.179640][ T4022] CPU: 1 PID: 4022 Comm: init Not tainted > 5.11.0-rc2-00012-g0983834a8393 #19 > [ 91.180853][ T4022] epc: 0000000000000bb0 ra : 0000003fccab09d0 sp > : 0000003fffa8c7b0 > [ 91.181861][ T4022] gp : 00000000000e8d70 tp : 0000003fccaaf820 t0 > : 000000000000001e > [ 91.182810][ T4022] t1 : 0000003fccab0bfc t2 : 000000000000000a s0 > : 0000003fffa8c850 > [ 91.183749][ T4022] s1 : 0000003fccab1070 a0 : 0000003fccab1070 a1 > : 0000003fffa8c8c8 > [ 91.184689][ T4022] a2 : 0000000000000001 a3 : 0000000000000020 a4 > : 0000000000000000 > [ 91.185620][ T4022] a5 : 0000000000000000 a6 : 0000003fcc9c4260 a7 > : fffffffffffffffe > [ 91.186566][ T4022] s2 : 0000000000000000 s3 : 0000003fffa8c8c8 s4 > : 0000003fccab1000 > [ 91.187500][ T4022] s5 : 0000003fccab1078 s6 : 0000003fffa8c8d0 s7 > : 
0000000000000010 > [ 91.189672][ T4022] s8 : 0000000000000016 s9 : 0000000000000000 > s10: 0000003fffa8c8c8 > [ 91.190637][ T4022] s11: 0000000000000000 t3 : 0000000000000bb0 t4 > : 0000000000000000 > [ 91.191568][ T4022] t5 : 0000003fffa8c360 t6 : 0000000000000000 > [ 91.192389][ T4022] status: 8000000000004020 badaddr: > 0000000000000bb0 cause: 000000000000000c > [ 91.201573][ T1] Kernel panic - not syncing: Attempted to kill > init! exitcode=0x0000000b > [ 91.202906][ T1] CPU: 0 PID: 1 Comm: init Not tainted > 5.11.0-rc2-00012-g0983834a8393 #19 > [ 91.204139][ T1] Call Trace: > [ 91.204849][ T1] [<ffffffe0000095c0>] walk_stackframe+0x0/0x1d0 > [ 91.206124][ T1] [<ffffffe00458b2d8>] show_stack+0x3a/0x46 > [ 91.207240][ T1] [<ffffffe0045a5b72>] dump_stack+0x11c/0x180 > [ 91.208732][ T1] [<ffffffe00458b6a0>] panic+0x20a/0x5cc > [ 91.209890][ T1] [<ffffffe00002eea4>] do_exit+0x1846/0x1874 > [ 91.211052][ T1] [<ffffffe00002efdc>] do_group_exit+0xa0/0x192 > [ 91.212224][ T1] [<ffffffe000047d30>] get_signal+0x2d6/0x13dc > [ 91.213390][ T1] [<ffffffe000007eb0>] do_notify_resume+0xa8/0x912 > [ 91.214567][ T1] [<ffffffe00000559c>] ret_from_exception+0x0/0x14 > > The image is buildroot on 2020.11.x built with this script: > https://gist.githubusercontent.com/dvyukov/1a9a01ca2189e35175a021820c95b04d/raw/5c01d755e83f4eab0d56aa7dc84af3b2d5e80423/gistfile1.txt > > Readelf for init shows the following (is it that [10000+d7000] address > is not .text at all?): > > $ riscv64-linux-gnu-readelf --sections image/bin/busybox > There are 27 section headers, starting at offset 0xd7f20: > > Section Headers: > [Nr] Name Type Address Offset > Size EntSize Flags Link Info Align > [ 0] NULL 0000000000000000 00000000 > 0000000000000000 0000000000000000 0 0 0 > [ 1] .interp PROGBITS 0000000000010238 00000238 > 0000000000000021 0000000000000000 A 0 0 1 > [ 2] .note.ABI-tag NOTE 000000000001025c 0000025c > 0000000000000020 0000000000000000 A 0 0 4 > [ 3] .hash HASH 0000000000010280 00000280 > 
00000000000009cc 0000000000000004 A 5 0 8 > [ 4] .gnu.hash GNU_HASH 0000000000010c50 00000c50 > 0000000000000ac8 0000000000000000 A 5 0 8 > [ 5] .dynsym DYNSYM 0000000000011718 00001718 > 00000000000021f0 0000000000000018 A 6 1 8 > [ 6] .dynstr STRTAB 0000000000013908 00003908 > 0000000000000c66 0000000000000000 A 0 0 1 > [ 7] .gnu.version VERSYM 000000000001456e 0000456e > 00000000000002d4 0000000000000002 A 5 0 2 > [ 8] .gnu.version_r VERNEED 0000000000014848 00004848 > 0000000000000050 0000000000000000 A 6 2 8 > [ 9] .rela.dyn RELA 0000000000014898 00004898 > 00000000000000c0 0000000000000018 A 5 0 8 > [10] .rela.plt RELA 0000000000014958 00004958 > 00000000000020a0 0000000000000018 AI 5 22 8 > [11] .plt PROGBITS 0000000000016a00 00006a00 > 00000000000015e0 0000000000000010 AX 0 0 16 > [12] .text PROGBITS 0000000000017fe0 00007fe0 > 00000000000a3668 0000000000000000 AX 0 0 4 > [13] .rodata PROGBITS 00000000000bb648 000ab648 > 000000000002b076 0000000000000000 A 0 0 8 > [14] .sdata2 PROGBITS 00000000000e66c0 000d66c0 > 0000000000000163 0000000000000000 A 0 0 8 > [15] .eh_frame_hdr PROGBITS 00000000000e6824 000d6824 > 0000000000000014 0000000000000000 A 0 0 4 > [16] .eh_frame PROGBITS 00000000000e6838 000d6838 > 000000000000002c 0000000000000000 A 0 0 8 > [17] .preinit_array PREINIT_ARRAY 00000000000e7df8 000d6df8 > 0000000000000008 0000000000000008 WA 0 0 1 > [18] .init_array INIT_ARRAY 00000000000e7e00 000d6e00 > 0000000000000008 0000000000000008 WA 0 0 8 > [19] .fini_array FINI_ARRAY 00000000000e7e08 000d6e08 > 0000000000000008 0000000000000008 WA 0 0 8 > [20] .dynamic DYNAMIC 00000000000e7e10 000d6e10 > 00000000000001f0 0000000000000010 WA 6 0 8 > [21] .data PROGBITS 00000000000e8000 000d7000 > 0000000000000240 0000000000000000 WA 0 0 8 > [22] .got PROGBITS 00000000000e8240 000d7240 > 0000000000000af8 0000000000000008 WA 0 0 8 > [23] .sdata PROGBITS 00000000000e8d38 000d7d38 > 0000000000000101 0000000000000000 WA 0 0 8 > [24] .sbss NOBITS 00000000000e8e40 
000d7e39 > 000000000000017f 0000000000000000 WA 0 0 8 > [25] .bss NOBITS 00000000000e8fc0 000d7e39 > 00000000000005b0 0000000000000000 WA 0 0 8 > [26] .shstrtab STRTAB 0000000000000000 000d7e39 > 00000000000000e6 0000000000000000 0 0 1 > > > Before I spent more time on this, am I doing anything obviously wrong? > Is it a known issue? Are there any fresh working recipes?

Hmm. I tried to use buildroot 2020.05, which Tobias used here: https://github.com/google/syzkaller/blob/master/docs/linux/setup_linux-host_qemu-vm_riscv64-kernel.md#image

But there is no make qemu_riscv64_virt_defconfig target, though I remember testing these instructions at the time.

To be precise, I used 2020.11. I see there is now 2020.11.1, but I don't see any mention of riscv in the log.
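As a side note on the earlier KASAN oops quoted in this message, the faulting address dfffffc810040000 can be mapped back to the memory it would shadow. The sketch below is only an illustration under two assumptions: the generic KASAN mapping shadow = (addr >> 3) + offset applies, and the offset equals the value seen in register s3 (dfffffc800000000). The helper names are made up for this example.

```python
KASAN_SHADOW_SCALE_SHIFT = 3        # generic KASAN: one shadow byte covers 8 bytes
SHADOW_OFFSET = 0xDFFFFFC800000000  # assumption: value of register s3 in the dump
MASK64 = (1 << 64) - 1              # keep results within 64 bits


def addr_to_shadow(addr):
    """Shadow address that tracks addr under the generic KASAN mapping."""
    return ((addr >> KASAN_SHADOW_SCALE_SHIFT) + SHADOW_OFFSET) & MASK64


def shadow_to_addr(shadow):
    """First address covered by the given shadow byte."""
    return ((shadow - SHADOW_OFFSET) << KASAN_SHADOW_SCALE_SHIFT) & MASK64


BADADDR = 0xDFFFFFC810040000  # badaddr, also in a0, t0 and s7
END = 0xDFFFFFC820000000      # a3 and s2
LENGTH = 0x0FFC0000           # a2

print(hex(shadow_to_addr(BADADDR)))  # 0x80200000
print(hex(shadow_to_addr(END)))      # 0x100000000
print(END - BADADDR == LENGTH)       # True
```

With these assumptions the shadow range in the registers corresponds to 0x80200000..0x100000000, which is the physical memory range printed in the "Early memory node ranges" lines rather than a kernel virtual range. That is a reading of the numbers only. The thread itself just states that the fix Nylon posted makes the KASAN kernel boot.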
* Re: riscv+KASAN does not boot @ 2021-01-14 10:24 ` Dmitry Vyukov 0 siblings, 0 replies; 55+ messages in thread From: Dmitry Vyukov @ 2021-01-14 10:24 UTC (permalink / raw) To: Palmer Dabbelt Cc: Albert Ou, Bjorn Topel, LKML, nylon7, syzkaller, Andreas Schwab, Paul Walmsley, linux-riscv, Tobias Klauser On Thu, Jan 14, 2021 at 10:23 AM Dmitry Vyukov <dvyukov@google.com> wrote: > > On Thu, Jan 14, 2021 at 5:57 AM Palmer Dabbelt <palmerdabbelt@google.com> wrote: > > > > On Fri, 25 Dec 2020 09:13:23 PST (-0800), dvyukov@google.com wrote: > > > On Fri, Dec 25, 2020 at 5:58 PM Andreas Schwab <schwab@linux-m68k.org> wrote: > > >> > > >> On Dez 25 2020, Dmitry Vyukov wrote: > > >> > > >> > qemu-system-riscv64 \ > > >> > -machine virt -bios default -smp 1 -m 2G \ > > >> > -device virtio-blk-device,drive=hd0 \ > > >> > -drive file=buildroot-riscv64.ext4,if=none,format=raw,id=hd0 \ > > >> > -kernel arch/riscv/boot/Image \ > > >> > -nographic \ > > >> > -device virtio-rng-device,rng=rng0 -object > > >> > rng-random,filename=/dev/urandom,id=rng0 \ > > >> > -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device > > >> > virtio-net-device,netdev=net0 \ > > >> > -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic > > >> > panic_on_warn=1 panic=86400" > > >> > > >> Do you get more output with earlycon=sbi? 
> > > > > > Hi Andreas, > > > > > > For defconfig+kvm_guest.config+ scripts/config -e KASAN -e > > > KASAN_INLINE it actually gave me more output: > > > > > > > > > OpenSBI v0.7 > > > ____ _____ ____ _____ > > > / __ \ / ____| _ \_ _| > > > | | | |_ __ ___ _ __ | (___ | |_) || | > > > | | | | '_ \ / _ \ '_ \ \___ \| _ < | | > > > | |__| | |_) | __/ | | |____) | |_) || |_ > > > \____/| .__/ \___|_| |_|_____/|____/_____| > > > | | > > > |_| > > > > > > Platform Name : QEMU Virt Machine > > > Platform HART Features : RV64ACDFIMSU > > > Current Hart : 0 > > > Firmware Base : 0x80000000 > > > Firmware Size : 132 KB > > > Runtime SBI Version : 0.2 > > > > > > MIDELEG : 0x0000000000000222 > > > MEDELEG : 0x000000000000b109 > > > PMP0 : 0x0000000080000000-0x000000008003ffff (A) > > > PMP1 : 0x0000000000000000-0xffffffffffffffff (A,R,W,X) > > > [ 0.000000] Linux version 5.10.0-01370-g71c5f03154ac > > > (dvyukov@dvyukov-desk.muc.corp.google.com) (riscv64-linux-gnu-gcc > > > (Debian 10.2.0-9) 10.2.0, GNU ld (GNU Binutils for Debian) 2.35.1) #17 > > > SMP Fri Dec 25 18:10:12 CET 2020 > > > [ 0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000 > > > [ 0.000000] earlycon: sbi0 at I/O port 0x0 (options '') > > > [ 0.000000] printk: bootconsole [sbi0] enabled > > > [ 0.000000] efi: UEFI not found. 
> > > [ 0.000000] Zone ranges: > > > [ 0.000000] DMA32 [mem 0x0000000080200000-0x00000000ffffffff] > > > [ 0.000000] Normal empty > > > [ 0.000000] Movable zone start for each node > > > [ 0.000000] Early memory node ranges > > > [ 0.000000] node 0: [mem 0x0000000080200000-0x00000000ffffffff] > > > [ 0.000000] Initmem setup node 0 [mem 0x0000000080200000-0x00000000ffffffff] > > > [ 0.000000] SBI specification v0.2 detected > > > [ 0.000000] SBI implementation ID=0x1 Version=0x7 > > > [ 0.000000] SBI v0.2 TIME extension detected > > > [ 0.000000] SBI v0.2 IPI extension detected > > > [ 0.000000] SBI v0.2 RFENCE extension detected > > > [ 0.000000] software IO TLB: mapped [mem > > > 0x00000000fa3f9000-0x00000000fe3f9000] (64MB) > > > [ 0.000000] Unable to handle kernel paging request at virtual > > > address dfffffc810040000 > > > [ 0.000000] Oops [#1] > > > [ 0.000000] Modules linked in: > > > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted > > > 5.10.0-01370-g71c5f03154ac #17 > > > [ 0.000000] epc: ffffffe00042e3e4 ra : ffffffe000c0462c sp : ffffffe001603ea0 > > > [ 0.000000] gp : ffffffe0016e3c60 tp : ffffffe00160cd40 t0 : > > > dfffffc810040000 > > > [ 0.000000] t1 : ffffffe000e0a838 t2 : 0000000000000000 s0 : > > > ffffffe001603f50 > > > [ 0.000000] s1 : ffffffe0016e50a8 a0 : dfffffc810040000 a1 : > > > 0000000000000000 > > > [ 0.000000] a2 : 000000000ffc0000 a3 : dfffffc820000000 a4 : > > > 0000000000000000 > > > [ 0.000000] a5 : 000000003e8c6001 a6 : ffffffe000e0a820 a7 : > > > 0000000000000900 > > > [ 0.000000] s2 : dfffffc820000000 s3 : dfffffc800000000 s4 : > > > 0000000000000001 > > > [ 0.000000] s5 : ffffffe0016e5108 s6 : fffffffffffff000 s7 : > > > dfffffc810040000 > > > [ 0.000000] s8 : 0000000000000080 s9 : ffffffffffffffff s10: > > > ffffffe07a119000 > > > [ 0.000000] s11: 000000000000ffc0 t3 : ffffffe0016eb908 t4 : > > > 0000000000000001 > > > [ 0.000000] t5 : ffffffc4001c150a t6 : ffffffe001603be8 > > > [ 0.000000] status: 0000000000000100 
badaddr: dfffffc810040000 > > > cause: 000000000000000f > > > [ 0.000000] random: get_random_bytes called from > > > oops_exit+0x30/0x58 with crng_init=0 > > > [ 0.000000] ---[ end trace 0000000000000000 ]--- > > > [ 0.000000] Kernel panic - not syncing: Fatal exception > > > [ 0.000000] ---[ end Kernel panic - not syncing: Fatal exception ]--- > > > > > > > > > But I first tried with a the kernel image I had in the dir, I think it > > > was this config (no KASAN): > > > https://gist.githubusercontent.com/dvyukov/b2b62beccf80493781ab03b41430e616/raw/62e673cff08a8a41656d2871b8a37f74b00f509f/gistfile1.txt > > > > > > and earlycon=sbi did not change anything (no output after OpenSBI). > > > So potentially there are 2 different problems. > > > > Thanks for reporting this. Looks like I'd forgotten to add a kasan config to > > my tests. There's one in there now, and it's passing as of the fix that Nylon > > posted. > > I can boot the KASAN kernel now on riscv/fixes. > > Next problem: I've got only to: > > [ 90.498967][ T1] Run /sbin/init as init process > [ 91.164353][ T4022] init[4022]: unhandled signal 11 code 0x1 at > 0x0000000000000bb0 in busybox[10000+d7000] > [ 91.179640][ T4022] CPU: 1 PID: 4022 Comm: init Not tainted > 5.11.0-rc2-00012-g0983834a8393 #19 > [ 91.180853][ T4022] epc: 0000000000000bb0 ra : 0000003fccab09d0 sp > : 0000003fffa8c7b0 > [ 91.181861][ T4022] gp : 00000000000e8d70 tp : 0000003fccaaf820 t0 > : 000000000000001e > [ 91.182810][ T4022] t1 : 0000003fccab0bfc t2 : 000000000000000a s0 > : 0000003fffa8c850 > [ 91.183749][ T4022] s1 : 0000003fccab1070 a0 : 0000003fccab1070 a1 > : 0000003fffa8c8c8 > [ 91.184689][ T4022] a2 : 0000000000000001 a3 : 0000000000000020 a4 > : 0000000000000000 > [ 91.185620][ T4022] a5 : 0000000000000000 a6 : 0000003fcc9c4260 a7 > : fffffffffffffffe > [ 91.186566][ T4022] s2 : 0000000000000000 s3 : 0000003fffa8c8c8 s4 > : 0000003fccab1000 > [ 91.187500][ T4022] s5 : 0000003fccab1078 s6 : 0000003fffa8c8d0 s7 > : 
0000000000000010 > [ 91.189672][ T4022] s8 : 0000000000000016 s9 : 0000000000000000 > s10: 0000003fffa8c8c8 > [ 91.190637][ T4022] s11: 0000000000000000 t3 : 0000000000000bb0 t4 > : 0000000000000000 > [ 91.191568][ T4022] t5 : 0000003fffa8c360 t6 : 0000000000000000 > [ 91.192389][ T4022] status: 8000000000004020 badaddr: > 0000000000000bb0 cause: 000000000000000c > [ 91.201573][ T1] Kernel panic - not syncing: Attempted to kill > init! exitcode=0x0000000b > [ 91.202906][ T1] CPU: 0 PID: 1 Comm: init Not tainted > 5.11.0-rc2-00012-g0983834a8393 #19 > [ 91.204139][ T1] Call Trace: > [ 91.204849][ T1] [<ffffffe0000095c0>] walk_stackframe+0x0/0x1d0 > [ 91.206124][ T1] [<ffffffe00458b2d8>] show_stack+0x3a/0x46 > [ 91.207240][ T1] [<ffffffe0045a5b72>] dump_stack+0x11c/0x180 > [ 91.208732][ T1] [<ffffffe00458b6a0>] panic+0x20a/0x5cc > [ 91.209890][ T1] [<ffffffe00002eea4>] do_exit+0x1846/0x1874 > [ 91.211052][ T1] [<ffffffe00002efdc>] do_group_exit+0xa0/0x192 > [ 91.212224][ T1] [<ffffffe000047d30>] get_signal+0x2d6/0x13dc > [ 91.213390][ T1] [<ffffffe000007eb0>] do_notify_resume+0xa8/0x912 > [ 91.214567][ T1] [<ffffffe00000559c>] ret_from_exception+0x0/0x14 > > The image is buildroot on 2020.11.x built with this script: > https://gist.githubusercontent.com/dvyukov/1a9a01ca2189e35175a021820c95b04d/raw/5c01d755e83f4eab0d56aa7dc84af3b2d5e80423/gistfile1.txt > > Readelf for init shows the following (is it that [10000+d7000] address > is not .text at all?): > > $ riscv64-linux-gnu-readelf --sections image/bin/busybox > There are 27 section headers, starting at offset 0xd7f20: > > Section Headers: > [Nr] Name Type Address Offset > Size EntSize Flags Link Info Align > [ 0] NULL 0000000000000000 00000000 > 0000000000000000 0000000000000000 0 0 0 > [ 1] .interp PROGBITS 0000000000010238 00000238 > 0000000000000021 0000000000000000 A 0 0 1 > [ 2] .note.ABI-tag NOTE 000000000001025c 0000025c > 0000000000000020 0000000000000000 A 0 0 4 > [ 3] .hash HASH 0000000000010280 00000280 > 
00000000000009cc 0000000000000004 A 5 0 8 > [ 4] .gnu.hash GNU_HASH 0000000000010c50 00000c50 > 0000000000000ac8 0000000000000000 A 5 0 8 > [ 5] .dynsym DYNSYM 0000000000011718 00001718 > 00000000000021f0 0000000000000018 A 6 1 8 > [ 6] .dynstr STRTAB 0000000000013908 00003908 > 0000000000000c66 0000000000000000 A 0 0 1 > [ 7] .gnu.version VERSYM 000000000001456e 0000456e > 00000000000002d4 0000000000000002 A 5 0 2 > [ 8] .gnu.version_r VERNEED 0000000000014848 00004848 > 0000000000000050 0000000000000000 A 6 2 8 > [ 9] .rela.dyn RELA 0000000000014898 00004898 > 00000000000000c0 0000000000000018 A 5 0 8 > [10] .rela.plt RELA 0000000000014958 00004958 > 00000000000020a0 0000000000000018 AI 5 22 8 > [11] .plt PROGBITS 0000000000016a00 00006a00 > 00000000000015e0 0000000000000010 AX 0 0 16 > [12] .text PROGBITS 0000000000017fe0 00007fe0 > 00000000000a3668 0000000000000000 AX 0 0 4 > [13] .rodata PROGBITS 00000000000bb648 000ab648 > 000000000002b076 0000000000000000 A 0 0 8 > [14] .sdata2 PROGBITS 00000000000e66c0 000d66c0 > 0000000000000163 0000000000000000 A 0 0 8 > [15] .eh_frame_hdr PROGBITS 00000000000e6824 000d6824 > 0000000000000014 0000000000000000 A 0 0 4 > [16] .eh_frame PROGBITS 00000000000e6838 000d6838 > 000000000000002c 0000000000000000 A 0 0 8 > [17] .preinit_array PREINIT_ARRAY 00000000000e7df8 000d6df8 > 0000000000000008 0000000000000008 WA 0 0 1 > [18] .init_array INIT_ARRAY 00000000000e7e00 000d6e00 > 0000000000000008 0000000000000008 WA 0 0 8 > [19] .fini_array FINI_ARRAY 00000000000e7e08 000d6e08 > 0000000000000008 0000000000000008 WA 0 0 8 > [20] .dynamic DYNAMIC 00000000000e7e10 000d6e10 > 00000000000001f0 0000000000000010 WA 6 0 8 > [21] .data PROGBITS 00000000000e8000 000d7000 > 0000000000000240 0000000000000000 WA 0 0 8 > [22] .got PROGBITS 00000000000e8240 000d7240 > 0000000000000af8 0000000000000008 WA 0 0 8 > [23] .sdata PROGBITS 00000000000e8d38 000d7d38 > 0000000000000101 0000000000000000 WA 0 0 8 > [24] .sbss NOBITS 00000000000e8e40 
000d7e39 > 000000000000017f 0000000000000000 WA 0 0 8 > [25] .bss NOBITS 00000000000e8fc0 000d7e39 > 00000000000005b0 0000000000000000 WA 0 0 8 > [26] .shstrtab STRTAB 0000000000000000 000d7e39 > 00000000000000e6 0000000000000000 0 0 1 > > > Before I spent more time on this, am I doing anything obviously wrong? > Is it a known issue? Are there any fresh working recipes? Hmm, I tried to use 2020.05, which Tobias used here: https://github.com/google/syzkaller/blob/master/docs/linux/setup_linux-host_qemu-vm_riscv64-kernel.md#image But there is no make qemu_riscv64_virt_defconfig target... though I remember testing these instructions at the time. To be precise, I used 2020.11. I see there is now 2020.11.1, but I don't see any mention of riscv in the log. _______________________________________________ linux-riscv mailing list linux-riscv@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-riscv ^ permalink raw reply [flat|nested] 55+ messages in thread
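The badaddr in the early oops quoted above (dfffffc810040000) looks like a KASAN shadow address. As a rough sketch, assuming generic KASAN's usual mapping shadow = (addr >> 3) + offset, and assuming the riscv64 offset is 0xdfffffc800000000 (this is the value held in register s3 of the dump; treating it as the shadow offset is an assumption, not something stated in the thread), the shadow address can be mapped back to the address it would describe. The function names are made up for illustration.

```python
# Sketch only: the offset and the shift below are assumptions, not taken from the thread.
KASAN_SHADOW_OFFSET = 0xDFFFFFC800000000  # assumed riscv64 value; equals s3 in the register dump
KASAN_SHADOW_SCALE_SHIFT = 3              # generic KASAN: one shadow byte covers 8 bytes
MASK64 = (1 << 64) - 1                    # keep arithmetic within 64 bits


def addr_to_shadow(addr):
    """Shadow byte address for addr under the assumed mapping."""
    return ((addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET) & MASK64


def shadow_to_addr(shadow):
    """Start of the 8-byte granule that a shadow address would describe."""
    delta = (shadow - KASAN_SHADOW_OFFSET) & MASK64
    return (delta << KASAN_SHADOW_SCALE_SHIFT) & MASK64


if __name__ == "__main__":
    badaddr = 0xDFFFFFC810040000  # badaddr from the oops
    print(hex(shadow_to_addr(badaddr)))
```

Under these assumptions the result is 0x80200000, which is numerically the start of the memory range printed earlier in the same boot log. That is only an observation about the numbers, not a diagnosis.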
* Re: riscv+KASAN does not boot 2021-01-14 10:24 ` Dmitry Vyukov @ 2021-01-14 11:24 ` Dmitry Vyukov -1 siblings, 0 replies; 55+ messages in thread From: Dmitry Vyukov @ 2021-01-14 11:24 UTC (permalink / raw) To: Palmer Dabbelt Cc: Andreas Schwab, Paul Walmsley, Albert Ou, linux-riscv, LKML, nylon7, Bjorn Topel, Tobias Klauser, syzkaller On Thu, Jan 14, 2021 at 11:24 AM Dmitry Vyukov <dvyukov@google.com> wrote: > > On Thu, Jan 14, 2021 at 10:23 AM Dmitry Vyukov <dvyukov@google.com> wrote: > > > > On Thu, Jan 14, 2021 at 5:57 AM Palmer Dabbelt <palmerdabbelt@google.com> wrote: > > > > > > On Fri, 25 Dec 2020 09:13:23 PST (-0800), dvyukov@google.com wrote: > > > > On Fri, Dec 25, 2020 at 5:58 PM Andreas Schwab <schwab@linux-m68k.org> wrote: > > > >> > > > >> On Dez 25 2020, Dmitry Vyukov wrote: > > > >> > > > >> > qemu-system-riscv64 \ > > > >> > -machine virt -bios default -smp 1 -m 2G \ > > > >> > -device virtio-blk-device,drive=hd0 \ > > > >> > -drive file=buildroot-riscv64.ext4,if=none,format=raw,id=hd0 \ > > > >> > -kernel arch/riscv/boot/Image \ > > > >> > -nographic \ > > > >> > -device virtio-rng-device,rng=rng0 -object > > > >> > rng-random,filename=/dev/urandom,id=rng0 \ > > > >> > -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device > > > >> > virtio-net-device,netdev=net0 \ > > > >> > -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic > > > >> > panic_on_warn=1 panic=86400" > > > >> > > > >> Do you get more output with earlycon=sbi? 
> > > > > > > > Hi Andreas, > > > > > > > > For defconfig+kvm_guest.config+ scripts/config -e KASAN -e > > > > KASAN_INLINE it actually gave me more output: > > > > > > > > > > > > OpenSBI v0.7 > > > > ____ _____ ____ _____ > > > > / __ \ / ____| _ \_ _| > > > > | | | |_ __ ___ _ __ | (___ | |_) || | > > > > | | | | '_ \ / _ \ '_ \ \___ \| _ < | | > > > > | |__| | |_) | __/ | | |____) | |_) || |_ > > > > \____/| .__/ \___|_| |_|_____/|____/_____| > > > > | | > > > > |_| > > > > > > > > Platform Name : QEMU Virt Machine > > > > Platform HART Features : RV64ACDFIMSU > > > > Current Hart : 0 > > > > Firmware Base : 0x80000000 > > > > Firmware Size : 132 KB > > > > Runtime SBI Version : 0.2 > > > > > > > > MIDELEG : 0x0000000000000222 > > > > MEDELEG : 0x000000000000b109 > > > > PMP0 : 0x0000000080000000-0x000000008003ffff (A) > > > > PMP1 : 0x0000000000000000-0xffffffffffffffff (A,R,W,X) > > > > [ 0.000000] Linux version 5.10.0-01370-g71c5f03154ac > > > > (dvyukov@dvyukov-desk.muc.corp.google.com) (riscv64-linux-gnu-gcc > > > > (Debian 10.2.0-9) 10.2.0, GNU ld (GNU Binutils for Debian) 2.35.1) #17 > > > > SMP Fri Dec 25 18:10:12 CET 2020 > > > > [ 0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000 > > > > [ 0.000000] earlycon: sbi0 at I/O port 0x0 (options '') > > > > [ 0.000000] printk: bootconsole [sbi0] enabled > > > > [ 0.000000] efi: UEFI not found. 
> > > > [ 0.000000] Zone ranges: > > > > [ 0.000000] DMA32 [mem 0x0000000080200000-0x00000000ffffffff] > > > > [ 0.000000] Normal empty > > > > [ 0.000000] Movable zone start for each node > > > > [ 0.000000] Early memory node ranges > > > > [ 0.000000] node 0: [mem 0x0000000080200000-0x00000000ffffffff] > > > > [ 0.000000] Initmem setup node 0 [mem 0x0000000080200000-0x00000000ffffffff] > > > > [ 0.000000] SBI specification v0.2 detected > > > > [ 0.000000] SBI implementation ID=0x1 Version=0x7 > > > > [ 0.000000] SBI v0.2 TIME extension detected > > > > [ 0.000000] SBI v0.2 IPI extension detected > > > > [ 0.000000] SBI v0.2 RFENCE extension detected > > > > [ 0.000000] software IO TLB: mapped [mem > > > > 0x00000000fa3f9000-0x00000000fe3f9000] (64MB) > > > > [ 0.000000] Unable to handle kernel paging request at virtual > > > > address dfffffc810040000 > > > > [ 0.000000] Oops [#1] > > > > [ 0.000000] Modules linked in: > > > > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted > > > > 5.10.0-01370-g71c5f03154ac #17 > > > > [ 0.000000] epc: ffffffe00042e3e4 ra : ffffffe000c0462c sp : ffffffe001603ea0 > > > > [ 0.000000] gp : ffffffe0016e3c60 tp : ffffffe00160cd40 t0 : > > > > dfffffc810040000 > > > > [ 0.000000] t1 : ffffffe000e0a838 t2 : 0000000000000000 s0 : > > > > ffffffe001603f50 > > > > [ 0.000000] s1 : ffffffe0016e50a8 a0 : dfffffc810040000 a1 : > > > > 0000000000000000 > > > > [ 0.000000] a2 : 000000000ffc0000 a3 : dfffffc820000000 a4 : > > > > 0000000000000000 > > > > [ 0.000000] a5 : 000000003e8c6001 a6 : ffffffe000e0a820 a7 : > > > > 0000000000000900 > > > > [ 0.000000] s2 : dfffffc820000000 s3 : dfffffc800000000 s4 : > > > > 0000000000000001 > > > > [ 0.000000] s5 : ffffffe0016e5108 s6 : fffffffffffff000 s7 : > > > > dfffffc810040000 > > > > [ 0.000000] s8 : 0000000000000080 s9 : ffffffffffffffff s10: > > > > ffffffe07a119000 > > > > [ 0.000000] s11: 000000000000ffc0 t3 : ffffffe0016eb908 t4 : > > > > 0000000000000001 > > > > [ 0.000000] t5 : 
ffffffc4001c150a t6 : ffffffe001603be8 > > > > [ 0.000000] status: 0000000000000100 badaddr: dfffffc810040000 > > > > cause: 000000000000000f > > > > [ 0.000000] random: get_random_bytes called from > > > > oops_exit+0x30/0x58 with crng_init=0 > > > > [ 0.000000] ---[ end trace 0000000000000000 ]--- > > > > [ 0.000000] Kernel panic - not syncing: Fatal exception > > > > [ 0.000000] ---[ end Kernel panic - not syncing: Fatal exception ]--- > > > > > > > > > > > > But I first tried with a the kernel image I had in the dir, I think it > > > > was this config (no KASAN): > > > > https://gist.githubusercontent.com/dvyukov/b2b62beccf80493781ab03b41430e616/raw/62e673cff08a8a41656d2871b8a37f74b00f509f/gistfile1.txt > > > > > > > > and earlycon=sbi did not change anything (no output after OpenSBI). > > > > So potentially there are 2 different problems. > > > > > > Thanks for reporting this. Looks like I'd forgotten to add a kasan config to > > > my tests. There's one in there now, and it's passing as of the fix that Nylon > > > posted. > > > > I can boot the KASAN kernel now on riscv/fixes. 
> > > > Next problem: I've got only to: > > > > [ 90.498967][ T1] Run /sbin/init as init process > > [ 91.164353][ T4022] init[4022]: unhandled signal 11 code 0x1 at > > 0x0000000000000bb0 in busybox[10000+d7000] > > [ 91.179640][ T4022] CPU: 1 PID: 4022 Comm: init Not tainted > > 5.11.0-rc2-00012-g0983834a8393 #19 > > [ 91.180853][ T4022] epc: 0000000000000bb0 ra : 0000003fccab09d0 sp > > : 0000003fffa8c7b0 > > [ 91.181861][ T4022] gp : 00000000000e8d70 tp : 0000003fccaaf820 t0 > > : 000000000000001e > > [ 91.182810][ T4022] t1 : 0000003fccab0bfc t2 : 000000000000000a s0 > > : 0000003fffa8c850 > > [ 91.183749][ T4022] s1 : 0000003fccab1070 a0 : 0000003fccab1070 a1 > > : 0000003fffa8c8c8 > > [ 91.184689][ T4022] a2 : 0000000000000001 a3 : 0000000000000020 a4 > > : 0000000000000000 > > [ 91.185620][ T4022] a5 : 0000000000000000 a6 : 0000003fcc9c4260 a7 > > : fffffffffffffffe > > [ 91.186566][ T4022] s2 : 0000000000000000 s3 : 0000003fffa8c8c8 s4 > > : 0000003fccab1000 > > [ 91.187500][ T4022] s5 : 0000003fccab1078 s6 : 0000003fffa8c8d0 s7 > > : 0000000000000010 > > [ 91.189672][ T4022] s8 : 0000000000000016 s9 : 0000000000000000 > > s10: 0000003fffa8c8c8 > > [ 91.190637][ T4022] s11: 0000000000000000 t3 : 0000000000000bb0 t4 > > : 0000000000000000 > > [ 91.191568][ T4022] t5 : 0000003fffa8c360 t6 : 0000000000000000 > > [ 91.192389][ T4022] status: 8000000000004020 badaddr: > > 0000000000000bb0 cause: 000000000000000c > > [ 91.201573][ T1] Kernel panic - not syncing: Attempted to kill > > init! 
exitcode=0x0000000b > > [ 91.202906][ T1] CPU: 0 PID: 1 Comm: init Not tainted > > 5.11.0-rc2-00012-g0983834a8393 #19 > > [ 91.204139][ T1] Call Trace: > > [ 91.204849][ T1] [<ffffffe0000095c0>] walk_stackframe+0x0/0x1d0 > > [ 91.206124][ T1] [<ffffffe00458b2d8>] show_stack+0x3a/0x46 > > [ 91.207240][ T1] [<ffffffe0045a5b72>] dump_stack+0x11c/0x180 > > [ 91.208732][ T1] [<ffffffe00458b6a0>] panic+0x20a/0x5cc > > [ 91.209890][ T1] [<ffffffe00002eea4>] do_exit+0x1846/0x1874 > > [ 91.211052][ T1] [<ffffffe00002efdc>] do_group_exit+0xa0/0x192 > > [ 91.212224][ T1] [<ffffffe000047d30>] get_signal+0x2d6/0x13dc > > [ 91.213390][ T1] [<ffffffe000007eb0>] do_notify_resume+0xa8/0x912 > > [ 91.214567][ T1] [<ffffffe00000559c>] ret_from_exception+0x0/0x14 > > > > The image is buildroot on 2020.11.x built with this script: > > https://gist.githubusercontent.com/dvyukov/1a9a01ca2189e35175a021820c95b04d/raw/5c01d755e83f4eab0d56aa7dc84af3b2d5e80423/gistfile1.txt > > > > Readelf for init shows the following (is it that [10000+d7000] address > > is not .text at all?): > > > > $ riscv64-linux-gnu-readelf --sections image/bin/busybox > > There are 27 section headers, starting at offset 0xd7f20: > > > > Section Headers: > > [Nr] Name Type Address Offset > > Size EntSize Flags Link Info Align > > [ 0] NULL 0000000000000000 00000000 > > 0000000000000000 0000000000000000 0 0 0 > > [ 1] .interp PROGBITS 0000000000010238 00000238 > > 0000000000000021 0000000000000000 A 0 0 1 > > [ 2] .note.ABI-tag NOTE 000000000001025c 0000025c > > 0000000000000020 0000000000000000 A 0 0 4 > > [ 3] .hash HASH 0000000000010280 00000280 > > 00000000000009cc 0000000000000004 A 5 0 8 > > [ 4] .gnu.hash GNU_HASH 0000000000010c50 00000c50 > > 0000000000000ac8 0000000000000000 A 5 0 8 > > [ 5] .dynsym DYNSYM 0000000000011718 00001718 > > 00000000000021f0 0000000000000018 A 6 1 8 > > [ 6] .dynstr STRTAB 0000000000013908 00003908 > > 0000000000000c66 0000000000000000 A 0 0 1 > > [ 7] .gnu.version VERSYM 
000000000001456e 0000456e > > 00000000000002d4 0000000000000002 A 5 0 2 > > [ 8] .gnu.version_r VERNEED 0000000000014848 00004848 > > 0000000000000050 0000000000000000 A 6 2 8 > > [ 9] .rela.dyn RELA 0000000000014898 00004898 > > 00000000000000c0 0000000000000018 A 5 0 8 > > [10] .rela.plt RELA 0000000000014958 00004958 > > 00000000000020a0 0000000000000018 AI 5 22 8 > > [11] .plt PROGBITS 0000000000016a00 00006a00 > > 00000000000015e0 0000000000000010 AX 0 0 16 > > [12] .text PROGBITS 0000000000017fe0 00007fe0 > > 00000000000a3668 0000000000000000 AX 0 0 4 > > [13] .rodata PROGBITS 00000000000bb648 000ab648 > > 000000000002b076 0000000000000000 A 0 0 8 > > [14] .sdata2 PROGBITS 00000000000e66c0 000d66c0 > > 0000000000000163 0000000000000000 A 0 0 8 > > [15] .eh_frame_hdr PROGBITS 00000000000e6824 000d6824 > > 0000000000000014 0000000000000000 A 0 0 4 > > [16] .eh_frame PROGBITS 00000000000e6838 000d6838 > > 000000000000002c 0000000000000000 A 0 0 8 > > [17] .preinit_array PREINIT_ARRAY 00000000000e7df8 000d6df8 > > 0000000000000008 0000000000000008 WA 0 0 1 > > [18] .init_array INIT_ARRAY 00000000000e7e00 000d6e00 > > 0000000000000008 0000000000000008 WA 0 0 8 > > [19] .fini_array FINI_ARRAY 00000000000e7e08 000d6e08 > > 0000000000000008 0000000000000008 WA 0 0 8 > > [20] .dynamic DYNAMIC 00000000000e7e10 000d6e10 > > 00000000000001f0 0000000000000010 WA 6 0 8 > > [21] .data PROGBITS 00000000000e8000 000d7000 > > 0000000000000240 0000000000000000 WA 0 0 8 > > [22] .got PROGBITS 00000000000e8240 000d7240 > > 0000000000000af8 0000000000000008 WA 0 0 8 > > [23] .sdata PROGBITS 00000000000e8d38 000d7d38 > > 0000000000000101 0000000000000000 WA 0 0 8 > > [24] .sbss NOBITS 00000000000e8e40 000d7e39 > > 000000000000017f 0000000000000000 WA 0 0 8 > > [25] .bss NOBITS 00000000000e8fc0 000d7e39 > > 00000000000005b0 0000000000000000 WA 0 0 8 > > [26] .shstrtab STRTAB 0000000000000000 000d7e39 > > 00000000000000e6 0000000000000000 0 0 1 > > > > > > Before I spent more time on 
this, am I doing anything obviously wrong? > > Is it a known issue? Are there any fresh working recipes? > > Humm.. I tried to use 2020.05 which Tobias used here: > https://github.com/google/syzkaller/blob/master/docs/linux/setup_linux-host_qemu-vm_riscv64-kernel.md#image > But there is no make qemu_riscv64_virt_defconfig target... though I > remember I tested these instructions at the time... > > To be precise I used 2020.11, I see there is now 2020.11.1 but I don't > see any mentions of riscv in the log. For completeness, the kernel config I used is: https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config ^ permalink raw reply [flat|nested] 55+ messages in thread
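On the question of whether the [10000+d7000] range covers .text: reading the bracketed pair as base+size (an assumption about the log format) and taking the .text address and size from the readelf output, the numbers can be checked with a short sketch. The helper names are made up for illustration, and the cause names are a subset of the exception codes in the RISC-V privileged specification.

```python
# Values copied from the log and the readelf output quoted in the thread.
MAP_BASE, MAP_SIZE = 0x10000, 0xD7000    # busybox[10000+d7000], read as base+size (assumption)
TEXT_ADDR, TEXT_SIZE = 0x17FE0, 0xA3668  # .text row of the readelf output
EPC = 0xBB0                              # faulting pc, same value as badaddr
CAUSE = 0xC                              # "cause" field of the user-space fault

# Subset of scause exception codes (interrupt bit clear).
SCAUSE = {
    0x2: "illegal instruction",
    0xC: "instruction page fault",
    0xD: "load page fault",
    0xF: "store/AMO page fault",
}


def contains(base, size, addr):
    """True if addr lies in [base, base + size)."""
    return base <= addr < base + size


def range_within(base, size, inner_base, inner_size):
    """True if [inner_base, inner_base + inner_size) lies inside [base, base + size)."""
    return base <= inner_base and inner_base + inner_size <= base + size


if __name__ == "__main__":
    print("cause:", SCAUSE.get(CAUSE, "unknown"))
    print(".text inside range:", range_within(MAP_BASE, MAP_SIZE, TEXT_ADDR, TEXT_SIZE))
    print("epc inside range:", contains(MAP_BASE, MAP_SIZE, EPC))
```

With these numbers, .text does fall inside the reported range, while the faulting pc 0xbb0 lies below it. Read that way, the process appears to have jumped to an address outside the binary (t3 in the dump holds the same value), rather than the range itself being wrong.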
* Re: riscv+KASAN does not boot 2021-01-14 10:24 ` Dmitry Vyukov @ 2021-01-18 14:53 ` Tobias Klauser -1 siblings, 0 replies; 55+ messages in thread From: Tobias Klauser @ 2021-01-18 14:53 UTC (permalink / raw) To: Dmitry Vyukov Cc: Palmer Dabbelt, Andreas Schwab, Paul Walmsley, Albert Ou, linux-riscv, LKML, nylon7, Bjorn Topel, syzkaller On 2021-01-14 at 11:24:07 +0100, Dmitry Vyukov <dvyukov@google.com> wrote: > On Thu, Jan 14, 2021 at 10:23 AM Dmitry Vyukov <dvyukov@google.com> wrote: > > > > On Thu, Jan 14, 2021 at 5:57 AM Palmer Dabbelt <palmerdabbelt@google.com> wrote: > > > > > > On Fri, 25 Dec 2020 09:13:23 PST (-0800), dvyukov@google.com wrote: > > > > On Fri, Dec 25, 2020 at 5:58 PM Andreas Schwab <schwab@linux-m68k.org> wrote: > > > >> > > > >> On Dez 25 2020, Dmitry Vyukov wrote: > > > >> > > > >> > qemu-system-riscv64 \ > > > >> > -machine virt -bios default -smp 1 -m 2G \ > > > >> > -device virtio-blk-device,drive=hd0 \ > > > >> > -drive file=buildroot-riscv64.ext4,if=none,format=raw,id=hd0 \ > > > >> > -kernel arch/riscv/boot/Image \ > > > >> > -nographic \ > > > >> > -device virtio-rng-device,rng=rng0 -object > > > >> > rng-random,filename=/dev/urandom,id=rng0 \ > > > >> > -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device > > > >> > virtio-net-device,netdev=net0 \ > > > >> > -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic > > > >> > panic_on_warn=1 panic=86400" > > > >> > > > >> Do you get more output with earlycon=sbi? 
> > > > > > > > Hi Andreas, > > > > > > > > For defconfig+kvm_guest.config+ scripts/config -e KASAN -e > > > > KASAN_INLINE it actually gave me more output: > > > > > > > > > > > > OpenSBI v0.7 > > > > ____ _____ ____ _____ > > > > / __ \ / ____| _ \_ _| > > > > | | | |_ __ ___ _ __ | (___ | |_) || | > > > > | | | | '_ \ / _ \ '_ \ \___ \| _ < | | > > > > | |__| | |_) | __/ | | |____) | |_) || |_ > > > > \____/| .__/ \___|_| |_|_____/|____/_____| > > > > | | > > > > |_| > > > > > > > > Platform Name : QEMU Virt Machine > > > > Platform HART Features : RV64ACDFIMSU > > > > Current Hart : 0 > > > > Firmware Base : 0x80000000 > > > > Firmware Size : 132 KB > > > > Runtime SBI Version : 0.2 > > > > > > > > MIDELEG : 0x0000000000000222 > > > > MEDELEG : 0x000000000000b109 > > > > PMP0 : 0x0000000080000000-0x000000008003ffff (A) > > > > PMP1 : 0x0000000000000000-0xffffffffffffffff (A,R,W,X) > > > > [ 0.000000] Linux version 5.10.0-01370-g71c5f03154ac > > > > (dvyukov@dvyukov-desk.muc.corp.google.com) (riscv64-linux-gnu-gcc > > > > (Debian 10.2.0-9) 10.2.0, GNU ld (GNU Binutils for Debian) 2.35.1) #17 > > > > SMP Fri Dec 25 18:10:12 CET 2020 > > > > [ 0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000 > > > > [ 0.000000] earlycon: sbi0 at I/O port 0x0 (options '') > > > > [ 0.000000] printk: bootconsole [sbi0] enabled > > > > [ 0.000000] efi: UEFI not found. 
> > > > [ 0.000000] Zone ranges: > > > > [ 0.000000] DMA32 [mem 0x0000000080200000-0x00000000ffffffff] > > > > [ 0.000000] Normal empty > > > > [ 0.000000] Movable zone start for each node > > > > [ 0.000000] Early memory node ranges > > > > [ 0.000000] node 0: [mem 0x0000000080200000-0x00000000ffffffff] > > > > [ 0.000000] Initmem setup node 0 [mem 0x0000000080200000-0x00000000ffffffff] > > > > [ 0.000000] SBI specification v0.2 detected > > > > [ 0.000000] SBI implementation ID=0x1 Version=0x7 > > > > [ 0.000000] SBI v0.2 TIME extension detected > > > > [ 0.000000] SBI v0.2 IPI extension detected > > > > [ 0.000000] SBI v0.2 RFENCE extension detected > > > > [ 0.000000] software IO TLB: mapped [mem > > > > 0x00000000fa3f9000-0x00000000fe3f9000] (64MB) > > > > [ 0.000000] Unable to handle kernel paging request at virtual > > > > address dfffffc810040000 > > > > [ 0.000000] Oops [#1] > > > > [ 0.000000] Modules linked in: > > > > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted > > > > 5.10.0-01370-g71c5f03154ac #17 > > > > [ 0.000000] epc: ffffffe00042e3e4 ra : ffffffe000c0462c sp : ffffffe001603ea0 > > > > [ 0.000000] gp : ffffffe0016e3c60 tp : ffffffe00160cd40 t0 : > > > > dfffffc810040000 > > > > [ 0.000000] t1 : ffffffe000e0a838 t2 : 0000000000000000 s0 : > > > > ffffffe001603f50 > > > > [ 0.000000] s1 : ffffffe0016e50a8 a0 : dfffffc810040000 a1 : > > > > 0000000000000000 > > > > [ 0.000000] a2 : 000000000ffc0000 a3 : dfffffc820000000 a4 : > > > > 0000000000000000 > > > > [ 0.000000] a5 : 000000003e8c6001 a6 : ffffffe000e0a820 a7 : > > > > 0000000000000900 > > > > [ 0.000000] s2 : dfffffc820000000 s3 : dfffffc800000000 s4 : > > > > 0000000000000001 > > > > [ 0.000000] s5 : ffffffe0016e5108 s6 : fffffffffffff000 s7 : > > > > dfffffc810040000 > > > > [ 0.000000] s8 : 0000000000000080 s9 : ffffffffffffffff s10: > > > > ffffffe07a119000 > > > > [ 0.000000] s11: 000000000000ffc0 t3 : ffffffe0016eb908 t4 : > > > > 0000000000000001 > > > > [ 0.000000] t5 : 
ffffffc4001c150a t6 : ffffffe001603be8 > > > > [ 0.000000] status: 0000000000000100 badaddr: dfffffc810040000 > > > > cause: 000000000000000f > > > > [ 0.000000] random: get_random_bytes called from > > > > oops_exit+0x30/0x58 with crng_init=0 > > > > [ 0.000000] ---[ end trace 0000000000000000 ]--- > > > > [ 0.000000] Kernel panic - not syncing: Fatal exception > > > > [ 0.000000] ---[ end Kernel panic - not syncing: Fatal exception ]--- > > > > > > > > > > > > But I first tried with a the kernel image I had in the dir, I think it > > > > was this config (no KASAN): > > > > https://gist.githubusercontent.com/dvyukov/b2b62beccf80493781ab03b41430e616/raw/62e673cff08a8a41656d2871b8a37f74b00f509f/gistfile1.txt > > > > > > > > and earlycon=sbi did not change anything (no output after OpenSBI). > > > > So potentially there are 2 different problems. > > > > > > Thanks for reporting this. Looks like I'd forgotten to add a kasan config to > > > my tests. There's one in there now, and it's passing as of the fix that Nylon > > > posted. > > > > I can boot the KASAN kernel now on riscv/fixes. 
> > > > Next problem: I've got only to: > > > > [ 90.498967][ T1] Run /sbin/init as init process > > [ 91.164353][ T4022] init[4022]: unhandled signal 11 code 0x1 at > > 0x0000000000000bb0 in busybox[10000+d7000] > > [ 91.179640][ T4022] CPU: 1 PID: 4022 Comm: init Not tainted > > 5.11.0-rc2-00012-g0983834a8393 #19 > > [ 91.180853][ T4022] epc: 0000000000000bb0 ra : 0000003fccab09d0 sp > > : 0000003fffa8c7b0 > > [ 91.181861][ T4022] gp : 00000000000e8d70 tp : 0000003fccaaf820 t0 > > : 000000000000001e > > [ 91.182810][ T4022] t1 : 0000003fccab0bfc t2 : 000000000000000a s0 > > : 0000003fffa8c850 > > [ 91.183749][ T4022] s1 : 0000003fccab1070 a0 : 0000003fccab1070 a1 > > : 0000003fffa8c8c8 > > [ 91.184689][ T4022] a2 : 0000000000000001 a3 : 0000000000000020 a4 > > : 0000000000000000 > > [ 91.185620][ T4022] a5 : 0000000000000000 a6 : 0000003fcc9c4260 a7 > > : fffffffffffffffe > > [ 91.186566][ T4022] s2 : 0000000000000000 s3 : 0000003fffa8c8c8 s4 > > : 0000003fccab1000 > > [ 91.187500][ T4022] s5 : 0000003fccab1078 s6 : 0000003fffa8c8d0 s7 > > : 0000000000000010 > > [ 91.189672][ T4022] s8 : 0000000000000016 s9 : 0000000000000000 > > s10: 0000003fffa8c8c8 > > [ 91.190637][ T4022] s11: 0000000000000000 t3 : 0000000000000bb0 t4 > > : 0000000000000000 > > [ 91.191568][ T4022] t5 : 0000003fffa8c360 t6 : 0000000000000000 > > [ 91.192389][ T4022] status: 8000000000004020 badaddr: > > 0000000000000bb0 cause: 000000000000000c > > [ 91.201573][ T1] Kernel panic - not syncing: Attempted to kill > > init! 
exitcode=0x0000000b > > [ 91.202906][ T1] CPU: 0 PID: 1 Comm: init Not tainted > > 5.11.0-rc2-00012-g0983834a8393 #19 > > [ 91.204139][ T1] Call Trace: > > [ 91.204849][ T1] [<ffffffe0000095c0>] walk_stackframe+0x0/0x1d0 > > [ 91.206124][ T1] [<ffffffe00458b2d8>] show_stack+0x3a/0x46 > > [ 91.207240][ T1] [<ffffffe0045a5b72>] dump_stack+0x11c/0x180 > > [ 91.208732][ T1] [<ffffffe00458b6a0>] panic+0x20a/0x5cc > > [ 91.209890][ T1] [<ffffffe00002eea4>] do_exit+0x1846/0x1874 > > [ 91.211052][ T1] [<ffffffe00002efdc>] do_group_exit+0xa0/0x192 > > [ 91.212224][ T1] [<ffffffe000047d30>] get_signal+0x2d6/0x13dc > > [ 91.213390][ T1] [<ffffffe000007eb0>] do_notify_resume+0xa8/0x912 > > [ 91.214567][ T1] [<ffffffe00000559c>] ret_from_exception+0x0/0x14 > > > > The image is buildroot on 2020.11.x built with this script: > > https://gist.githubusercontent.com/dvyukov/1a9a01ca2189e35175a021820c95b04d/raw/5c01d755e83f4eab0d56aa7dc84af3b2d5e80423/gistfile1.txt > > > > Readelf for init shows the following (is it that [10000+d7000] address > > is not .text at all?): > > > > $ riscv64-linux-gnu-readelf --sections image/bin/busybox > > There are 27 section headers, starting at offset 0xd7f20: > > > > Section Headers: > > [Nr] Name Type Address Offset > > Size EntSize Flags Link Info Align > > [ 0] NULL 0000000000000000 00000000 > > 0000000000000000 0000000000000000 0 0 0 > > [ 1] .interp PROGBITS 0000000000010238 00000238 > > 0000000000000021 0000000000000000 A 0 0 1 > > [ 2] .note.ABI-tag NOTE 000000000001025c 0000025c > > 0000000000000020 0000000000000000 A 0 0 4 > > [ 3] .hash HASH 0000000000010280 00000280 > > 00000000000009cc 0000000000000004 A 5 0 8 > > [ 4] .gnu.hash GNU_HASH 0000000000010c50 00000c50 > > 0000000000000ac8 0000000000000000 A 5 0 8 > > [ 5] .dynsym DYNSYM 0000000000011718 00001718 > > 00000000000021f0 0000000000000018 A 6 1 8 > > [ 6] .dynstr STRTAB 0000000000013908 00003908 > > 0000000000000c66 0000000000000000 A 0 0 1 > > [ 7] .gnu.version VERSYM 
000000000001456e 0000456e > > 00000000000002d4 0000000000000002 A 5 0 2 > > [ 8] .gnu.version_r VERNEED 0000000000014848 00004848 > > 0000000000000050 0000000000000000 A 6 2 8 > > [ 9] .rela.dyn RELA 0000000000014898 00004898 > > 00000000000000c0 0000000000000018 A 5 0 8 > > [10] .rela.plt RELA 0000000000014958 00004958 > > 00000000000020a0 0000000000000018 AI 5 22 8 > > [11] .plt PROGBITS 0000000000016a00 00006a00 > > 00000000000015e0 0000000000000010 AX 0 0 16 > > [12] .text PROGBITS 0000000000017fe0 00007fe0 > > 00000000000a3668 0000000000000000 AX 0 0 4 > > [13] .rodata PROGBITS 00000000000bb648 000ab648 > > 000000000002b076 0000000000000000 A 0 0 8 > > [14] .sdata2 PROGBITS 00000000000e66c0 000d66c0 > > 0000000000000163 0000000000000000 A 0 0 8 > > [15] .eh_frame_hdr PROGBITS 00000000000e6824 000d6824 > > 0000000000000014 0000000000000000 A 0 0 4 > > [16] .eh_frame PROGBITS 00000000000e6838 000d6838 > > 000000000000002c 0000000000000000 A 0 0 8 > > [17] .preinit_array PREINIT_ARRAY 00000000000e7df8 000d6df8 > > 0000000000000008 0000000000000008 WA 0 0 1 > > [18] .init_array INIT_ARRAY 00000000000e7e00 000d6e00 > > 0000000000000008 0000000000000008 WA 0 0 8 > > [19] .fini_array FINI_ARRAY 00000000000e7e08 000d6e08 > > 0000000000000008 0000000000000008 WA 0 0 8 > > [20] .dynamic DYNAMIC 00000000000e7e10 000d6e10 > > 00000000000001f0 0000000000000010 WA 6 0 8 > > [21] .data PROGBITS 00000000000e8000 000d7000 > > 0000000000000240 0000000000000000 WA 0 0 8 > > [22] .got PROGBITS 00000000000e8240 000d7240 > > 0000000000000af8 0000000000000008 WA 0 0 8 > > [23] .sdata PROGBITS 00000000000e8d38 000d7d38 > > 0000000000000101 0000000000000000 WA 0 0 8 > > [24] .sbss NOBITS 00000000000e8e40 000d7e39 > > 000000000000017f 0000000000000000 WA 0 0 8 > > [25] .bss NOBITS 00000000000e8fc0 000d7e39 > > 00000000000005b0 0000000000000000 WA 0 0 8 > > [26] .shstrtab STRTAB 0000000000000000 000d7e39 > > 00000000000000e6 0000000000000000 0 0 1 > > > > > > Before I spent more time on 
this, am I doing anything obviously wrong?
> > Is it a known issue? Are there any fresh working recipes?
>
> Humm.. I tried to use 2020.05 which Tobias used here:
> https://github.com/google/syzkaller/blob/master/docs/linux/setup_linux-host_qemu-vm_riscv64-kernel.md#image
> But there is no make qemu_riscv64_virt_defconfig target... though I
> remember I tested these instructions at the time...

Weird. `make qemu_riscv64_virt_defconfig` works here on buildroot
2020.05, 2020.11.1 and on latest master.

Do you see these in your configs/ directory?

$ ls -l configs/qemu_riscv*
-rw-rw-r-- 1 tklauser tklauser 673 Jan 18 15:51 configs/qemu_riscv32_virt_defconfig
-rw-rw-r-- 1 tklauser tklauser 682 Jan 18 15:51 configs/qemu_riscv64_virt_defconfig

^ permalink raw reply [flat|nested] 55+ messages in thread
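The `ls` check suggested in the message above can be scripted. The following is only a sketch: the helper name `has_defconfig` and the `BUILDROOT_DIR` variable are made up for illustration and are not part of buildroot. It tests for the defconfig file and, if it is missing, prints which tag the tree is on (assuming the tree is a git clone).

```shell
#!/bin/sh
# has_defconfig TREE NAME: succeed if TREE/configs/NAME is a regular file.
# (Hypothetical helper, not a buildroot command.)
has_defconfig() {
    [ -f "$1/configs/$2" ]
}

# BUILDROOT_DIR is an assumed variable name; default to the current directory.
tree=${BUILDROOT_DIR:-.}

if has_defconfig "$tree" qemu_riscv64_virt_defconfig; then
    echo "qemu_riscv64_virt_defconfig found in $tree/configs"
else
    echo "qemu_riscv64_virt_defconfig missing in $tree/configs"
    # If the tree is a git clone, show which release is checked out.
    git -C "$tree" describe --tags 2>/dev/null || true
fi
```

A missing file together with an unexpected tag from `git describe` would point at the checkout rather than at the instructions.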
* Re: riscv+KASAN does not boot 2021-01-18 14:53 ` Tobias Klauser @ 2021-01-18 15:05 ` Dmitry Vyukov -1 siblings, 0 replies; 55+ messages in thread From: Dmitry Vyukov @ 2021-01-18 15:05 UTC (permalink / raw) To: Tobias Klauser Cc: Palmer Dabbelt, Andreas Schwab, Paul Walmsley, Albert Ou, linux-riscv, LKML, nylon7, Bjorn Topel, syzkaller On Mon, Jan 18, 2021 at 3:53 PM Tobias Klauser <tklauser@distanz.ch> wrote: > > > On Thu, Jan 14, 2021 at 5:57 AM Palmer Dabbelt <palmerdabbelt@google.com> wrote: > > > > > > > > On Fri, 25 Dec 2020 09:13:23 PST (-0800), dvyukov@google.com wrote: > > > > > On Fri, Dec 25, 2020 at 5:58 PM Andreas Schwab <schwab@linux-m68k.org> wrote: > > > > >> > > > > >> On Dez 25 2020, Dmitry Vyukov wrote: > > > > >> > > > > >> > qemu-system-riscv64 \ > > > > >> > -machine virt -bios default -smp 1 -m 2G \ > > > > >> > -device virtio-blk-device,drive=hd0 \ > > > > >> > -drive file=buildroot-riscv64.ext4,if=none,format=raw,id=hd0 \ > > > > >> > -kernel arch/riscv/boot/Image \ > > > > >> > -nographic \ > > > > >> > -device virtio-rng-device,rng=rng0 -object > > > > >> > rng-random,filename=/dev/urandom,id=rng0 \ > > > > >> > -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device > > > > >> > virtio-net-device,netdev=net0 \ > > > > >> > -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic > > > > >> > panic_on_warn=1 panic=86400" > > > > >> > > > > >> Do you get more output with earlycon=sbi? 
> > > > > > > > > > Hi Andreas, > > > > > > > > > > For defconfig+kvm_guest.config+ scripts/config -e KASAN -e > > > > > KASAN_INLINE it actually gave me more output: > > > > > > > > > > > > > > > OpenSBI v0.7 > > > > > ____ _____ ____ _____ > > > > > / __ \ / ____| _ \_ _| > > > > > | | | |_ __ ___ _ __ | (___ | |_) || | > > > > > | | | | '_ \ / _ \ '_ \ \___ \| _ < | | > > > > > | |__| | |_) | __/ | | |____) | |_) || |_ > > > > > \____/| .__/ \___|_| |_|_____/|____/_____| > > > > > | | > > > > > |_| > > > > > > > > > > Platform Name : QEMU Virt Machine > > > > > Platform HART Features : RV64ACDFIMSU > > > > > Current Hart : 0 > > > > > Firmware Base : 0x80000000 > > > > > Firmware Size : 132 KB > > > > > Runtime SBI Version : 0.2 > > > > > > > > > > MIDELEG : 0x0000000000000222 > > > > > MEDELEG : 0x000000000000b109 > > > > > PMP0 : 0x0000000080000000-0x000000008003ffff (A) > > > > > PMP1 : 0x0000000000000000-0xffffffffffffffff (A,R,W,X) > > > > > [ 0.000000] Linux version 5.10.0-01370-g71c5f03154ac > > > > > (dvyukov@dvyukov-desk.muc.corp.google.com) (riscv64-linux-gnu-gcc > > > > > (Debian 10.2.0-9) 10.2.0, GNU ld (GNU Binutils for Debian) 2.35.1) #17 > > > > > SMP Fri Dec 25 18:10:12 CET 2020 > > > > > [ 0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000 > > > > > [ 0.000000] earlycon: sbi0 at I/O port 0x0 (options '') > > > > > [ 0.000000] printk: bootconsole [sbi0] enabled > > > > > [ 0.000000] efi: UEFI not found. 
> > > > > [ 0.000000] Zone ranges: > > > > > [ 0.000000] DMA32 [mem 0x0000000080200000-0x00000000ffffffff] > > > > > [ 0.000000] Normal empty > > > > > [ 0.000000] Movable zone start for each node > > > > > [ 0.000000] Early memory node ranges > > > > > [ 0.000000] node 0: [mem 0x0000000080200000-0x00000000ffffffff] > > > > > [ 0.000000] Initmem setup node 0 [mem 0x0000000080200000-0x00000000ffffffff] > > > > > [ 0.000000] SBI specification v0.2 detected > > > > > [ 0.000000] SBI implementation ID=0x1 Version=0x7 > > > > > [ 0.000000] SBI v0.2 TIME extension detected > > > > > [ 0.000000] SBI v0.2 IPI extension detected > > > > > [ 0.000000] SBI v0.2 RFENCE extension detected > > > > > [ 0.000000] software IO TLB: mapped [mem > > > > > 0x00000000fa3f9000-0x00000000fe3f9000] (64MB) > > > > > [ 0.000000] Unable to handle kernel paging request at virtual > > > > > address dfffffc810040000 > > > > > [ 0.000000] Oops [#1] > > > > > [ 0.000000] Modules linked in: > > > > > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted > > > > > 5.10.0-01370-g71c5f03154ac #17 > > > > > [ 0.000000] epc: ffffffe00042e3e4 ra : ffffffe000c0462c sp : ffffffe001603ea0 > > > > > [ 0.000000] gp : ffffffe0016e3c60 tp : ffffffe00160cd40 t0 : > > > > > dfffffc810040000 > > > > > [ 0.000000] t1 : ffffffe000e0a838 t2 : 0000000000000000 s0 : > > > > > ffffffe001603f50 > > > > > [ 0.000000] s1 : ffffffe0016e50a8 a0 : dfffffc810040000 a1 : > > > > > 0000000000000000 > > > > > [ 0.000000] a2 : 000000000ffc0000 a3 : dfffffc820000000 a4 : > > > > > 0000000000000000 > > > > > [ 0.000000] a5 : 000000003e8c6001 a6 : ffffffe000e0a820 a7 : > > > > > 0000000000000900 > > > > > [ 0.000000] s2 : dfffffc820000000 s3 : dfffffc800000000 s4 : > > > > > 0000000000000001 > > > > > [ 0.000000] s5 : ffffffe0016e5108 s6 : fffffffffffff000 s7 : > > > > > dfffffc810040000 > > > > > [ 0.000000] s8 : 0000000000000080 s9 : ffffffffffffffff s10: > > > > > ffffffe07a119000 > > > > > [ 0.000000] s11: 000000000000ffc0 t3 : 
ffffffe0016eb908 t4 : > > > > > 0000000000000001 > > > > > [ 0.000000] t5 : ffffffc4001c150a t6 : ffffffe001603be8 > > > > > [ 0.000000] status: 0000000000000100 badaddr: dfffffc810040000 > > > > > cause: 000000000000000f > > > > > [ 0.000000] random: get_random_bytes called from > > > > > oops_exit+0x30/0x58 with crng_init=0 > > > > > [ 0.000000] ---[ end trace 0000000000000000 ]--- > > > > > [ 0.000000] Kernel panic - not syncing: Fatal exception > > > > > [ 0.000000] ---[ end Kernel panic - not syncing: Fatal exception ]--- > > > > > > > > > > > > > > > But I first tried with a the kernel image I had in the dir, I think it > > > > > was this config (no KASAN): > > > > > https://gist.githubusercontent.com/dvyukov/b2b62beccf80493781ab03b41430e616/raw/62e673cff08a8a41656d2871b8a37f74b00f509f/gistfile1.txt > > > > > > > > > > and earlycon=sbi did not change anything (no output after OpenSBI). > > > > > So potentially there are 2 different problems. > > > > > > > > Thanks for reporting this. Looks like I'd forgotten to add a kasan config to > > > > my tests. There's one in there now, and it's passing as of the fix that Nylon > > > > posted. > > > > > > I can boot the KASAN kernel now on riscv/fixes. 
> > > > > > Next problem: I've got only to: > > > > > > [ 90.498967][ T1] Run /sbin/init as init process > > > [ 91.164353][ T4022] init[4022]: unhandled signal 11 code 0x1 at > > > 0x0000000000000bb0 in busybox[10000+d7000] > > > [ 91.179640][ T4022] CPU: 1 PID: 4022 Comm: init Not tainted > > > 5.11.0-rc2-00012-g0983834a8393 #19 > > > [ 91.180853][ T4022] epc: 0000000000000bb0 ra : 0000003fccab09d0 sp > > > : 0000003fffa8c7b0 > > > [ 91.181861][ T4022] gp : 00000000000e8d70 tp : 0000003fccaaf820 t0 > > > : 000000000000001e > > > [ 91.182810][ T4022] t1 : 0000003fccab0bfc t2 : 000000000000000a s0 > > > : 0000003fffa8c850 > > > [ 91.183749][ T4022] s1 : 0000003fccab1070 a0 : 0000003fccab1070 a1 > > > : 0000003fffa8c8c8 > > > [ 91.184689][ T4022] a2 : 0000000000000001 a3 : 0000000000000020 a4 > > > : 0000000000000000 > > > [ 91.185620][ T4022] a5 : 0000000000000000 a6 : 0000003fcc9c4260 a7 > > > : fffffffffffffffe > > > [ 91.186566][ T4022] s2 : 0000000000000000 s3 : 0000003fffa8c8c8 s4 > > > : 0000003fccab1000 > > > [ 91.187500][ T4022] s5 : 0000003fccab1078 s6 : 0000003fffa8c8d0 s7 > > > : 0000000000000010 > > > [ 91.189672][ T4022] s8 : 0000000000000016 s9 : 0000000000000000 > > > s10: 0000003fffa8c8c8 > > > [ 91.190637][ T4022] s11: 0000000000000000 t3 : 0000000000000bb0 t4 > > > : 0000000000000000 > > > [ 91.191568][ T4022] t5 : 0000003fffa8c360 t6 : 0000000000000000 > > > [ 91.192389][ T4022] status: 8000000000004020 badaddr: > > > 0000000000000bb0 cause: 000000000000000c > > > [ 91.201573][ T1] Kernel panic - not syncing: Attempted to kill > > > init! 
exitcode=0x0000000b
> > > [ 91.202906][ T1] CPU: 0 PID: 1 Comm: init Not tainted
> > > 5.11.0-rc2-00012-g0983834a8393 #19
> > > [ 91.204139][ T1] Call Trace:
> > > [ 91.204849][ T1] [<ffffffe0000095c0>] walk_stackframe+0x0/0x1d0
> > > [ 91.206124][ T1] [<ffffffe00458b2d8>] show_stack+0x3a/0x46
> > > [ 91.207240][ T1] [<ffffffe0045a5b72>] dump_stack+0x11c/0x180
> > > [ 91.208732][ T1] [<ffffffe00458b6a0>] panic+0x20a/0x5cc
> > > [ 91.209890][ T1] [<ffffffe00002eea4>] do_exit+0x1846/0x1874
> > > [ 91.211052][ T1] [<ffffffe00002efdc>] do_group_exit+0xa0/0x192
> > > [ 91.212224][ T1] [<ffffffe000047d30>] get_signal+0x2d6/0x13dc
> > > [ 91.213390][ T1] [<ffffffe000007eb0>] do_notify_resume+0xa8/0x912
> > > [ 91.214567][ T1] [<ffffffe00000559c>] ret_from_exception+0x0/0x14
> > >
> > > The image is buildroot on 2020.11.x built with this script:
> > > https://gist.githubusercontent.com/dvyukov/1a9a01ca2189e35175a021820c95b04d/raw/5c01d755e83f4eab0d56aa7dc84af3b2d5e80423/gistfile1.txt
> > >
> > > Readelf for init shows the following (is it that [10000+d7000] address
> > > is not .text at all?):
> > >
> > > $ riscv64-linux-gnu-readelf --sections image/bin/busybox
> > > There are 27 section headers, starting at offset 0xd7f20:
> > >
> > > Section Headers:
> > > [Nr] Name Type Address Offset
> > > Size EntSize Flags Link Info Align
> > > [ 0] NULL 0000000000000000 00000000
> > > 0000000000000000 0000000000000000 0 0 0
> > > [ 1] .interp PROGBITS 0000000000010238 00000238
> > > 0000000000000021 0000000000000000 A 0 0 1
> > > [ 2] .note.ABI-tag NOTE 000000000001025c 0000025c
> > > 0000000000000020 0000000000000000 A 0 0 4
> > > [ 3] .hash HASH 0000000000010280 00000280
> > > 00000000000009cc 0000000000000004 A 5 0 8
> > > [ 4] .gnu.hash GNU_HASH 0000000000010c50 00000c50
> > > 0000000000000ac8 0000000000000000 A 5 0 8
> > > [ 5] .dynsym DYNSYM 0000000000011718 00001718
> > > 00000000000021f0 0000000000000018 A 6 1 8
> > > [ 6] .dynstr STRTAB 0000000000013908 00003908
> > > 0000000000000c66 0000000000000000 A 0 0 1
> > > [ 7] .gnu.version VERSYM 000000000001456e 0000456e
> > > 00000000000002d4 0000000000000002 A 5 0 2
> > > [ 8] .gnu.version_r VERNEED 0000000000014848 00004848
> > > 0000000000000050 0000000000000000 A 6 2 8
> > > [ 9] .rela.dyn RELA 0000000000014898 00004898
> > > 00000000000000c0 0000000000000018 A 5 0 8
> > > [10] .rela.plt RELA 0000000000014958 00004958
> > > 00000000000020a0 0000000000000018 AI 5 22 8
> > > [11] .plt PROGBITS 0000000000016a00 00006a00
> > > 00000000000015e0 0000000000000010 AX 0 0 16
> > > [12] .text PROGBITS 0000000000017fe0 00007fe0
> > > 00000000000a3668 0000000000000000 AX 0 0 4
> > > [13] .rodata PROGBITS 00000000000bb648 000ab648
> > > 000000000002b076 0000000000000000 A 0 0 8
> > > [14] .sdata2 PROGBITS 00000000000e66c0 000d66c0
> > > 0000000000000163 0000000000000000 A 0 0 8
> > > [15] .eh_frame_hdr PROGBITS 00000000000e6824 000d6824
> > > 0000000000000014 0000000000000000 A 0 0 4
> > > [16] .eh_frame PROGBITS 00000000000e6838 000d6838
> > > 000000000000002c 0000000000000000 A 0 0 8
> > > [17] .preinit_array PREINIT_ARRAY 00000000000e7df8 000d6df8
> > > 0000000000000008 0000000000000008 WA 0 0 1
> > > [18] .init_array INIT_ARRAY 00000000000e7e00 000d6e00
> > > 0000000000000008 0000000000000008 WA 0 0 8
> > > [19] .fini_array FINI_ARRAY 00000000000e7e08 000d6e08
> > > 0000000000000008 0000000000000008 WA 0 0 8
> > > [20] .dynamic DYNAMIC 00000000000e7e10 000d6e10
> > > 00000000000001f0 0000000000000010 WA 6 0 8
> > > [21] .data PROGBITS 00000000000e8000 000d7000
> > > 0000000000000240 0000000000000000 WA 0 0 8
> > > [22] .got PROGBITS 00000000000e8240 000d7240
> > > 0000000000000af8 0000000000000008 WA 0 0 8
> > > [23] .sdata PROGBITS 00000000000e8d38 000d7d38
> > > 0000000000000101 0000000000000000 WA 0 0 8
> > > [24] .sbss NOBITS 00000000000e8e40 000d7e39
> > > 000000000000017f 0000000000000000 WA 0 0 8
> > > [25] .bss NOBITS 00000000000e8fc0 000d7e39
> > > 00000000000005b0 0000000000000000 WA 0 0 8
> > > [26] .shstrtab STRTAB 0000000000000000 000d7e39
> > > 00000000000000e6 0000000000000000 0 0 1
> > >
> > >
> > > Before I spend more time on this, am I doing anything obviously wrong?
> > > Is it a known issue? Are there any fresh working recipes?
> >
> > Humm.. I tried to use 2020.05 which Tobias used here:
> > https://github.com/google/syzkaller/blob/master/docs/linux/setup_linux-host_qemu-vm_riscv64-kernel.md#image
> > But there is no make qemu_riscv64_virt_defconfig target... though I
> > remember I tested these instructions at the time...
>
> Weird. `make qemu_riscv64_virt_defconfig` works here on buildroot
> 2020.05, 2020.11.1 and on latest master.
>
> Do you see these in your configs/ directory?
>
> $ ls -l configs/qemu_riscv*
> -rw-rw-r-- 1 tklauser tklauser 673 Jan 18 15:51 configs/qemu_riscv32_virt_defconfig
> -rw-rw-r-- 1 tklauser tklauser 682 Jan 18 15:51 configs/qemu_riscv64_virt_defconfig

Oh, it turned out I had previously checked out 2011.05 somehow...
Yes, 2020.05 has qemu_riscv64_virt_defconfig and I am building it now.
2020.11 has the config, but init crashes (see above).
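As an aside on the question quoted above (whether the `[10000+d7000]` range is `.text` at all), the numbers in the crash log and the readelf output can be cross-checked with a little arithmetic. The Python sketch below is not from the thread. It assumes that `busybox[10000+d7000]` gives the base and size of a mapping of the binary, and it uses the exception codes from the RISC-V privileged spec (12 = instruction page fault, 13 = load page fault, 15 = store/AMO page fault). The helper names `in_range` and `summarize` are made up for illustration.

```python
# All numbers are copied from the crash log and readelf output quoted in this thread.
FAULT_PC = 0x0000000000000BB0             # epc and badaddr of the crashing init
MAP_BASE, MAP_SIZE = 0x10000, 0xD7000     # "busybox[10000+d7000]" (assumed base+size)
TEXT_ADDR, TEXT_SIZE = 0x17FE0, 0xA3668   # .text Address and Size from readelf
CAUSE = 0xC                               # "cause: 000000000000000c"

# Subset of the RISC-V scause exception codes (privileged spec).
SCAUSE_NAMES = {
    0xC: "instruction page fault",
    0xD: "load page fault",
    0xF: "store/AMO page fault",
}


def in_range(base: int, size: int, addr: int) -> bool:
    """Return True if addr lies in the half-open range [base, base + size)."""
    return base <= addr < base + size


def summarize() -> dict:
    """Hypothetical helper: collect the checks relevant to the question."""
    text_last = TEXT_ADDR + TEXT_SIZE - 1  # last byte of .text
    return {
        "text_inside_mapping": in_range(MAP_BASE, MAP_SIZE, TEXT_ADDR)
        and in_range(MAP_BASE, MAP_SIZE, text_last),
        "pc_inside_mapping": in_range(MAP_BASE, MAP_SIZE, FAULT_PC),
        "pc_inside_text": in_range(TEXT_ADDR, TEXT_SIZE, FAULT_PC),
        "cause": SCAUSE_NAMES.get(CAUSE, "unknown"),
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value}")
```

Under these assumptions, `.text` lies entirely inside the reported mapping, while the faulting pc 0xbb0 is below the mapping base. That would suggest the process jumped to a bogus address rather than faulting inside busybox's own code, which fits the instruction page fault cause.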
* Re: riscv+KASAN does not boot 2021-01-18 15:05 ` Dmitry Vyukov @ 2021-01-18 15:43 ` Dmitry Vyukov -1 siblings, 0 replies; 55+ messages in thread From: Dmitry Vyukov @ 2021-01-18 15:43 UTC (permalink / raw) To: Tobias Klauser Cc: Palmer Dabbelt, Andreas Schwab, Paul Walmsley, Albert Ou, linux-riscv, LKML, nylon7, Bjorn Topel, syzkaller On Mon, Jan 18, 2021 at 4:05 PM Dmitry Vyukov <dvyukov@google.com> wrote: > > On Mon, Jan 18, 2021 at 3:53 PM Tobias Klauser <tklauser@distanz.ch> wrote: > > > > On Thu, Jan 14, 2021 at 5:57 AM Palmer Dabbelt <palmerdabbelt@google.com> wrote: > > > > > > > > > > On Fri, 25 Dec 2020 09:13:23 PST (-0800), dvyukov@google.com wrote: > > > > > > On Fri, Dec 25, 2020 at 5:58 PM Andreas Schwab <schwab@linux-m68k.org> wrote: > > > > > >> > > > > > >> On Dez 25 2020, Dmitry Vyukov wrote: > > > > > >> > > > > > >> > qemu-system-riscv64 \ > > > > > >> > -machine virt -bios default -smp 1 -m 2G \ > > > > > >> > -device virtio-blk-device,drive=hd0 \ > > > > > >> > -drive file=buildroot-riscv64.ext4,if=none,format=raw,id=hd0 \ > > > > > >> > -kernel arch/riscv/boot/Image \ > > > > > >> > -nographic \ > > > > > >> > -device virtio-rng-device,rng=rng0 -object > > > > > >> > rng-random,filename=/dev/urandom,id=rng0 \ > > > > > >> > -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device > > > > > >> > virtio-net-device,netdev=net0 \ > > > > > >> > -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic > > > > > >> > panic_on_warn=1 panic=86400" > > > > > >> > > > > > >> Do you get more output with earlycon=sbi? 
> > > > > > > > > > > > Hi Andreas, > > > > > > > > > > > > For defconfig+kvm_guest.config+ scripts/config -e KASAN -e > > > > > > KASAN_INLINE it actually gave me more output: > > > > > > > > > > > > > > > > > > OpenSBI v0.7 > > > > > > ____ _____ ____ _____ > > > > > > / __ \ / ____| _ \_ _| > > > > > > | | | |_ __ ___ _ __ | (___ | |_) || | > > > > > > | | | | '_ \ / _ \ '_ \ \___ \| _ < | | > > > > > > | |__| | |_) | __/ | | |____) | |_) || |_ > > > > > > \____/| .__/ \___|_| |_|_____/|____/_____| > > > > > > | | > > > > > > |_| > > > > > > > > > > > > Platform Name : QEMU Virt Machine > > > > > > Platform HART Features : RV64ACDFIMSU > > > > > > Current Hart : 0 > > > > > > Firmware Base : 0x80000000 > > > > > > Firmware Size : 132 KB > > > > > > Runtime SBI Version : 0.2 > > > > > > > > > > > > MIDELEG : 0x0000000000000222 > > > > > > MEDELEG : 0x000000000000b109 > > > > > > PMP0 : 0x0000000080000000-0x000000008003ffff (A) > > > > > > PMP1 : 0x0000000000000000-0xffffffffffffffff (A,R,W,X) > > > > > > [ 0.000000] Linux version 5.10.0-01370-g71c5f03154ac > > > > > > (dvyukov@dvyukov-desk.muc.corp.google.com) (riscv64-linux-gnu-gcc > > > > > > (Debian 10.2.0-9) 10.2.0, GNU ld (GNU Binutils for Debian) 2.35.1) #17 > > > > > > SMP Fri Dec 25 18:10:12 CET 2020 > > > > > > [ 0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000 > > > > > > [ 0.000000] earlycon: sbi0 at I/O port 0x0 (options '') > > > > > > [ 0.000000] printk: bootconsole [sbi0] enabled > > > > > > [ 0.000000] efi: UEFI not found. 
> > > > > > [ 0.000000] Zone ranges: > > > > > > [ 0.000000] DMA32 [mem 0x0000000080200000-0x00000000ffffffff] > > > > > > [ 0.000000] Normal empty > > > > > > [ 0.000000] Movable zone start for each node > > > > > > [ 0.000000] Early memory node ranges > > > > > > [ 0.000000] node 0: [mem 0x0000000080200000-0x00000000ffffffff] > > > > > > [ 0.000000] Initmem setup node 0 [mem 0x0000000080200000-0x00000000ffffffff] > > > > > > [ 0.000000] SBI specification v0.2 detected > > > > > > [ 0.000000] SBI implementation ID=0x1 Version=0x7 > > > > > > [ 0.000000] SBI v0.2 TIME extension detected > > > > > > [ 0.000000] SBI v0.2 IPI extension detected > > > > > > [ 0.000000] SBI v0.2 RFENCE extension detected > > > > > > [ 0.000000] software IO TLB: mapped [mem > > > > > > 0x00000000fa3f9000-0x00000000fe3f9000] (64MB) > > > > > > [ 0.000000] Unable to handle kernel paging request at virtual > > > > > > address dfffffc810040000 > > > > > > [ 0.000000] Oops [#1] > > > > > > [ 0.000000] Modules linked in: > > > > > > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted > > > > > > 5.10.0-01370-g71c5f03154ac #17 > > > > > > [ 0.000000] epc: ffffffe00042e3e4 ra : ffffffe000c0462c sp : ffffffe001603ea0 > > > > > > [ 0.000000] gp : ffffffe0016e3c60 tp : ffffffe00160cd40 t0 : > > > > > > dfffffc810040000 > > > > > > [ 0.000000] t1 : ffffffe000e0a838 t2 : 0000000000000000 s0 : > > > > > > ffffffe001603f50 > > > > > > [ 0.000000] s1 : ffffffe0016e50a8 a0 : dfffffc810040000 a1 : > > > > > > 0000000000000000 > > > > > > [ 0.000000] a2 : 000000000ffc0000 a3 : dfffffc820000000 a4 : > > > > > > 0000000000000000 > > > > > > [ 0.000000] a5 : 000000003e8c6001 a6 : ffffffe000e0a820 a7 : > > > > > > 0000000000000900 > > > > > > [ 0.000000] s2 : dfffffc820000000 s3 : dfffffc800000000 s4 : > > > > > > 0000000000000001 > > > > > > [ 0.000000] s5 : ffffffe0016e5108 s6 : fffffffffffff000 s7 : > > > > > > dfffffc810040000 > > > > > > [ 0.000000] s8 : 0000000000000080 s9 : ffffffffffffffff s10: > > > > 
> > ffffffe07a119000 > > > > > > [ 0.000000] s11: 000000000000ffc0 t3 : ffffffe0016eb908 t4 : > > > > > > 0000000000000001 > > > > > > [ 0.000000] t5 : ffffffc4001c150a t6 : ffffffe001603be8 > > > > > > [ 0.000000] status: 0000000000000100 badaddr: dfffffc810040000 > > > > > > cause: 000000000000000f > > > > > > [ 0.000000] random: get_random_bytes called from > > > > > > oops_exit+0x30/0x58 with crng_init=0 > > > > > > [ 0.000000] ---[ end trace 0000000000000000 ]--- > > > > > > [ 0.000000] Kernel panic - not syncing: Fatal exception > > > > > > [ 0.000000] ---[ end Kernel panic - not syncing: Fatal exception ]--- > > > > > > > > > > > > > > > > > > But I first tried with a the kernel image I had in the dir, I think it > > > > > > was this config (no KASAN): > > > > > > https://gist.githubusercontent.com/dvyukov/b2b62beccf80493781ab03b41430e616/raw/62e673cff08a8a41656d2871b8a37f74b00f509f/gistfile1.txt > > > > > > > > > > > > and earlycon=sbi did not change anything (no output after OpenSBI). > > > > > > So potentially there are 2 different problems. > > > > > > > > > > Thanks for reporting this. Looks like I'd forgotten to add a kasan config to > > > > > my tests. There's one in there now, and it's passing as of the fix that Nylon > > > > > posted. > > > > > > > > I can boot the KASAN kernel now on riscv/fixes. 
> > > > > > > > Next problem: I've got only to: > > > > > > > > [ 90.498967][ T1] Run /sbin/init as init process > > > > [ 91.164353][ T4022] init[4022]: unhandled signal 11 code 0x1 at > > > > 0x0000000000000bb0 in busybox[10000+d7000] > > > > [ 91.179640][ T4022] CPU: 1 PID: 4022 Comm: init Not tainted > > > > 5.11.0-rc2-00012-g0983834a8393 #19 > > > > [ 91.180853][ T4022] epc: 0000000000000bb0 ra : 0000003fccab09d0 sp > > > > : 0000003fffa8c7b0 > > > > [ 91.181861][ T4022] gp : 00000000000e8d70 tp : 0000003fccaaf820 t0 > > > > : 000000000000001e > > > > [ 91.182810][ T4022] t1 : 0000003fccab0bfc t2 : 000000000000000a s0 > > > > : 0000003fffa8c850 > > > > [ 91.183749][ T4022] s1 : 0000003fccab1070 a0 : 0000003fccab1070 a1 > > > > : 0000003fffa8c8c8 > > > > [ 91.184689][ T4022] a2 : 0000000000000001 a3 : 0000000000000020 a4 > > > > : 0000000000000000 > > > > [ 91.185620][ T4022] a5 : 0000000000000000 a6 : 0000003fcc9c4260 a7 > > > > : fffffffffffffffe > > > > [ 91.186566][ T4022] s2 : 0000000000000000 s3 : 0000003fffa8c8c8 s4 > > > > : 0000003fccab1000 > > > > [ 91.187500][ T4022] s5 : 0000003fccab1078 s6 : 0000003fffa8c8d0 s7 > > > > : 0000000000000010 > > > > [ 91.189672][ T4022] s8 : 0000000000000016 s9 : 0000000000000000 > > > > s10: 0000003fffa8c8c8 > > > > [ 91.190637][ T4022] s11: 0000000000000000 t3 : 0000000000000bb0 t4 > > > > : 0000000000000000 > > > > [ 91.191568][ T4022] t5 : 0000003fffa8c360 t6 : 0000000000000000 > > > > [ 91.192389][ T4022] status: 8000000000004020 badaddr: > > > > 0000000000000bb0 cause: 000000000000000c > > > > [ 91.201573][ T1] Kernel panic - not syncing: Attempted to kill > > > > init! 
exitcode=0x0000000b > > > > [ 91.202906][ T1] CPU: 0 PID: 1 Comm: init Not tainted > > > > 5.11.0-rc2-00012-g0983834a8393 #19 > > > > [ 91.204139][ T1] Call Trace: > > > > [ 91.204849][ T1] [<ffffffe0000095c0>] walk_stackframe+0x0/0x1d0 > > > > [ 91.206124][ T1] [<ffffffe00458b2d8>] show_stack+0x3a/0x46 > > > > [ 91.207240][ T1] [<ffffffe0045a5b72>] dump_stack+0x11c/0x180 > > > > [ 91.208732][ T1] [<ffffffe00458b6a0>] panic+0x20a/0x5cc > > > > [ 91.209890][ T1] [<ffffffe00002eea4>] do_exit+0x1846/0x1874 > > > > [ 91.211052][ T1] [<ffffffe00002efdc>] do_group_exit+0xa0/0x192 > > > > [ 91.212224][ T1] [<ffffffe000047d30>] get_signal+0x2d6/0x13dc > > > > [ 91.213390][ T1] [<ffffffe000007eb0>] do_notify_resume+0xa8/0x912 > > > > [ 91.214567][ T1] [<ffffffe00000559c>] ret_from_exception+0x0/0x14 > > > > > > > > The image is buildroot on 2020.11.x built with this script: > > > > https://gist.githubusercontent.com/dvyukov/1a9a01ca2189e35175a021820c95b04d/raw/5c01d755e83f4eab0d56aa7dc84af3b2d5e80423/gistfile1.txt > > > > > > > > Readelf for init shows the following (is it that [10000+d7000] address > > > > is not .text at all?): > > > > > > > > $ riscv64-linux-gnu-readelf --sections image/bin/busybox > > > > There are 27 section headers, starting at offset 0xd7f20: > > > > > > > > Section Headers: > > > > [Nr] Name Type Address Offset > > > > Size EntSize Flags Link Info Align > > > > [ 0] NULL 0000000000000000 00000000 > > > > 0000000000000000 0000000000000000 0 0 0 > > > > [ 1] .interp PROGBITS 0000000000010238 00000238 > > > > 0000000000000021 0000000000000000 A 0 0 1 > > > > [ 2] .note.ABI-tag NOTE 000000000001025c 0000025c > > > > 0000000000000020 0000000000000000 A 0 0 4 > > > > [ 3] .hash HASH 0000000000010280 00000280 > > > > 00000000000009cc 0000000000000004 A 5 0 8 > > > > [ 4] .gnu.hash GNU_HASH 0000000000010c50 00000c50 > > > > 0000000000000ac8 0000000000000000 A 5 0 8 > > > > [ 5] .dynsym DYNSYM 0000000000011718 00001718 > > > > 00000000000021f0 
0000000000000018 A 6 1 8 > > > > [ 6] .dynstr STRTAB 0000000000013908 00003908 > > > > 0000000000000c66 0000000000000000 A 0 0 1 > > > > [ 7] .gnu.version VERSYM 000000000001456e 0000456e > > > > 00000000000002d4 0000000000000002 A 5 0 2 > > > > [ 8] .gnu.version_r VERNEED 0000000000014848 00004848 > > > > 0000000000000050 0000000000000000 A 6 2 8 > > > > [ 9] .rela.dyn RELA 0000000000014898 00004898 > > > > 00000000000000c0 0000000000000018 A 5 0 8 > > > > [10] .rela.plt RELA 0000000000014958 00004958 > > > > 00000000000020a0 0000000000000018 AI 5 22 8 > > > > [11] .plt PROGBITS 0000000000016a00 00006a00 > > > > 00000000000015e0 0000000000000010 AX 0 0 16 > > > > [12] .text PROGBITS 0000000000017fe0 00007fe0 > > > > 00000000000a3668 0000000000000000 AX 0 0 4 > > > > [13] .rodata PROGBITS 00000000000bb648 000ab648 > > > > 000000000002b076 0000000000000000 A 0 0 8 > > > > [14] .sdata2 PROGBITS 00000000000e66c0 000d66c0 > > > > 0000000000000163 0000000000000000 A 0 0 8 > > > > [15] .eh_frame_hdr PROGBITS 00000000000e6824 000d6824 > > > > 0000000000000014 0000000000000000 A 0 0 4 > > > > [16] .eh_frame PROGBITS 00000000000e6838 000d6838 > > > > 000000000000002c 0000000000000000 A 0 0 8 > > > > [17] .preinit_array PREINIT_ARRAY 00000000000e7df8 000d6df8 > > > > 0000000000000008 0000000000000008 WA 0 0 1 > > > > [18] .init_array INIT_ARRAY 00000000000e7e00 000d6e00 > > > > 0000000000000008 0000000000000008 WA 0 0 8 > > > > [19] .fini_array FINI_ARRAY 00000000000e7e08 000d6e08 > > > > 0000000000000008 0000000000000008 WA 0 0 8 > > > > [20] .dynamic DYNAMIC 00000000000e7e10 000d6e10 > > > > 00000000000001f0 0000000000000010 WA 6 0 8 > > > > [21] .data PROGBITS 00000000000e8000 000d7000 > > > > 0000000000000240 0000000000000000 WA 0 0 8 > > > > [22] .got PROGBITS 00000000000e8240 000d7240 > > > > 0000000000000af8 0000000000000008 WA 0 0 8 > > > > [23] .sdata PROGBITS 00000000000e8d38 000d7d38 > > > > 0000000000000101 0000000000000000 WA 0 0 8 > > > > [24] .sbss NOBITS 
00000000000e8e40 000d7e39
> > > > 000000000000017f 0000000000000000 WA 0 0 8
> > > > [25] .bss NOBITS 00000000000e8fc0 000d7e39
> > > > 00000000000005b0 0000000000000000 WA 0 0 8
> > > > [26] .shstrtab STRTAB 0000000000000000 000d7e39
> > > > 00000000000000e6 0000000000000000 0 0 1
> > > >
> > > >
> > > > Before I spend more time on this, am I doing anything obviously wrong?
> > > > Is it a known issue? Are there any fresh working recipes?
> > >
> > > Humm.. I tried to use 2020.05 which Tobias used here:
> > > https://github.com/google/syzkaller/blob/master/docs/linux/setup_linux-host_qemu-vm_riscv64-kernel.md#image
> > > But there is no make qemu_riscv64_virt_defconfig target... though I
> > > remember I tested these instructions at the time...
> >
> > Weird. `make qemu_riscv64_virt_defconfig` works here on buildroot
> > 2020.05, 2020.11.1 and on latest master.
> >
> > Do you see these in your configs/ directory?
> >
> > $ ls -l configs/qemu_riscv*
> > -rw-rw-r-- 1 tklauser tklauser 673 Jan 18 15:51 configs/qemu_riscv32_virt_defconfig
> > -rw-rw-r-- 1 tklauser tklauser 682 Jan 18 15:51 configs/qemu_riscv64_virt_defconfig
>
> Oh, it turned out I had previously checked out 2011.05 somehow...
> Yes, 2020.05 has qemu_riscv64_virt_defconfig and I am building it now.
> 2020.11 has the config, but init crashes (see above).

2020.05 is a bit better, but it still failed in several ways.
First, a number of user-space services, including sshd, still crashed.
Second, the kernel also crashed a bit later.
And 2020.11 seems to regress even more.
This is with the same kernel as in the previous email (I did not rebuild it).

2020.05 buildroot:

[ 90.381218][ T1] devtmpfs: mounted
[ 90.534531][ T1] Freeing unused kernel memory: 2328K
[ 90.537085][ T1] Run /sbin/init as init process
[ 91.754610][ T4022] EXT4-fs (vda): re-mounted. Opts: (null). Quota mode: none.
Starting syslogd: OK
Starting klogd: OK
Running sysctl: OK
Populating /dev using udev: [ 99.413418][ T4051] udevd[4051]: starting version 3.2.9
[ 100.480500][ T4052] udevd[4052]: starting eudev-3.2.9
[ 101.904876][ T4052] udevd[4052]: unhandled signal 11 code 0x1 at 0x0000000000000bb0 in udevd[10000+35000]
[ 101.911401][ T4052] CPU: 1 PID: 4052 Comm: udevd Not tainted 5.11.0-rc2-00012-g0983834a8393 #19
[ 101.913136][ T4052] epc: 0000000000000bb0 ra : 0000003ff5921872 sp : 0000003fffb0c3a0
[ 101.914593][ T4052] gp : 000000000004f908 tp : 0000003ff552b720 t0 : 0000003ff5943160
[ 101.915740][ T4052] t1 : 0000003ff5921bec t2 : 000000000004f450 s0 : 0000003fffb0c440
[ 101.916872][ T4052] s1 : 0000003ff5922000 a0 : 0000003ff5922000 a1 : 0000003fffb0c460
[ 101.949318][ T4052] a2 : 0000000000000001 a3 : 0000000000000002 a4 : 0000000000000002
[ 101.950529][ T4052] a5 : 000000000000000f a6 : 0000000000000007 a7 : 0000000000000016
[ 101.951653][ T4052] s2 : 0000000000000001 s3 : 0000003fffb0c460 s4 : 0000003ff5922030
[ 101.952771][ T4052] s5 : 0000003ff5922010 s6 : 0000000000000000 s7 : 0000000000000000
[ 101.953878][ T4052] s8 : 0000003ff5922004 s9 : 0000003ff5922010 s10: 0000003ff5922008
[ 101.955016][ T4052] s11: 0000003ff5922038 t3 : 0000000000000bb0 t4 : 0000000000000002
[ 101.956122][ T4052] t5 : 0000000000000002 t6 : 0000000000003d40
[ 101.957072][ T4052] status: 8000000000004020 badaddr: 0000000000000bb0 cause: 000000000000000c
[ 154.349233][ T4055] udevadm[4055]: unhandled signal 11 code 0x1 at 0x0000000000000bb0 in udevadm[10000+38000]
[ 154.351201][ T4055] CPU: 0 PID: 4055 Comm: udevadm Not tainted 5.11.0-rc2-00012-g0983834a8393 #19
[ 154.352227][ T4055] epc: 0000000000000bb0 ra : 0000003ff2cd3872 sp : 0000003fffe26a50
[ 154.353136][ T4055] gp : 0000000000052808 tp : 0000003ff28dd720 t0 : 0000003ff2cf5160
[ 154.354047][ T4055] t1 : 0000003ff2cd3bec t2 : 0000000000052570 s0 : 0000003fffe26af0
[ 154.354957][ T4055] s1 : 0000003ff2cd4000 a0 : 0000003ff2cd4000 a1 : 0000003fffe26b10
[ 154.355860][ T4055] a2 : 000000000003d790 a3 : 0000000000000002 a4 : 0000000000000002
[ 154.356739][ T4055] a5 : 000000000000000f a6 : ffffffffffffffff a7 : 0000000000000000
[ 154.366998][ T4055] s2 : 0000000000000001 s3 : 0000003fffe26b10 s4 : 0000003ff2cd4030
[ 154.372223][ T4055] s5 : 0000003ff2cd4010 s6 : 0000000000000068 s7 : 000000000003d000
[ 154.373192][ T4055] s8 : 0000003ff2cd4004 s9 : 0000003ff2cd4010 s10: 0000003ff2cd4008
[ 154.374114][ T4055] s11: 0000003ff2cd4038 t3 : 0000000000000bb0 t4 : 0000000000000002
[ 154.375023][ T4055] t5 : 0000000000000002 t6 : 0000000000003d40
[ 154.375793][ T4055] status: 0000000000004020 badaddr: 0000000000000bb0 cause: 000000000000000c
Segmentation fault
udevadm settle failed
done
Saving random seed: OK
Starting network: [ 160.769276][ T4073] 8021q: adding VLAN 0 to HW filter on device eth0
udhcpc: started, v1.31.1
[ 161.642968][ T4074] udhcpc[4074]: unhandled signal 11 code 0x1 at 0x0000000000000bb0 in busybox[10000+d6000]
[ 161.645275][ T4074] CPU: 0 PID: 4074 Comm: udhcpc Not tainted 5.11.0-rc2-00012-g0983834a8393 #19
[ 161.646515][ T4074] epc: 0000000000000bb0 ra : 0000003fd4d43872 sp : 0000003fffedf5c0
[ 161.661669][ T4074] gp : 00000000000e7c90 tp : 0000003fd4d42820 t0 : 0000003fd4d65160
[ 161.662875][ T4074] t1 : 0000003fd4d43bec t2 : 00000000000e7960 s0 : 0000003fffedf660
[ 161.663979][ T4074] s1 : 0000003fd4d44000 a0 : 0000003fd4d44000 a1 : 0000003fffedf690
[ 161.665110][ T4074] a2 : 0000000000000019 a3 : 0000000000000002 a4 : 0000000000000002
[ 161.666351][ T4074] a5 : 000000000000000f a6 : fefefefefefefeff a7 : 0000000000000040
[ 161.668642][ T4074] s2 : 0000000000000001 s3 : 0000003fffedf690 s4 : 0000003fd4d44030
[ 161.669785][ T4074] s5 : 0000003fd4d44010 s6 : 00000000149d82c3 s7 : 00000000000000fe
[ 161.670921][ T4074] s8 : 0000003fd4d44004 s9 : 0000003fd4d44010 s10: 0000003fd4d44008
[ 161.672150][ T4074] s11: 0000003fd4d44038 t3 : 0000000000000bb0 t4 : 0000000000000002
[ 161.673355][ T4074] t5 : 0000000000000002 t6 : 0000000000003d40
[ 161.674303][ T4074] status: 8000000000004020 badaddr: 0000000000000bb0 cause: 000000000000000c
FAIL
Starting dhcpcd...
[ 162.771471][ T4077] dhcpcd[4077]: unhandled signal 11 code 0x1 at 0x0000000000000bb0 in dhcpcd[10000+39000]
[ 162.773414][ T4077] CPU: 0 PID: 4077 Comm: dhcpcd Not tainted 5.11.0-rc2-00012-g0983834a8393 #19
[ 162.774462][ T4077] epc: 0000000000000bb0 ra : 0000003fe6d12872 sp : 0000003fff8527e0
[ 162.775366][ T4077] gp : 000000000004adb8 tp : 0000003fe6d11250 t0 : 0000003fe6d34160
[ 162.776274][ T4077] t1 : 0000003fe6d12bec t2 : 0000000000049a00 s0 : 0000003fff852880
[ 162.777167][ T4077] s1 : 0000003fe6d13000 a0 : 0000003fe6d13000 a1 : 0000003fff8528a0
[ 162.779363][ T4077] a2 : 0000000000000004 a3 : 0000000000000002 a4 : 0000000000000002
[ 162.780279][ T4077] a5 : 000000000000000f a6 : 7efefefefefefeff a7 : fffffffffffff000
[ 162.781194][ T4077] s2 : 0000000000000001 s3 : 0000003fff8528a0 s4 : 0000003fe6d13030
[ 162.782106][ T4077] s5 : 0000003fe6d13010 s6 : 0000000000000000 s7 : 0000000000000000
[ 162.783015][ T4077] s8 : 0000003fe6d13004 s9 : 0000003fe6d13010 s10: 0000003fe6d13008
[ 162.783940][ T4077] s11: 0000003fe6d13038 t3 : 0000000000000bb0 t4 : 0000000000000002
[ 162.784853][ T4077] t5 : 0000000000000002 t6 : 0000000000003d40
[ 162.785618][ T4077] status: 8000000000006020 badaddr: 0000000000000bb0 cause: 000000000000000c
Segmentation fault
[ 164.074891][ T4079] ssh-keygen[4079]: unhandled signal 11 code 0x1 at 0x0000000000000bb0 in ssh-keygen[2ac3c68000+63000]
[ 164.076916][ T4079] CPU: 1 PID: 4079 Comm: ssh-keygen Not tainted 5.11.0-rc2-00012-g0983834a8393 #19
[ 164.096635][ T4079] epc: 0000000000000bb0 ra : 0000003ff6899872 sp : 0000003fffed1330
[ 164.099233][ T4079] gp : 0000002ac3ccd448 tp : 0000003ff6435cd0 t0 : 0000003ff6897000
[ 164.100457][ T4079] t1 : 0000003ff6899bec t2 : 0000003ff6891940 s0 : 0000003fffed13d0
[ 164.101578][ T4079] s1 : 0000003ff689a000 a0 : 0000003ff689a000
a1 : 0000003fffed13f8 [ 164.102914][ T4079] a2 : 0000000000000000 a3 : 0000000000000001 a4 : 0000000000000001 [ 164.104058][ T4079] a5 : 000000000000000f a6 : 0000000000000000 a7 : 00000000000000ac [ 164.105150][ T4079] s2 : 0000000000000000 s3 : 0000003fffed13f8 s4 : 0000003ff689a020 [ 164.106241][ T4079] s5 : 0000003ff689a000 s6 : 0000003fd1861830 s7 : ffffffffffffffff [ 164.113694][ T4079] s8 : 0000003ff689a004 s9 : 0000003ff689a010 s10: 0000003ff689a008 [ 164.114869][ T4079] s11: 0000003ff689a028 t3 : 0000000000000bb0 t4 : 0000000000000002 [ 164.115972][ T4079] t5 : 0000000000000002 t6 : 0000000000003d40 [ 164.128360][ T4079] status: 8000000000004020 badaddr: 0000000000000bb0 cause: 000000000000000c Segmentation fault Starting sshd: [ 164.872315][ T4080] sshd[4080]: unhandled signal 11 code 0x1 at 0x0000000000000bb0 in sshd[2ac7ea7000+a4000] [ 164.874297][ T4080] CPU: 1 PID: 4080 Comm: sshd Not tainted 5.11.0-rc2-00012-g0983834a8393 #19 [ 164.875331][ T4080] epc: 0000000000000bb0 ra : 0000003ff2222872 sp : 0000003fffbea300 [ 164.876230][ T4080] gp : 0000002ac7f4f9d0 tp : 0000003ff1dbecd0 t0 : 0000003ff2220000 [ 164.877146][ T4080] t1 : 0000003ff2222bec t2 : 0000003ff221a940 s0 : 0000003fffbea3a0 [ 164.892174][ T4080] s1 : 0000003ff2223000 a0 : 0000003ff2223000 a1 : 0000003fffbea3c8 [ 164.893137][ T4080] a2 : 0000000000000000 a3 : 0000000000000001 a4 : 0000000000000001 [ 164.894065][ T4080] a5 : 000000000000000f a6 : 0000000000000000 a7 : 00000000000000ac [ 164.895013][ T4080] s2 : 0000000000000000 s3 : 0000003fffbea3c8 s4 : 0000003ff2223020 [ 164.895947][ T4080] s5 : 0000003ff2223000 s6 : 0000003fd1861830 s7 : ffffffffffffffff [ 164.896881][ T4080] s8 : 0000003ff2223004 s9 : 0000003ff2223010 s10: 0000003ff2223008 [ 164.905684][ T4080] s11: 0000003ff2223028 t3 : 0000000000000bb0 t4 : 0000000000000002 [ 164.906679][ T4080] t5 : 0000000000000002 t6 : 0000000000003d40 [ 164.908565][ T4080] status: 8000000000004020 badaddr: 0000000000000bb0 cause: 000000000000000c 
Segmentation fault OK syzkaller syzkaller login: [ 167.973016][ T4082] ------------[ cut here ]------------ [ 167.975887][ T4082] virt_to_phys used for non-linear address: 0000000059ffc026 (0xffffffd0158d105e) [ 167.979939][ T4082] WARNING: CPU: 0 PID: 4082 at arch/riscv/mm/physaddr.c:16 __virt_to_phys+0x74/0x78 [ 167.988658][ T4082] Modules linked in: [ 167.989781][ T4082] CPU: 0 PID: 4082 Comm: getty Not tainted 5.11.0-rc2-00012-g0983834a8393 #19 [ 167.991063][ T4082] epc: ffffffe000011164 ra : ffffffe000011164 sp : ffffffe01354fb10 [ 167.992243][ T4082] gp : ffffffe006234420 tp : ffffffe009c8ad80 t0 : ffffffe006cafb67 [ 167.993384][ T4082] t1 : 0000000000000001 t2 : 0000000000000000 s0 : ffffffe01354fb40 [ 167.994531][ T4082] s1 : fffffff0158d105e a0 : 000000000000004f a1 : 00000000000f0000 [ 167.995690][ T4082] a2 : 0000000000000002 a3 : ffffffe0000d1a30 a4 : 763e2d90a60ec500 [ 167.996803][ T4082] a5 : 763e2d90a60ec500 a6 : 0000000000f00000 a7 : ffffffe00009481c [ 167.999690][ T4082] s2 : ffffffd0158d105e s3 : 0000001fffffffff s4 : 0000000000000001 [ 168.000898][ T4082] s5 : ffffffd0158d105f s6 : ffffffd0158d3260 s7 : 0000003fff81eac8 [ 168.002093][ T4082] s8 : ffffffd0158d105e s9 : 0000000000000001 s10: 0000000000000000 [ 168.003226][ T4082] s11: 0000000000000000 t3 : 763e2d90a60ec500 t4 : ffffffc4026a9efd [ 168.004361][ T4082] t5 : ffffffc4026a9eff t6 : ffffffe01354f7f8 [ 168.005328][ T4082] status: 0000000000000120 badaddr: 0000000000000000 cause: 0000000000000003 [ 168.006756][ T4082] Kernel panic - not syncing: panic_on_warn set ... 
[ 168.008056][ T4082] CPU: 0 PID: 4082 Comm: getty Not tainted 5.11.0-rc2-00012-g0983834a8393 #19 [ 168.009301][ T4082] Call Trace: [ 168.009969][ T4082] [<ffffffe0000095c0>] walk_stackframe+0x0/0x1d0 [ 168.011166][ T4082] [<ffffffe00458b2d8>] show_stack+0x3a/0x46 [ 168.012215][ T4082] [<ffffffe0045a5b72>] dump_stack+0x11c/0x180 [ 168.013262][ T4082] [<ffffffe00458b6a0>] panic+0x20a/0x5cc [ 168.014264][ T4082] [<ffffffe000024210>] __warn+0x110/0x20a [ 168.015285][ T4082] [<ffffffe001759424>] report_bug+0x156/0x200 [ 168.016324][ T4082] [<ffffffe0000093f6>] do_trap_break+0xa6/0x152 [ 168.017431][ T4082] [<ffffffe00000559c>] ret_from_exception+0x0/0x14 [ 168.018560][ T4082] [<ffffffe0018c97bc>] n_tty_read+0x908/0x115a [ 168.020124][ T4082] SMP: stopping secondary CPUs [ 168.022087][ T4082] Rebooting in 86400 seconds.. ^ permalink raw reply [flat|nested] 55+ messages in thread
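For reading the register dumps above, the `cause` field can be decoded against the exception codes of the RISC-V privileged specification. The following is a minimal sketch, assuming the printed value is the raw scause register; the helper name `decode_cause` is hypothetical and not a kernel function.

```python
# Synchronous exception codes from the RISC-V privileged specification.
# Reserved codes are omitted.
EXCEPTION_CODES = {
    0: "instruction address misaligned",
    1: "instruction access fault",
    2: "illegal instruction",
    3: "breakpoint",
    4: "load address misaligned",
    5: "load access fault",
    6: "store/AMO address misaligned",
    7: "store/AMO access fault",
    8: "environment call from U-mode",
    9: "environment call from S-mode",
    12: "instruction page fault",
    13: "load page fault",
    15: "store/AMO page fault",
}


def decode_cause(cause_hex: str, xlen: int = 64) -> str:
    """Decode a 'cause:' value as printed in the dumps (hypothetical helper)."""
    value = int(cause_hex, 16)
    interrupt_bit = 1 << (xlen - 1)  # top bit of scause marks an interrupt
    if value & interrupt_bit:
        return f"interrupt {value & ~interrupt_bit}"
    return EXCEPTION_CODES.get(value, f"reserved/unknown ({value})")


# Values copied from the dumps above.
for cause in ("000000000000000c", "0000000000000003"):
    print(cause, "->", decode_cause(cause))
```

Under that reading, the user-space crashes (cause 0xc, with epc equal to badaddr 0xbb0) are instruction page faults, and the getty WARNING (cause 0x3) is a breakpoint trap, which matches the do_trap_break/report_bug frames in the call trace.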
* Re: riscv+KASAN does not boot @ 2021-01-18 15:43 ` Dmitry Vyukov 0 siblings, 0 replies; 55+ messages in thread From: Dmitry Vyukov @ 2021-01-18 15:43 UTC (permalink / raw) To: Tobias Klauser Cc: Albert Ou, Bjorn Topel, Palmer Dabbelt, LKML, nylon7, syzkaller, Andreas Schwab, Paul Walmsley, linux-riscv On Mon, Jan 18, 2021 at 4:05 PM Dmitry Vyukov <dvyukov@google.com> wrote: > > On Mon, Jan 18, 2021 at 3:53 PM Tobias Klauser <tklauser@distanz.ch> wrote: > > > > On Thu, Jan 14, 2021 at 5:57 AM Palmer Dabbelt <palmerdabbelt@google.com> wrote: > > > > > > > > > > On Fri, 25 Dec 2020 09:13:23 PST (-0800), dvyukov@google.com wrote: > > > > > > On Fri, Dec 25, 2020 at 5:58 PM Andreas Schwab <schwab@linux-m68k.org> wrote: > > > > > >> > > > > > >> On Dez 25 2020, Dmitry Vyukov wrote: > > > > > >> > > > > > >> > qemu-system-riscv64 \ > > > > > >> > -machine virt -bios default -smp 1 -m 2G \ > > > > > >> > -device virtio-blk-device,drive=hd0 \ > > > > > >> > -drive file=buildroot-riscv64.ext4,if=none,format=raw,id=hd0 \ > > > > > >> > -kernel arch/riscv/boot/Image \ > > > > > >> > -nographic \ > > > > > >> > -device virtio-rng-device,rng=rng0 -object > > > > > >> > rng-random,filename=/dev/urandom,id=rng0 \ > > > > > >> > -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device > > > > > >> > virtio-net-device,netdev=net0 \ > > > > > >> > -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic > > > > > >> > panic_on_warn=1 panic=86400" > > > > > >> > > > > > >> Do you get more output with earlycon=sbi? 
> > > > > > > > > > > > Hi Andreas, > > > > > > > > > > > > For defconfig+kvm_guest.config+ scripts/config -e KASAN -e > > > > > > KASAN_INLINE it actually gave me more output: > > > > > > > > > > > > > > > > > > OpenSBI v0.7 > > > > > > ____ _____ ____ _____ > > > > > > / __ \ / ____| _ \_ _| > > > > > > | | | |_ __ ___ _ __ | (___ | |_) || | > > > > > > | | | | '_ \ / _ \ '_ \ \___ \| _ < | | > > > > > > | |__| | |_) | __/ | | |____) | |_) || |_ > > > > > > \____/| .__/ \___|_| |_|_____/|____/_____| > > > > > > | | > > > > > > |_| > > > > > > > > > > > > Platform Name : QEMU Virt Machine > > > > > > Platform HART Features : RV64ACDFIMSU > > > > > > Current Hart : 0 > > > > > > Firmware Base : 0x80000000 > > > > > > Firmware Size : 132 KB > > > > > > Runtime SBI Version : 0.2 > > > > > > > > > > > > MIDELEG : 0x0000000000000222 > > > > > > MEDELEG : 0x000000000000b109 > > > > > > PMP0 : 0x0000000080000000-0x000000008003ffff (A) > > > > > > PMP1 : 0x0000000000000000-0xffffffffffffffff (A,R,W,X) > > > > > > [ 0.000000] Linux version 5.10.0-01370-g71c5f03154ac > > > > > > (dvyukov@dvyukov-desk.muc.corp.google.com) (riscv64-linux-gnu-gcc > > > > > > (Debian 10.2.0-9) 10.2.0, GNU ld (GNU Binutils for Debian) 2.35.1) #17 > > > > > > SMP Fri Dec 25 18:10:12 CET 2020 > > > > > > [ 0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000 > > > > > > [ 0.000000] earlycon: sbi0 at I/O port 0x0 (options '') > > > > > > [ 0.000000] printk: bootconsole [sbi0] enabled > > > > > > [ 0.000000] efi: UEFI not found. 
> > > > > > [ 0.000000] Zone ranges: > > > > > > [ 0.000000] DMA32 [mem 0x0000000080200000-0x00000000ffffffff] > > > > > > [ 0.000000] Normal empty > > > > > > [ 0.000000] Movable zone start for each node > > > > > > [ 0.000000] Early memory node ranges > > > > > > [ 0.000000] node 0: [mem 0x0000000080200000-0x00000000ffffffff] > > > > > > [ 0.000000] Initmem setup node 0 [mem 0x0000000080200000-0x00000000ffffffff] > > > > > > [ 0.000000] SBI specification v0.2 detected > > > > > > [ 0.000000] SBI implementation ID=0x1 Version=0x7 > > > > > > [ 0.000000] SBI v0.2 TIME extension detected > > > > > > [ 0.000000] SBI v0.2 IPI extension detected > > > > > > [ 0.000000] SBI v0.2 RFENCE extension detected > > > > > > [ 0.000000] software IO TLB: mapped [mem > > > > > > 0x00000000fa3f9000-0x00000000fe3f9000] (64MB) > > > > > > [ 0.000000] Unable to handle kernel paging request at virtual > > > > > > address dfffffc810040000 > > > > > > [ 0.000000] Oops [#1] > > > > > > [ 0.000000] Modules linked in: > > > > > > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted > > > > > > 5.10.0-01370-g71c5f03154ac #17 > > > > > > [ 0.000000] epc: ffffffe00042e3e4 ra : ffffffe000c0462c sp : ffffffe001603ea0 > > > > > > [ 0.000000] gp : ffffffe0016e3c60 tp : ffffffe00160cd40 t0 : > > > > > > dfffffc810040000 > > > > > > [ 0.000000] t1 : ffffffe000e0a838 t2 : 0000000000000000 s0 : > > > > > > ffffffe001603f50 > > > > > > [ 0.000000] s1 : ffffffe0016e50a8 a0 : dfffffc810040000 a1 : > > > > > > 0000000000000000 > > > > > > [ 0.000000] a2 : 000000000ffc0000 a3 : dfffffc820000000 a4 : > > > > > > 0000000000000000 > > > > > > [ 0.000000] a5 : 000000003e8c6001 a6 : ffffffe000e0a820 a7 : > > > > > > 0000000000000900 > > > > > > [ 0.000000] s2 : dfffffc820000000 s3 : dfffffc800000000 s4 : > > > > > > 0000000000000001 > > > > > > [ 0.000000] s5 : ffffffe0016e5108 s6 : fffffffffffff000 s7 : > > > > > > dfffffc810040000 > > > > > > [ 0.000000] s8 : 0000000000000080 s9 : ffffffffffffffff s10: > > > > 
> > ffffffe07a119000 > > > > > > [ 0.000000] s11: 000000000000ffc0 t3 : ffffffe0016eb908 t4 : > > > > > > 0000000000000001 > > > > > > [ 0.000000] t5 : ffffffc4001c150a t6 : ffffffe001603be8 > > > > > > [ 0.000000] status: 0000000000000100 badaddr: dfffffc810040000 > > > > > > cause: 000000000000000f > > > > > > [ 0.000000] random: get_random_bytes called from > > > > > > oops_exit+0x30/0x58 with crng_init=0 > > > > > > [ 0.000000] ---[ end trace 0000000000000000 ]--- > > > > > > [ 0.000000] Kernel panic - not syncing: Fatal exception > > > > > > [ 0.000000] ---[ end Kernel panic - not syncing: Fatal exception ]--- > > > > > > > > > > > > > > > > > > But I first tried with a the kernel image I had in the dir, I think it > > > > > > was this config (no KASAN): > > > > > > https://gist.githubusercontent.com/dvyukov/b2b62beccf80493781ab03b41430e616/raw/62e673cff08a8a41656d2871b8a37f74b00f509f/gistfile1.txt > > > > > > > > > > > > and earlycon=sbi did not change anything (no output after OpenSBI). > > > > > > So potentially there are 2 different problems. > > > > > > > > > > Thanks for reporting this. Looks like I'd forgotten to add a kasan config to > > > > > my tests. There's one in there now, and it's passing as of the fix that Nylon > > > > > posted. > > > > > > > > I can boot the KASAN kernel now on riscv/fixes. 
> > > > > > > > Next problem: I've got only to: > > > > > > > > [ 90.498967][ T1] Run /sbin/init as init process > > > > [ 91.164353][ T4022] init[4022]: unhandled signal 11 code 0x1 at > > > > 0x0000000000000bb0 in busybox[10000+d7000] > > > > [ 91.179640][ T4022] CPU: 1 PID: 4022 Comm: init Not tainted > > > > 5.11.0-rc2-00012-g0983834a8393 #19 > > > > [ 91.180853][ T4022] epc: 0000000000000bb0 ra : 0000003fccab09d0 sp > > > > : 0000003fffa8c7b0 > > > > [ 91.181861][ T4022] gp : 00000000000e8d70 tp : 0000003fccaaf820 t0 > > > > : 000000000000001e > > > > [ 91.182810][ T4022] t1 : 0000003fccab0bfc t2 : 000000000000000a s0 > > > > : 0000003fffa8c850 > > > > [ 91.183749][ T4022] s1 : 0000003fccab1070 a0 : 0000003fccab1070 a1 > > > > : 0000003fffa8c8c8 > > > > [ 91.184689][ T4022] a2 : 0000000000000001 a3 : 0000000000000020 a4 > > > > : 0000000000000000 > > > > [ 91.185620][ T4022] a5 : 0000000000000000 a6 : 0000003fcc9c4260 a7 > > > > : fffffffffffffffe > > > > [ 91.186566][ T4022] s2 : 0000000000000000 s3 : 0000003fffa8c8c8 s4 > > > > : 0000003fccab1000 > > > > [ 91.187500][ T4022] s5 : 0000003fccab1078 s6 : 0000003fffa8c8d0 s7 > > > > : 0000000000000010 > > > > [ 91.189672][ T4022] s8 : 0000000000000016 s9 : 0000000000000000 > > > > s10: 0000003fffa8c8c8 > > > > [ 91.190637][ T4022] s11: 0000000000000000 t3 : 0000000000000bb0 t4 > > > > : 0000000000000000 > > > > [ 91.191568][ T4022] t5 : 0000003fffa8c360 t6 : 0000000000000000 > > > > [ 91.192389][ T4022] status: 8000000000004020 badaddr: > > > > 0000000000000bb0 cause: 000000000000000c > > > > [ 91.201573][ T1] Kernel panic - not syncing: Attempted to kill > > > > init! 
exitcode=0x0000000b > > > > [ 91.202906][ T1] CPU: 0 PID: 1 Comm: init Not tainted > > > > 5.11.0-rc2-00012-g0983834a8393 #19 > > > > [ 91.204139][ T1] Call Trace: > > > > [ 91.204849][ T1] [<ffffffe0000095c0>] walk_stackframe+0x0/0x1d0 > > > > [ 91.206124][ T1] [<ffffffe00458b2d8>] show_stack+0x3a/0x46 > > > > [ 91.207240][ T1] [<ffffffe0045a5b72>] dump_stack+0x11c/0x180 > > > > [ 91.208732][ T1] [<ffffffe00458b6a0>] panic+0x20a/0x5cc > > > > [ 91.209890][ T1] [<ffffffe00002eea4>] do_exit+0x1846/0x1874 > > > > [ 91.211052][ T1] [<ffffffe00002efdc>] do_group_exit+0xa0/0x192 > > > > [ 91.212224][ T1] [<ffffffe000047d30>] get_signal+0x2d6/0x13dc > > > > [ 91.213390][ T1] [<ffffffe000007eb0>] do_notify_resume+0xa8/0x912 > > > > [ 91.214567][ T1] [<ffffffe00000559c>] ret_from_exception+0x0/0x14 > > > > > > > > The image is buildroot on 2020.11.x built with this script: > > > > https://gist.githubusercontent.com/dvyukov/1a9a01ca2189e35175a021820c95b04d/raw/5c01d755e83f4eab0d56aa7dc84af3b2d5e80423/gistfile1.txt > > > > > > > > Readelf for init shows the following (is it that [10000+d7000] address > > > > is not .text at all?): > > > > > > > > $ riscv64-linux-gnu-readelf --sections image/bin/busybox > > > > There are 27 section headers, starting at offset 0xd7f20: > > > > > > > > Section Headers: > > > > [Nr] Name Type Address Offset > > > > Size EntSize Flags Link Info Align > > > > [ 0] NULL 0000000000000000 00000000 > > > > 0000000000000000 0000000000000000 0 0 0 > > > > [ 1] .interp PROGBITS 0000000000010238 00000238 > > > > 0000000000000021 0000000000000000 A 0 0 1 > > > > [ 2] .note.ABI-tag NOTE 000000000001025c 0000025c > > > > 0000000000000020 0000000000000000 A 0 0 4 > > > > [ 3] .hash HASH 0000000000010280 00000280 > > > > 00000000000009cc 0000000000000004 A 5 0 8 > > > > [ 4] .gnu.hash GNU_HASH 0000000000010c50 00000c50 > > > > 0000000000000ac8 0000000000000000 A 5 0 8 > > > > [ 5] .dynsym DYNSYM 0000000000011718 00001718 > > > > 00000000000021f0 
0000000000000018 A 6 1 8 > > > > [ 6] .dynstr STRTAB 0000000000013908 00003908 > > > > 0000000000000c66 0000000000000000 A 0 0 1 > > > > [ 7] .gnu.version VERSYM 000000000001456e 0000456e > > > > 00000000000002d4 0000000000000002 A 5 0 2 > > > > [ 8] .gnu.version_r VERNEED 0000000000014848 00004848 > > > > 0000000000000050 0000000000000000 A 6 2 8 > > > > [ 9] .rela.dyn RELA 0000000000014898 00004898 > > > > 00000000000000c0 0000000000000018 A 5 0 8 > > > > [10] .rela.plt RELA 0000000000014958 00004958 > > > > 00000000000020a0 0000000000000018 AI 5 22 8 > > > > [11] .plt PROGBITS 0000000000016a00 00006a00 > > > > 00000000000015e0 0000000000000010 AX 0 0 16 > > > > [12] .text PROGBITS 0000000000017fe0 00007fe0 > > > > 00000000000a3668 0000000000000000 AX 0 0 4 > > > > [13] .rodata PROGBITS 00000000000bb648 000ab648 > > > > 000000000002b076 0000000000000000 A 0 0 8 > > > > [14] .sdata2 PROGBITS 00000000000e66c0 000d66c0 > > > > 0000000000000163 0000000000000000 A 0 0 8 > > > > [15] .eh_frame_hdr PROGBITS 00000000000e6824 000d6824 > > > > 0000000000000014 0000000000000000 A 0 0 4 > > > > [16] .eh_frame PROGBITS 00000000000e6838 000d6838 > > > > 000000000000002c 0000000000000000 A 0 0 8 > > > > [17] .preinit_array PREINIT_ARRAY 00000000000e7df8 000d6df8 > > > > 0000000000000008 0000000000000008 WA 0 0 1 > > > > [18] .init_array INIT_ARRAY 00000000000e7e00 000d6e00 > > > > 0000000000000008 0000000000000008 WA 0 0 8 > > > > [19] .fini_array FINI_ARRAY 00000000000e7e08 000d6e08 > > > > 0000000000000008 0000000000000008 WA 0 0 8 > > > > [20] .dynamic DYNAMIC 00000000000e7e10 000d6e10 > > > > 00000000000001f0 0000000000000010 WA 6 0 8 > > > > [21] .data PROGBITS 00000000000e8000 000d7000 > > > > 0000000000000240 0000000000000000 WA 0 0 8 > > > > [22] .got PROGBITS 00000000000e8240 000d7240 > > > > 0000000000000af8 0000000000000008 WA 0 0 8 > > > > [23] .sdata PROGBITS 00000000000e8d38 000d7d38 > > > > 0000000000000101 0000000000000000 WA 0 0 8 > > > > [24] .sbss NOBITS 
00000000000e8e40 000d7e39
> > > > 000000000000017f 0000000000000000 WA 0 0 8
> > > > [25] .bss NOBITS 00000000000e8fc0 000d7e39
> > > > 00000000000005b0 0000000000000000 WA 0 0 8
> > > > [26] .shstrtab STRTAB 0000000000000000 000d7e39
> > > > 00000000000000e6 0000000000000000 0 0 1
> > > >
> > > >
> > > > Before I spent more time on this, am I doing anything obviously wrong?
> > > > Is it a known issue? Are there any fresh working recipes?
> > > >
> > > Humm.. I tried to use 2020.05 which Tobias used here:
> > > https://github.com/google/syzkaller/blob/master/docs/linux/setup_linux-host_qemu-vm_riscv64-kernel.md#image
> > > But there is no make qemu_riscv64_virt_defconfig target... though I
> > > remember I tested these instructions at the time...
> >
> > Weird. `make qemu_riscv64_virt_defconfig` works here on buildroot
> > 2020.05, 2020.11.1 and on latest master.
> >
> > Do you see these in your configs/ directory?
> >
> > $ ls -l configs/qemu_riscv*
> > -rw-rw-r-- 1 tklauser tklauser 673 Jan 18 15:51 configs/qemu_riscv32_virt_defconfig
> > -rw-rw-r-- 1 tklauser tklauser 682 Jan 18 15:51 configs/qemu_riscv64_virt_defconfig
>
> Oh, turned out I previously checked out 2011.05 somehow...
> Yes, 2020.05 has qemu_riscv64_virt_defconfig and I am building it now.
> 2020.11 has the config, but init crashes (see above).

2020.05 is a bit better, but still failed in several ways.
First, a number of user-space services including sshd still crashed.
Second, the kernel also crashed a bit later.

And 2020.11 seems to regress even more.

This is with the same kernel as in the previous email (I did not rebuild it).

2020.05 buildroot:

[ 90.381218][ T1] devtmpfs: mounted
[ 90.534531][ T1] Freeing unused kernel memory: 2328K
[ 90.537085][ T1] Run /sbin/init as init process
[ 91.754610][ T4022] EXT4-fs (vda): re-mounted. Opts: (null). Quota mode: none.
Starting syslogd: OK Starting klogd: OK Running sysctl: OK Populating /dev using udev: [ 99.413418][ T4051] udevd[4051]: starting version 3.2.9 [ 100.480500][ T4052] udevd[4052]: starting eudev-3.2.9 [ 101.904876][ T4052] udevd[4052]: unhandled signal 11 code 0x1 at 0x0000000000000bb0 in udevd[10000+35000] [ 101.911401][ T4052] CPU: 1 PID: 4052 Comm: udevd Not tainted 5.11.0-rc2-00012-g0983834a8393 #19 [ 101.913136][ T4052] epc: 0000000000000bb0 ra : 0000003ff5921872 sp : 0000003fffb0c3a0 [ 101.914593][ T4052] gp : 000000000004f908 tp : 0000003ff552b720 t0 : 0000003ff5943160 [ 101.915740][ T4052] t1 : 0000003ff5921bec t2 : 000000000004f450 s0 : 0000003fffb0c440 [ 101.916872][ T4052] s1 : 0000003ff5922000 a0 : 0000003ff5922000 a1 : 0000003fffb0c460 [ 101.949318][ T4052] a2 : 0000000000000001 a3 : 0000000000000002 a4 : 0000000000000002 [ 101.950529][ T4052] a5 : 000000000000000f a6 : 0000000000000007 a7 : 0000000000000016 [ 101.951653][ T4052] s2 : 0000000000000001 s3 : 0000003fffb0c460 s4 : 0000003ff5922030 [ 101.952771][ T4052] s5 : 0000003ff5922010 s6 : 0000000000000000 s7 : 0000000000000000 [ 101.953878][ T4052] s8 : 0000003ff5922004 s9 : 0000003ff5922010 s10: 0000003ff5922008 [ 101.955016][ T4052] s11: 0000003ff5922038 t3 : 0000000000000bb0 t4 : 0000000000000002 [ 101.956122][ T4052] t5 : 0000000000000002 t6 : 0000000000003d40 [ 101.957072][ T4052] status: 8000000000004020 badaddr: 0000000000000bb0 cause: 000000000000000c [ 154.349233][ T4055] udevadm[4055]: unhandled signal 11 code 0x1 at 0x0000000000000bb0 in udevadm[10000+38000] [ 154.351201][ T4055] CPU: 0 PID: 4055 Comm: udevadm Not tainted 5.11.0-rc2-00012-g0983834a8393 #19 [ 154.352227][ T4055] epc: 0000000000000bb0 ra : 0000003ff2cd3872 sp : 0000003fffe26a50 [ 154.353136][ T4055] gp : 0000000000052808 tp : 0000003ff28dd720 t0 : 0000003ff2cf5160 [ 154.354047][ T4055] t1 : 0000003ff2cd3bec t2 : 0000000000052570 s0 : 0000003fffe26af0 [ 154.354957][ T4055] s1 : 0000003ff2cd4000 a0 : 0000003ff2cd4000 a1 : 
0000003fffe26b10 [ 154.355860][ T4055] a2 : 000000000003d790 a3 : 0000000000000002 a4 : 0000000000000002 [ 154.356739][ T4055] a5 : 000000000000000f a6 : ffffffffffffffff a7 : 0000000000000000 [ 154.366998][ T4055] s2 : 0000000000000001 s3 : 0000003fffe26b10 s4 : 0000003ff2cd4030 [ 154.372223][ T4055] s5 : 0000003ff2cd4010 s6 : 0000000000000068 s7 : 000000000003d000 [ 154.373192][ T4055] s8 : 0000003ff2cd4004 s9 : 0000003ff2cd4010 s10: 0000003ff2cd4008 [ 154.374114][ T4055] s11: 0000003ff2cd4038 t3 : 0000000000000bb0 t4 : 0000000000000002 [ 154.375023][ T4055] t5 : 0000000000000002 t6 : 0000000000003d40 [ 154.375793][ T4055] status: 0000000000004020 badaddr: 0000000000000bb0 cause: 000000000000000c Segmentation fault udevadm settle failed done Saving random seed: OK Starting network: [ 160.769276][ T4073] 8021q: adding VLAN 0 to HW filter on device eth0 udhcpc: started, v1.31.1 [ 161.642968][ T4074] udhcpc[4074]: unhandled signal 11 code 0x1 at 0x0000000000000bb0 in busybox[10000+d6000] [ 161.645275][ T4074] CPU: 0 PID: 4074 Comm: udhcpc Not tainted 5.11.0-rc2-00012-g0983834a8393 #19 [ 161.646515][ T4074] epc: 0000000000000bb0 ra : 0000003fd4d43872 sp : 0000003fffedf5c0 [ 161.661669][ T4074] gp : 00000000000e7c90 tp : 0000003fd4d42820 t0 : 0000003fd4d65160 [ 161.662875][ T4074] t1 : 0000003fd4d43bec t2 : 00000000000e7960 s0 : 0000003fffedf660 [ 161.663979][ T4074] s1 : 0000003fd4d44000 a0 : 0000003fd4d44000 a1 : 0000003fffedf690 [ 161.665110][ T4074] a2 : 0000000000000019 a3 : 0000000000000002 a4 : 0000000000000002 [ 161.666351][ T4074] a5 : 000000000000000f a6 : fefefefefefefeff a7 : 0000000000000040 [ 161.668642][ T4074] s2 : 0000000000000001 s3 : 0000003fffedf690 s4 : 0000003fd4d44030 [ 161.669785][ T4074] s5 : 0000003fd4d44010 s6 : 00000000149d82c3 s7 : 00000000000000fe [ 161.670921][ T4074] s8 : 0000003fd4d44004 s9 : 0000003fd4d44010 s10: 0000003fd4d44008 [ 161.672150][ T4074] s11: 0000003fd4d44038 t3 : 0000000000000bb0 t4 : 0000000000000002 [ 161.673355][ 
T4074] t5 : 0000000000000002 t6 : 0000000000003d40 [ 161.674303][ T4074] status: 8000000000004020 badaddr: 0000000000000bb0 cause: 000000000000000c FAIL Starting dhcpcd... [ 162.771471][ T4077] dhcpcd[4077]: unhandled signal 11 code 0x1 at 0x0000000000000bb0 in dhcpcd[10000+39000] [ 162.773414][ T4077] CPU: 0 PID: 4077 Comm: dhcpcd Not tainted 5.11.0-rc2-00012-g0983834a8393 #19 [ 162.774462][ T4077] epc: 0000000000000bb0 ra : 0000003fe6d12872 sp : 0000003fff8527e0 [ 162.775366][ T4077] gp : 000000000004adb8 tp : 0000003fe6d11250 t0 : 0000003fe6d34160 [ 162.776274][ T4077] t1 : 0000003fe6d12bec t2 : 0000000000049a00 s0 : 0000003fff852880 [ 162.777167][ T4077] s1 : 0000003fe6d13000 a0 : 0000003fe6d13000 a1 : 0000003fff8528a0 [ 162.779363][ T4077] a2 : 0000000000000004 a3 : 0000000000000002 a4 : 0000000000000002 [ 162.780279][ T4077] a5 : 000000000000000f a6 : 7efefefefefefeff a7 : fffffffffffff000 [ 162.781194][ T4077] s2 : 0000000000000001 s3 : 0000003fff8528a0 s4 : 0000003fe6d13030 [ 162.782106][ T4077] s5 : 0000003fe6d13010 s6 : 0000000000000000 s7 : 0000000000000000 [ 162.783015][ T4077] s8 : 0000003fe6d13004 s9 : 0000003fe6d13010 s10: 0000003fe6d13008 [ 162.783940][ T4077] s11: 0000003fe6d13038 t3 : 0000000000000bb0 t4 : 0000000000000002 [ 162.784853][ T4077] t5 : 0000000000000002 t6 : 0000000000003d40 [ 162.785618][ T4077] status: 8000000000006020 badaddr: 0000000000000bb0 cause: 000000000000000c Segmentation fault [ 164.074891][ T4079] ssh-keygen[4079]: unhandled signal 11 code 0x1 at 0x0000000000000bb0 in ssh-keygen[2ac3c68000+63000] [ 164.076916][ T4079] CPU: 1 PID: 4079 Comm: ssh-keygen Not tainted 5.11.0-rc2-00012-g0983834a8393 #19 [ 164.096635][ T4079] epc: 0000000000000bb0 ra : 0000003ff6899872 sp : 0000003fffed1330 [ 164.099233][ T4079] gp : 0000002ac3ccd448 tp : 0000003ff6435cd0 t0 : 0000003ff6897000 [ 164.100457][ T4079] t1 : 0000003ff6899bec t2 : 0000003ff6891940 s0 : 0000003fffed13d0 [ 164.101578][ T4079] s1 : 0000003ff689a000 a0 : 0000003ff689a000 
a1 : 0000003fffed13f8 [ 164.102914][ T4079] a2 : 0000000000000000 a3 : 0000000000000001 a4 : 0000000000000001 [ 164.104058][ T4079] a5 : 000000000000000f a6 : 0000000000000000 a7 : 00000000000000ac [ 164.105150][ T4079] s2 : 0000000000000000 s3 : 0000003fffed13f8 s4 : 0000003ff689a020 [ 164.106241][ T4079] s5 : 0000003ff689a000 s6 : 0000003fd1861830 s7 : ffffffffffffffff [ 164.113694][ T4079] s8 : 0000003ff689a004 s9 : 0000003ff689a010 s10: 0000003ff689a008 [ 164.114869][ T4079] s11: 0000003ff689a028 t3 : 0000000000000bb0 t4 : 0000000000000002 [ 164.115972][ T4079] t5 : 0000000000000002 t6 : 0000000000003d40 [ 164.128360][ T4079] status: 8000000000004020 badaddr: 0000000000000bb0 cause: 000000000000000c Segmentation fault Starting sshd: [ 164.872315][ T4080] sshd[4080]: unhandled signal 11 code 0x1 at 0x0000000000000bb0 in sshd[2ac7ea7000+a4000] [ 164.874297][ T4080] CPU: 1 PID: 4080 Comm: sshd Not tainted 5.11.0-rc2-00012-g0983834a8393 #19 [ 164.875331][ T4080] epc: 0000000000000bb0 ra : 0000003ff2222872 sp : 0000003fffbea300 [ 164.876230][ T4080] gp : 0000002ac7f4f9d0 tp : 0000003ff1dbecd0 t0 : 0000003ff2220000 [ 164.877146][ T4080] t1 : 0000003ff2222bec t2 : 0000003ff221a940 s0 : 0000003fffbea3a0 [ 164.892174][ T4080] s1 : 0000003ff2223000 a0 : 0000003ff2223000 a1 : 0000003fffbea3c8 [ 164.893137][ T4080] a2 : 0000000000000000 a3 : 0000000000000001 a4 : 0000000000000001 [ 164.894065][ T4080] a5 : 000000000000000f a6 : 0000000000000000 a7 : 00000000000000ac [ 164.895013][ T4080] s2 : 0000000000000000 s3 : 0000003fffbea3c8 s4 : 0000003ff2223020 [ 164.895947][ T4080] s5 : 0000003ff2223000 s6 : 0000003fd1861830 s7 : ffffffffffffffff [ 164.896881][ T4080] s8 : 0000003ff2223004 s9 : 0000003ff2223010 s10: 0000003ff2223008 [ 164.905684][ T4080] s11: 0000003ff2223028 t3 : 0000000000000bb0 t4 : 0000000000000002 [ 164.906679][ T4080] t5 : 0000000000000002 t6 : 0000000000003d40 [ 164.908565][ T4080] status: 8000000000004020 badaddr: 0000000000000bb0 cause: 000000000000000c 
Segmentation fault OK syzkaller syzkaller login: [ 167.973016][ T4082] ------------[ cut here ]------------ [ 167.975887][ T4082] virt_to_phys used for non-linear address: 0000000059ffc026 (0xffffffd0158d105e) [ 167.979939][ T4082] WARNING: CPU: 0 PID: 4082 at arch/riscv/mm/physaddr.c:16 __virt_to_phys+0x74/0x78 [ 167.988658][ T4082] Modules linked in: [ 167.989781][ T4082] CPU: 0 PID: 4082 Comm: getty Not tainted 5.11.0-rc2-00012-g0983834a8393 #19 [ 167.991063][ T4082] epc: ffffffe000011164 ra : ffffffe000011164 sp : ffffffe01354fb10 [ 167.992243][ T4082] gp : ffffffe006234420 tp : ffffffe009c8ad80 t0 : ffffffe006cafb67 [ 167.993384][ T4082] t1 : 0000000000000001 t2 : 0000000000000000 s0 : ffffffe01354fb40 [ 167.994531][ T4082] s1 : fffffff0158d105e a0 : 000000000000004f a1 : 00000000000f0000 [ 167.995690][ T4082] a2 : 0000000000000002 a3 : ffffffe0000d1a30 a4 : 763e2d90a60ec500 [ 167.996803][ T4082] a5 : 763e2d90a60ec500 a6 : 0000000000f00000 a7 : ffffffe00009481c [ 167.999690][ T4082] s2 : ffffffd0158d105e s3 : 0000001fffffffff s4 : 0000000000000001 [ 168.000898][ T4082] s5 : ffffffd0158d105f s6 : ffffffd0158d3260 s7 : 0000003fff81eac8 [ 168.002093][ T4082] s8 : ffffffd0158d105e s9 : 0000000000000001 s10: 0000000000000000 [ 168.003226][ T4082] s11: 0000000000000000 t3 : 763e2d90a60ec500 t4 : ffffffc4026a9efd [ 168.004361][ T4082] t5 : ffffffc4026a9eff t6 : ffffffe01354f7f8 [ 168.005328][ T4082] status: 0000000000000120 badaddr: 0000000000000000 cause: 0000000000000003 [ 168.006756][ T4082] Kernel panic - not syncing: panic_on_warn set ... 
[ 168.008056][ T4082] CPU: 0 PID: 4082 Comm: getty Not tainted 5.11.0-rc2-00012-g0983834a8393 #19 [ 168.009301][ T4082] Call Trace: [ 168.009969][ T4082] [<ffffffe0000095c0>] walk_stackframe+0x0/0x1d0 [ 168.011166][ T4082] [<ffffffe00458b2d8>] show_stack+0x3a/0x46 [ 168.012215][ T4082] [<ffffffe0045a5b72>] dump_stack+0x11c/0x180 [ 168.013262][ T4082] [<ffffffe00458b6a0>] panic+0x20a/0x5cc [ 168.014264][ T4082] [<ffffffe000024210>] __warn+0x110/0x20a [ 168.015285][ T4082] [<ffffffe001759424>] report_bug+0x156/0x200 [ 168.016324][ T4082] [<ffffffe0000093f6>] do_trap_break+0xa6/0x152 [ 168.017431][ T4082] [<ffffffe00000559c>] ret_from_exception+0x0/0x14 [ 168.018560][ T4082] [<ffffffe0018c97bc>] n_tty_read+0x908/0x115a [ 168.020124][ T4082] SMP: stopping secondary CPUs [ 168.022087][ T4082] Rebooting in 86400 seconds.. _______________________________________________ linux-riscv mailing list linux-riscv@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-riscv ^ permalink raw reply [flat|nested] 55+ messages in thread
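On the question of whether the faulting address lies in .text at all, a small sketch can compare the epc from the dump with the `[start+size]` range printed by the kernel and with two section ranges from the readelf output quoted above. The helper names `parse_mapping` and `in_range` are hypothetical; the numbers are copied from the logs in this message.

```python
import re


def parse_mapping(line: str):
    """Extract (name, start, size) from '... in name[start+size]' (hypothetical helper)."""
    m = re.search(r" in (\S+)\[([0-9a-f]+)\+([0-9a-f]+)\]", line)
    if m is None:
        raise ValueError("no mapping found in line")
    return m.group(1), int(m.group(2), 16), int(m.group(3), 16)


def in_range(addr: int, start: int, size: int) -> bool:
    """True if addr lies in the half-open range [start, start + size)."""
    return start <= addr < start + size


# Values copied from the init crash and the busybox readelf output.
line = ("init[4022]: unhandled signal 11 code 0x1 at 0x0000000000000bb0 "
        "in busybox[10000+d7000]")
epc = 0x0000000000000BB0
sections = {
    ".plt": (0x16A00, 0x15E0),
    ".text": (0x17FE0, 0xA3668),
}

name, start, size = parse_mapping(line)
print(f"{name}: epc inside mapping? {in_range(epc, start, size)}")
for sec, (addr, length) in sections.items():
    print(f"{sec}: epc inside? {in_range(epc, addr, length)}")
```

With these numbers, 0xbb0 lies below the mapping start 0x10000 and outside both .plt and .text, which suggests the bracketed range is not the location of the fault and that the process jumped to a low, unmapped address (t3 also holds 0xbb0 in every dump).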
* Re: riscv+KASAN does not boot 2021-01-18 15:43 ` Dmitry Vyukov @ 2021-01-29 7:45 ` Alex Ghiti -1 siblings, 0 replies; 55+ messages in thread From: Alex Ghiti @ 2021-01-29 7:45 UTC (permalink / raw) To: Dmitry Vyukov, Tobias Klauser Cc: Albert Ou, Bjorn Topel, Palmer Dabbelt, LKML, nylon7, syzkaller, Andreas Schwab, Paul Walmsley, linux-riscv Hi Dmitry, On 1/18/21 10:43 AM, Dmitry Vyukov wrote: > On Mon, Jan 18, 2021 at 4:05 PM Dmitry Vyukov <dvyukov@google.com> wrote: >> >> On Mon, Jan 18, 2021 at 3:53 PM Tobias Klauser <tklauser@distanz.ch> wrote: >>>>> On Thu, Jan 14, 2021 at 5:57 AM Palmer Dabbelt <palmerdabbelt@google.com> wrote: >>>>>> >>>>>> On Fri, 25 Dec 2020 09:13:23 PST (-0800), dvyukov@google.com wrote: >>>>>>> On Fri, Dec 25, 2020 at 5:58 PM Andreas Schwab <schwab@linux-m68k.org> wrote: >>>>>>>> >>>>>>>> On Dez 25 2020, Dmitry Vyukov wrote: >>>>>>>> >>>>>>>>> qemu-system-riscv64 \ >>>>>>>>> -machine virt -bios default -smp 1 -m 2G \ >>>>>>>>> -device virtio-blk-device,drive=hd0 \ >>>>>>>>> -drive file=buildroot-riscv64.ext4,if=none,format=raw,id=hd0 \ >>>>>>>>> -kernel arch/riscv/boot/Image \ >>>>>>>>> -nographic \ >>>>>>>>> -device virtio-rng-device,rng=rng0 -object >>>>>>>>> rng-random,filename=/dev/urandom,id=rng0 \ >>>>>>>>> -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device >>>>>>>>> virtio-net-device,netdev=net0 \ >>>>>>>>> -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic >>>>>>>>> panic_on_warn=1 panic=86400" >>>>>>>> >>>>>>>> Do you get more output with earlycon=sbi? 
>>>>>>> >>>>>>> Hi Andreas, >>>>>>> >>>>>>> For defconfig+kvm_guest.config+ scripts/config -e KASAN -e >>>>>>> KASAN_INLINE it actually gave me more output: >>>>>>> >>>>>>> >>>>>>> OpenSBI v0.7 >>>>>>> ____ _____ ____ _____ >>>>>>> / __ \ / ____| _ \_ _| >>>>>>> | | | |_ __ ___ _ __ | (___ | |_) || | >>>>>>> | | | | '_ \ / _ \ '_ \ \___ \| _ < | | >>>>>>> | |__| | |_) | __/ | | |____) | |_) || |_ >>>>>>> \____/| .__/ \___|_| |_|_____/|____/_____| >>>>>>> | | >>>>>>> |_| >>>>>>> >>>>>>> Platform Name : QEMU Virt Machine >>>>>>> Platform HART Features : RV64ACDFIMSU >>>>>>> Current Hart : 0 >>>>>>> Firmware Base : 0x80000000 >>>>>>> Firmware Size : 132 KB >>>>>>> Runtime SBI Version : 0.2 >>>>>>> >>>>>>> MIDELEG : 0x0000000000000222 >>>>>>> MEDELEG : 0x000000000000b109 >>>>>>> PMP0 : 0x0000000080000000-0x000000008003ffff (A) >>>>>>> PMP1 : 0x0000000000000000-0xffffffffffffffff (A,R,W,X) >>>>>>> [ 0.000000] Linux version 5.10.0-01370-g71c5f03154ac >>>>>>> (dvyukov@dvyukov-desk.muc.corp.google.com) (riscv64-linux-gnu-gcc >>>>>>> (Debian 10.2.0-9) 10.2.0, GNU ld (GNU Binutils for Debian) 2.35.1) #17 >>>>>>> SMP Fri Dec 25 18:10:12 CET 2020 >>>>>>> [ 0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000 >>>>>>> [ 0.000000] earlycon: sbi0 at I/O port 0x0 (options '') >>>>>>> [ 0.000000] printk: bootconsole [sbi0] enabled >>>>>>> [ 0.000000] efi: UEFI not found. 
>>>>>>> [ 0.000000] Zone ranges: >>>>>>> [ 0.000000] DMA32 [mem 0x0000000080200000-0x00000000ffffffff] >>>>>>> [ 0.000000] Normal empty >>>>>>> [ 0.000000] Movable zone start for each node >>>>>>> [ 0.000000] Early memory node ranges >>>>>>> [ 0.000000] node 0: [mem 0x0000000080200000-0x00000000ffffffff] >>>>>>> [ 0.000000] Initmem setup node 0 [mem 0x0000000080200000-0x00000000ffffffff] >>>>>>> [ 0.000000] SBI specification v0.2 detected >>>>>>> [ 0.000000] SBI implementation ID=0x1 Version=0x7 >>>>>>> [ 0.000000] SBI v0.2 TIME extension detected >>>>>>> [ 0.000000] SBI v0.2 IPI extension detected >>>>>>> [ 0.000000] SBI v0.2 RFENCE extension detected >>>>>>> [ 0.000000] software IO TLB: mapped [mem >>>>>>> 0x00000000fa3f9000-0x00000000fe3f9000] (64MB) >>>>>>> [ 0.000000] Unable to handle kernel paging request at virtual >>>>>>> address dfffffc810040000 >>>>>>> [ 0.000000] Oops [#1] >>>>>>> [ 0.000000] Modules linked in: >>>>>>> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted >>>>>>> 5.10.0-01370-g71c5f03154ac #17 >>>>>>> [ 0.000000] epc: ffffffe00042e3e4 ra : ffffffe000c0462c sp : ffffffe001603ea0 >>>>>>> [ 0.000000] gp : ffffffe0016e3c60 tp : ffffffe00160cd40 t0 : >>>>>>> dfffffc810040000 >>>>>>> [ 0.000000] t1 : ffffffe000e0a838 t2 : 0000000000000000 s0 : >>>>>>> ffffffe001603f50 >>>>>>> [ 0.000000] s1 : ffffffe0016e50a8 a0 : dfffffc810040000 a1 : >>>>>>> 0000000000000000 >>>>>>> [ 0.000000] a2 : 000000000ffc0000 a3 : dfffffc820000000 a4 : >>>>>>> 0000000000000000 >>>>>>> [ 0.000000] a5 : 000000003e8c6001 a6 : ffffffe000e0a820 a7 : >>>>>>> 0000000000000900 >>>>>>> [ 0.000000] s2 : dfffffc820000000 s3 : dfffffc800000000 s4 : >>>>>>> 0000000000000001 >>>>>>> [ 0.000000] s5 : ffffffe0016e5108 s6 : fffffffffffff000 s7 : >>>>>>> dfffffc810040000 >>>>>>> [ 0.000000] s8 : 0000000000000080 s9 : ffffffffffffffff s10: >>>>>>> ffffffe07a119000 >>>>>>> [ 0.000000] s11: 000000000000ffc0 t3 : ffffffe0016eb908 t4 : >>>>>>> 0000000000000001 >>>>>>> [ 0.000000] t5 : 
ffffffc4001c150a t6 : ffffffe001603be8
>>>>>>> [ 0.000000] status: 0000000000000100 badaddr: dfffffc810040000
>>>>>>> cause: 000000000000000f
>>>>>>> [ 0.000000] random: get_random_bytes called from
>>>>>>> oops_exit+0x30/0x58 with crng_init=0
>>>>>>> [ 0.000000] ---[ end trace 0000000000000000 ]---
>>>>>>> [ 0.000000] Kernel panic - not syncing: Fatal exception
>>>>>>> [ 0.000000] ---[ end Kernel panic - not syncing: Fatal exception ]---
>>>>>>>
>>>>>>>
>>>>>>> But I first tried with the kernel image I had in the dir, I think it
>>>>>>> was this config (no KASAN):
>>>>>>> https://gist.githubusercontent.com/dvyukov/b2b62beccf80493781ab03b41430e616/raw/62e673cff08a8a41656d2871b8a37f74b00f509f/gistfile1.txt
>>>>>>>
>>>>>>> and earlycon=sbi did not change anything (no output after OpenSBI).
>>>>>>> So potentially there are 2 different problems.
>>>>>>
>>>>>> Thanks for reporting this. Looks like I'd forgotten to add a kasan config to
>>>>>> my tests. There's one in there now, and it's passing as of the fix that Nylon
>>>>>> posted.
>>>>>
>>>>> I can boot the KASAN kernel now on riscv/fixes.
>>>>> >>>>> Next problem: I've got only to: >>>>> >>>>> [ 90.498967][ T1] Run /sbin/init as init process >>>>> [ 91.164353][ T4022] init[4022]: unhandled signal 11 code 0x1 at >>>>> 0x0000000000000bb0 in busybox[10000+d7000] >>>>> [ 91.179640][ T4022] CPU: 1 PID: 4022 Comm: init Not tainted >>>>> 5.11.0-rc2-00012-g0983834a8393 #19 >>>>> [ 91.180853][ T4022] epc: 0000000000000bb0 ra : 0000003fccab09d0 sp >>>>> : 0000003fffa8c7b0 >>>>> [ 91.181861][ T4022] gp : 00000000000e8d70 tp : 0000003fccaaf820 t0 >>>>> : 000000000000001e >>>>> [ 91.182810][ T4022] t1 : 0000003fccab0bfc t2 : 000000000000000a s0 >>>>> : 0000003fffa8c850 >>>>> [ 91.183749][ T4022] s1 : 0000003fccab1070 a0 : 0000003fccab1070 a1 >>>>> : 0000003fffa8c8c8 >>>>> [ 91.184689][ T4022] a2 : 0000000000000001 a3 : 0000000000000020 a4 >>>>> : 0000000000000000 >>>>> [ 91.185620][ T4022] a5 : 0000000000000000 a6 : 0000003fcc9c4260 a7 >>>>> : fffffffffffffffe >>>>> [ 91.186566][ T4022] s2 : 0000000000000000 s3 : 0000003fffa8c8c8 s4 >>>>> : 0000003fccab1000 >>>>> [ 91.187500][ T4022] s5 : 0000003fccab1078 s6 : 0000003fffa8c8d0 s7 >>>>> : 0000000000000010 >>>>> [ 91.189672][ T4022] s8 : 0000000000000016 s9 : 0000000000000000 >>>>> s10: 0000003fffa8c8c8 >>>>> [ 91.190637][ T4022] s11: 0000000000000000 t3 : 0000000000000bb0 t4 >>>>> : 0000000000000000 >>>>> [ 91.191568][ T4022] t5 : 0000003fffa8c360 t6 : 0000000000000000 >>>>> [ 91.192389][ T4022] status: 8000000000004020 badaddr: >>>>> 0000000000000bb0 cause: 000000000000000c >>>>> [ 91.201573][ T1] Kernel panic - not syncing: Attempted to kill >>>>> init! 
exitcode=0x0000000b >>>>> [ 91.202906][ T1] CPU: 0 PID: 1 Comm: init Not tainted >>>>> 5.11.0-rc2-00012-g0983834a8393 #19 >>>>> [ 91.204139][ T1] Call Trace: >>>>> [ 91.204849][ T1] [<ffffffe0000095c0>] walk_stackframe+0x0/0x1d0 >>>>> [ 91.206124][ T1] [<ffffffe00458b2d8>] show_stack+0x3a/0x46 >>>>> [ 91.207240][ T1] [<ffffffe0045a5b72>] dump_stack+0x11c/0x180 >>>>> [ 91.208732][ T1] [<ffffffe00458b6a0>] panic+0x20a/0x5cc >>>>> [ 91.209890][ T1] [<ffffffe00002eea4>] do_exit+0x1846/0x1874 >>>>> [ 91.211052][ T1] [<ffffffe00002efdc>] do_group_exit+0xa0/0x192 >>>>> [ 91.212224][ T1] [<ffffffe000047d30>] get_signal+0x2d6/0x13dc >>>>> [ 91.213390][ T1] [<ffffffe000007eb0>] do_notify_resume+0xa8/0x912 >>>>> [ 91.214567][ T1] [<ffffffe00000559c>] ret_from_exception+0x0/0x14 >>>>> >>>>> The image is buildroot on 2020.11.x built with this script: >>>>> https://gist.githubusercontent.com/dvyukov/1a9a01ca2189e35175a021820c95b04d/raw/5c01d755e83f4eab0d56aa7dc84af3b2d5e80423/gistfile1.txt >>>>> >>>>> Readelf for init shows the following (is it that [10000+d7000] address >>>>> is not .text at all?): >>>>> >>>>> $ riscv64-linux-gnu-readelf --sections image/bin/busybox >>>>> There are 27 section headers, starting at offset 0xd7f20: >>>>> >>>>> Section Headers: >>>>> [Nr] Name Type Address Offset >>>>> Size EntSize Flags Link Info Align >>>>> [ 0] NULL 0000000000000000 00000000 >>>>> 0000000000000000 0000000000000000 0 0 0 >>>>> [ 1] .interp PROGBITS 0000000000010238 00000238 >>>>> 0000000000000021 0000000000000000 A 0 0 1 >>>>> [ 2] .note.ABI-tag NOTE 000000000001025c 0000025c >>>>> 0000000000000020 0000000000000000 A 0 0 4 >>>>> [ 3] .hash HASH 0000000000010280 00000280 >>>>> 00000000000009cc 0000000000000004 A 5 0 8 >>>>> [ 4] .gnu.hash GNU_HASH 0000000000010c50 00000c50 >>>>> 0000000000000ac8 0000000000000000 A 5 0 8 >>>>> [ 5] .dynsym DYNSYM 0000000000011718 00001718 >>>>> 00000000000021f0 0000000000000018 A 6 1 8 >>>>> [ 6] .dynstr STRTAB 0000000000013908 00003908 >>>>> 
0000000000000c66 0000000000000000 A 0 0 1 >>>>> [ 7] .gnu.version VERSYM 000000000001456e 0000456e >>>>> 00000000000002d4 0000000000000002 A 5 0 2 >>>>> [ 8] .gnu.version_r VERNEED 0000000000014848 00004848 >>>>> 0000000000000050 0000000000000000 A 6 2 8 >>>>> [ 9] .rela.dyn RELA 0000000000014898 00004898 >>>>> 00000000000000c0 0000000000000018 A 5 0 8 >>>>> [10] .rela.plt RELA 0000000000014958 00004958 >>>>> 00000000000020a0 0000000000000018 AI 5 22 8 >>>>> [11] .plt PROGBITS 0000000000016a00 00006a00 >>>>> 00000000000015e0 0000000000000010 AX 0 0 16 >>>>> [12] .text PROGBITS 0000000000017fe0 00007fe0 >>>>> 00000000000a3668 0000000000000000 AX 0 0 4 >>>>> [13] .rodata PROGBITS 00000000000bb648 000ab648 >>>>> 000000000002b076 0000000000000000 A 0 0 8 >>>>> [14] .sdata2 PROGBITS 00000000000e66c0 000d66c0 >>>>> 0000000000000163 0000000000000000 A 0 0 8 >>>>> [15] .eh_frame_hdr PROGBITS 00000000000e6824 000d6824 >>>>> 0000000000000014 0000000000000000 A 0 0 4 >>>>> [16] .eh_frame PROGBITS 00000000000e6838 000d6838 >>>>> 000000000000002c 0000000000000000 A 0 0 8 >>>>> [17] .preinit_array PREINIT_ARRAY 00000000000e7df8 000d6df8 >>>>> 0000000000000008 0000000000000008 WA 0 0 1 >>>>> [18] .init_array INIT_ARRAY 00000000000e7e00 000d6e00 >>>>> 0000000000000008 0000000000000008 WA 0 0 8 >>>>> [19] .fini_array FINI_ARRAY 00000000000e7e08 000d6e08 >>>>> 0000000000000008 0000000000000008 WA 0 0 8 >>>>> [20] .dynamic DYNAMIC 00000000000e7e10 000d6e10 >>>>> 00000000000001f0 0000000000000010 WA 6 0 8 >>>>> [21] .data PROGBITS 00000000000e8000 000d7000 >>>>> 0000000000000240 0000000000000000 WA 0 0 8 >>>>> [22] .got PROGBITS 00000000000e8240 000d7240 >>>>> 0000000000000af8 0000000000000008 WA 0 0 8 >>>>> [23] .sdata PROGBITS 00000000000e8d38 000d7d38 >>>>> 0000000000000101 0000000000000000 WA 0 0 8 >>>>> [24] .sbss NOBITS 00000000000e8e40 000d7e39 >>>>> 000000000000017f 0000000000000000 WA 0 0 8 >>>>> [25] .bss NOBITS 00000000000e8fc0 000d7e39 >>>>> 00000000000005b0 
0000000000000000 WA 0 0 8
>>>>> [26] .shstrtab STRTAB 0000000000000000 000d7e39
>>>>> 00000000000000e6 0000000000000000 0 0 1
>>>>>
>>>>>
>>>>> Before I spend more time on this, am I doing anything obviously wrong?
>>>>> Is it a known issue? Are there any fresh working recipes?
>>>>
>>>> Humm.. I tried to use 2020.05 which Tobias used here:
>>>> https://github.com/google/syzkaller/blob/master/docs/linux/setup_linux-host_qemu-vm_riscv64-kernel.md#image
>>>> But there is no make qemu_riscv64_virt_defconfig target... though I
>>>> remember I tested these instructions at the time...
>>>
>>> Weird. `make qemu_riscv64_virt_defconfig` works here on buildroot
>>> 2020.05, 2020.11.1 and on latest master.
>>>
>>> Do you see these in your configs/ directory?
>>>
>>> $ ls -l configs/qemu_riscv*
>>> -rw-rw-r-- 1 tklauser tklauser 673 Jan 18 15:51 configs/qemu_riscv32_virt_defconfig
>>> -rw-rw-r-- 1 tklauser tklauser 682 Jan 18 15:51 configs/qemu_riscv64_virt_defconfig
>>
>> Oh, it turned out I had previously checked out 2011.05 somehow...
>> Yes, 2020.05 has qemu_riscv64_virt_defconfig and I am building it now.
>> 2020.11 has the config, but init crashes (see above).
>
>
> 2020.05 is a bit better, but still failed in several ways.
> First, a number of user-space services including sshd still crashed.
> Second, kernel also crashed a bit later.
> And 2020.11 seems to regress even more.
> It's with the same kernel from the previous email (I did not rebuild it).
>
>
> 2020.05 buildroot:
> [ 90.381218][ T1] devtmpfs: mounted
> [ 90.534531][ T1] Freeing unused kernel memory: 2328K
> [ 90.537085][ T1] Run /sbin/init as init process
> [ 91.754610][ T4022] EXT4-fs (vda): re-mounted. Opts: (null). Quota
> mode: none.
> Starting syslogd: OK > Starting klogd: OK > Running sysctl: OK > Populating /dev using udev: [ 99.413418][ T4051] udevd[4051]: > starting version 3.2.9 > [ 100.480500][ T4052] udevd[4052]: starting eudev-3.2.9 > [ 101.904876][ T4052] udevd[4052]: unhandled signal 11 code 0x1 at > 0x0000000000000bb0 in udevd[10000+35000] > [ 101.911401][ T4052] CPU: 1 PID: 4052 Comm: udevd Not tainted > 5.11.0-rc2-00012-g0983834a8393 #19 > [ 101.913136][ T4052] epc: 0000000000000bb0 ra : 0000003ff5921872 sp > : 0000003fffb0c3a0 > [ 101.914593][ T4052] gp : 000000000004f908 tp : 0000003ff552b720 t0 > : 0000003ff5943160 > [ 101.915740][ T4052] t1 : 0000003ff5921bec t2 : 000000000004f450 s0 > : 0000003fffb0c440 > [ 101.916872][ T4052] s1 : 0000003ff5922000 a0 : 0000003ff5922000 a1 > : 0000003fffb0c460 > [ 101.949318][ T4052] a2 : 0000000000000001 a3 : 0000000000000002 a4 > : 0000000000000002 > [ 101.950529][ T4052] a5 : 000000000000000f a6 : 0000000000000007 a7 > : 0000000000000016 > [ 101.951653][ T4052] s2 : 0000000000000001 s3 : 0000003fffb0c460 s4 > : 0000003ff5922030 > [ 101.952771][ T4052] s5 : 0000003ff5922010 s6 : 0000000000000000 s7 > : 0000000000000000 > [ 101.953878][ T4052] s8 : 0000003ff5922004 s9 : 0000003ff5922010 > s10: 0000003ff5922008 > [ 101.955016][ T4052] s11: 0000003ff5922038 t3 : 0000000000000bb0 t4 > : 0000000000000002 > [ 101.956122][ T4052] t5 : 0000000000000002 t6 : 0000000000003d40 > [ 101.957072][ T4052] status: 8000000000004020 badaddr: > 0000000000000bb0 cause: 000000000000000c > [ 154.349233][ T4055] udevadm[4055]: unhandled signal 11 code 0x1 at > 0x0000000000000bb0 in udevadm[10000+38000] > [ 154.351201][ T4055] CPU: 0 PID: 4055 Comm: udevadm Not tainted > 5.11.0-rc2-00012-g0983834a8393 #19 > [ 154.352227][ T4055] epc: 0000000000000bb0 ra : 0000003ff2cd3872 sp > : 0000003fffe26a50 > [ 154.353136][ T4055] gp : 0000000000052808 tp : 0000003ff28dd720 t0 > : 0000003ff2cf5160 > [ 154.354047][ T4055] t1 : 0000003ff2cd3bec t2 : 0000000000052570 s0 > : 
0000003fffe26af0 > [ 154.354957][ T4055] s1 : 0000003ff2cd4000 a0 : 0000003ff2cd4000 a1 > : 0000003fffe26b10 > [ 154.355860][ T4055] a2 : 000000000003d790 a3 : 0000000000000002 a4 > : 0000000000000002 > [ 154.356739][ T4055] a5 : 000000000000000f a6 : ffffffffffffffff a7 > : 0000000000000000 > [ 154.366998][ T4055] s2 : 0000000000000001 s3 : 0000003fffe26b10 s4 > : 0000003ff2cd4030 > [ 154.372223][ T4055] s5 : 0000003ff2cd4010 s6 : 0000000000000068 s7 > : 000000000003d000 > [ 154.373192][ T4055] s8 : 0000003ff2cd4004 s9 : 0000003ff2cd4010 > s10: 0000003ff2cd4008 > [ 154.374114][ T4055] s11: 0000003ff2cd4038 t3 : 0000000000000bb0 t4 > : 0000000000000002 > [ 154.375023][ T4055] t5 : 0000000000000002 t6 : 0000000000003d40 > [ 154.375793][ T4055] status: 0000000000004020 badaddr: > 0000000000000bb0 cause: 000000000000000c > Segmentation fault > udevadm settle failed > done > Saving random seed: OK > Starting network: [ 160.769276][ T4073] 8021q: adding VLAN 0 to HW > filter on device eth0 > udhcpc: started, v1.31.1 > [ 161.642968][ T4074] udhcpc[4074]: unhandled signal 11 code 0x1 at > 0x0000000000000bb0 in busybox[10000+d6000] > [ 161.645275][ T4074] CPU: 0 PID: 4074 Comm: udhcpc Not tainted > 5.11.0-rc2-00012-g0983834a8393 #19 > [ 161.646515][ T4074] epc: 0000000000000bb0 ra : 0000003fd4d43872 sp > : 0000003fffedf5c0 > [ 161.661669][ T4074] gp : 00000000000e7c90 tp : 0000003fd4d42820 t0 > : 0000003fd4d65160 > [ 161.662875][ T4074] t1 : 0000003fd4d43bec t2 : 00000000000e7960 s0 > : 0000003fffedf660 > [ 161.663979][ T4074] s1 : 0000003fd4d44000 a0 : 0000003fd4d44000 a1 > : 0000003fffedf690 > [ 161.665110][ T4074] a2 : 0000000000000019 a3 : 0000000000000002 a4 > : 0000000000000002 > [ 161.666351][ T4074] a5 : 000000000000000f a6 : fefefefefefefeff a7 > : 0000000000000040 > [ 161.668642][ T4074] s2 : 0000000000000001 s3 : 0000003fffedf690 s4 > : 0000003fd4d44030 > [ 161.669785][ T4074] s5 : 0000003fd4d44010 s6 : 00000000149d82c3 s7 > : 00000000000000fe > [ 161.670921][ 
T4074] s8 : 0000003fd4d44004 s9 : 0000003fd4d44010 > s10: 0000003fd4d44008 > [ 161.672150][ T4074] s11: 0000003fd4d44038 t3 : 0000000000000bb0 t4 > : 0000000000000002 > [ 161.673355][ T4074] t5 : 0000000000000002 t6 : 0000000000003d40 > [ 161.674303][ T4074] status: 8000000000004020 badaddr: > 0000000000000bb0 cause: 000000000000000c > FAIL > Starting dhcpcd... > [ 162.771471][ T4077] dhcpcd[4077]: unhandled signal 11 code 0x1 at > 0x0000000000000bb0 in dhcpcd[10000+39000] > [ 162.773414][ T4077] CPU: 0 PID: 4077 Comm: dhcpcd Not tainted > 5.11.0-rc2-00012-g0983834a8393 #19 > [ 162.774462][ T4077] epc: 0000000000000bb0 ra : 0000003fe6d12872 sp > : 0000003fff8527e0 > [ 162.775366][ T4077] gp : 000000000004adb8 tp : 0000003fe6d11250 t0 > : 0000003fe6d34160 > [ 162.776274][ T4077] t1 : 0000003fe6d12bec t2 : 0000000000049a00 s0 > : 0000003fff852880 > [ 162.777167][ T4077] s1 : 0000003fe6d13000 a0 : 0000003fe6d13000 a1 > : 0000003fff8528a0 > [ 162.779363][ T4077] a2 : 0000000000000004 a3 : 0000000000000002 a4 > : 0000000000000002 > [ 162.780279][ T4077] a5 : 000000000000000f a6 : 7efefefefefefeff a7 > : fffffffffffff000 > [ 162.781194][ T4077] s2 : 0000000000000001 s3 : 0000003fff8528a0 s4 > : 0000003fe6d13030 > [ 162.782106][ T4077] s5 : 0000003fe6d13010 s6 : 0000000000000000 s7 > : 0000000000000000 > [ 162.783015][ T4077] s8 : 0000003fe6d13004 s9 : 0000003fe6d13010 > s10: 0000003fe6d13008 > [ 162.783940][ T4077] s11: 0000003fe6d13038 t3 : 0000000000000bb0 t4 > : 0000000000000002 > [ 162.784853][ T4077] t5 : 0000000000000002 t6 : 0000000000003d40 > [ 162.785618][ T4077] status: 8000000000006020 badaddr: > 0000000000000bb0 cause: 000000000000000c > Segmentation fault > [ 164.074891][ T4079] ssh-keygen[4079]: unhandled signal 11 code 0x1 > at 0x0000000000000bb0 in ssh-keygen[2ac3c68000+63000] > [ 164.076916][ T4079] CPU: 1 PID: 4079 Comm: ssh-keygen Not tainted > 5.11.0-rc2-00012-g0983834a8393 #19 > [ 164.096635][ T4079] epc: 0000000000000bb0 ra : 0000003ff6899872 sp > : 
0000003fffed1330 > [ 164.099233][ T4079] gp : 0000002ac3ccd448 tp : 0000003ff6435cd0 t0 > : 0000003ff6897000 > [ 164.100457][ T4079] t1 : 0000003ff6899bec t2 : 0000003ff6891940 s0 > : 0000003fffed13d0 > [ 164.101578][ T4079] s1 : 0000003ff689a000 a0 : 0000003ff689a000 a1 > : 0000003fffed13f8 > [ 164.102914][ T4079] a2 : 0000000000000000 a3 : 0000000000000001 a4 > : 0000000000000001 > [ 164.104058][ T4079] a5 : 000000000000000f a6 : 0000000000000000 a7 > : 00000000000000ac > [ 164.105150][ T4079] s2 : 0000000000000000 s3 : 0000003fffed13f8 s4 > : 0000003ff689a020 > [ 164.106241][ T4079] s5 : 0000003ff689a000 s6 : 0000003fd1861830 s7 > : ffffffffffffffff > [ 164.113694][ T4079] s8 : 0000003ff689a004 s9 : 0000003ff689a010 > s10: 0000003ff689a008 > [ 164.114869][ T4079] s11: 0000003ff689a028 t3 : 0000000000000bb0 t4 > : 0000000000000002 > [ 164.115972][ T4079] t5 : 0000000000000002 t6 : 0000000000003d40 > [ 164.128360][ T4079] status: 8000000000004020 badaddr: > 0000000000000bb0 cause: 000000000000000c > Segmentation fault > Starting sshd: [ 164.872315][ T4080] sshd[4080]: unhandled signal 11 > code 0x1 at 0x0000000000000bb0 in sshd[2ac7ea7000+a4000] > [ 164.874297][ T4080] CPU: 1 PID: 4080 Comm: sshd Not tainted > 5.11.0-rc2-00012-g0983834a8393 #19 > [ 164.875331][ T4080] epc: 0000000000000bb0 ra : 0000003ff2222872 sp > : 0000003fffbea300 > [ 164.876230][ T4080] gp : 0000002ac7f4f9d0 tp : 0000003ff1dbecd0 t0 > : 0000003ff2220000 > [ 164.877146][ T4080] t1 : 0000003ff2222bec t2 : 0000003ff221a940 s0 > : 0000003fffbea3a0 > [ 164.892174][ T4080] s1 : 0000003ff2223000 a0 : 0000003ff2223000 a1 > : 0000003fffbea3c8 > [ 164.893137][ T4080] a2 : 0000000000000000 a3 : 0000000000000001 a4 > : 0000000000000001 > [ 164.894065][ T4080] a5 : 000000000000000f a6 : 0000000000000000 a7 > : 00000000000000ac > [ 164.895013][ T4080] s2 : 0000000000000000 s3 : 0000003fffbea3c8 s4 > : 0000003ff2223020 > [ 164.895947][ T4080] s5 : 0000003ff2223000 s6 : 0000003fd1861830 s7 > : 
ffffffffffffffff > [ 164.896881][ T4080] s8 : 0000003ff2223004 s9 : 0000003ff2223010 > s10: 0000003ff2223008 > [ 164.905684][ T4080] s11: 0000003ff2223028 t3 : 0000000000000bb0 t4 > : 0000000000000002 > [ 164.906679][ T4080] t5 : 0000000000000002 t6 : 0000000000003d40 > [ 164.908565][ T4080] status: 8000000000004020 badaddr: > 0000000000000bb0 cause: 000000000000000c > Segmentation fault > OK > syzkaller > syzkaller login: [ 167.973016][ T4082] ------------[ cut here ]------------ > [ 167.975887][ T4082] virt_to_phys used for non-linear address: > 0000000059ffc026 (0xffffffd0158d105e) > [ 167.979939][ T4082] WARNING: CPU: 0 PID: 4082 at > arch/riscv/mm/physaddr.c:16 __virt_to_phys+0x74/0x78 > [ 167.988658][ T4082] Modules linked in: > [ 167.989781][ T4082] CPU: 0 PID: 4082 Comm: getty Not tainted > 5.11.0-rc2-00012-g0983834a8393 #19 > [ 167.991063][ T4082] epc: ffffffe000011164 ra : ffffffe000011164 sp > : ffffffe01354fb10 > [ 167.992243][ T4082] gp : ffffffe006234420 tp : ffffffe009c8ad80 t0 > : ffffffe006cafb67 > [ 167.993384][ T4082] t1 : 0000000000000001 t2 : 0000000000000000 s0 > : ffffffe01354fb40 > [ 167.994531][ T4082] s1 : fffffff0158d105e a0 : 000000000000004f a1 > : 00000000000f0000 > [ 167.995690][ T4082] a2 : 0000000000000002 a3 : ffffffe0000d1a30 a4 > : 763e2d90a60ec500 > [ 167.996803][ T4082] a5 : 763e2d90a60ec500 a6 : 0000000000f00000 a7 > : ffffffe00009481c > [ 167.999690][ T4082] s2 : ffffffd0158d105e s3 : 0000001fffffffff s4 > : 0000000000000001 > [ 168.000898][ T4082] s5 : ffffffd0158d105f s6 : ffffffd0158d3260 s7 > : 0000003fff81eac8 > [ 168.002093][ T4082] s8 : ffffffd0158d105e s9 : 0000000000000001 > s10: 0000000000000000 > [ 168.003226][ T4082] s11: 0000000000000000 t3 : 763e2d90a60ec500 t4 > : ffffffc4026a9efd > [ 168.004361][ T4082] t5 : ffffffc4026a9eff t6 : ffffffe01354f7f8 > [ 168.005328][ T4082] status: 0000000000000120 badaddr: > 0000000000000000 cause: 0000000000000003 > [ 168.006756][ T4082] Kernel panic - not syncing: panic_on_warn 
set ...
> [ 168.008056][ T4082] CPU: 0 PID: 4082 Comm: getty Not tainted
> 5.11.0-rc2-00012-g0983834a8393 #19
> [ 168.009301][ T4082] Call Trace:
> [ 168.009969][ T4082] [<ffffffe0000095c0>] walk_stackframe+0x0/0x1d0
> [ 168.011166][ T4082] [<ffffffe00458b2d8>] show_stack+0x3a/0x46
> [ 168.012215][ T4082] [<ffffffe0045a5b72>] dump_stack+0x11c/0x180
> [ 168.013262][ T4082] [<ffffffe00458b6a0>] panic+0x20a/0x5cc
> [ 168.014264][ T4082] [<ffffffe000024210>] __warn+0x110/0x20a
> [ 168.015285][ T4082] [<ffffffe001759424>] report_bug+0x156/0x200
> [ 168.016324][ T4082] [<ffffffe0000093f6>] do_trap_break+0xa6/0x152
> [ 168.017431][ T4082] [<ffffffe00000559c>] ret_from_exception+0x0/0x14
> [ 168.018560][ T4082] [<ffffffe0018c97bc>] n_tty_read+0x908/0x115a
> [ 168.020124][ T4082] SMP: stopping secondary CPUs
> [ 168.022087][ T4082] Rebooting in 86400 seconds..
>

I was fixing KASAN support for my sv48 patchset, so I took a look at your issue. I built a kernel on top of the riscv/fixes branch using https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config and Buildroot 2020.11.

I do get the warnings about __virt_to_phys being used on wrong addresses (that is expected, since this function is used in virt_addr_valid), but not the segfaults you describe.

Hope that helps,

Alex
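For anyone comparing the different failures in this thread, the "cause:" field in each register dump can be decoded with the synchronous exception codes from the RISC-V privileged specification. The following is a small sketch rather than anything taken from the kernel sources; the helper name decode_cause and the table name are chosen only for this example, and the table lists just the standard codes I am sure of.

```python
# Synchronous exception codes from the RISC-V privileged specification
# (value of scause when the interrupt bit is clear).
EXCEPTION_CODES = {
    0: "instruction address misaligned",
    1: "instruction access fault",
    2: "illegal instruction",
    3: "breakpoint",
    4: "load address misaligned",
    5: "load access fault",
    6: "store/AMO address misaligned",
    7: "store/AMO access fault",
    8: "environment call from U-mode",
    9: "environment call from S-mode",
    12: "instruction page fault",
    13: "load page fault",
    15: "store/AMO page fault",
}


def decode_cause(cause_hex, xlen=64):
    """Decode the hex string printed after 'cause:' in a register dump."""
    value = int(cause_hex, 16)
    # The top bit of scause distinguishes interrupts from exceptions.
    is_interrupt = bool(value >> (xlen - 1))
    code = value & ((1 << (xlen - 1)) - 1)
    if is_interrupt:
        return f"interrupt {code}"
    return EXCEPTION_CODES.get(code, f"reserved/unknown exception {code}")


# The three cause values that appear in the logs quoted in this thread.
for cause in ("000000000000000f", "000000000000000c", "0000000000000003"):
    print(cause, "->", decode_cause(cause))
```

Read this way, the early-boot oops at dfffffc810040000 is a store/AMO page fault, the user-space crashes at 0xbb0 are instruction page faults (epc and badaddr are both 0xbb0), and the __virt_to_phys warning arrives as a breakpoint, which matches do_trap_break in its call trace.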
* Re: riscv+KASAN does not boot @ 2021-01-29 7:45 ` Alex Ghiti 0 siblings, 0 replies; 55+ messages in thread From: Alex Ghiti @ 2021-01-29 7:45 UTC (permalink / raw) To: Dmitry Vyukov, Tobias Klauser Cc: Albert Ou, Bjorn Topel, Palmer Dabbelt, LKML, nylon7, syzkaller, Andreas Schwab, Paul Walmsley, linux-riscv Hi Dmitry, On 1/18/21 10:43 AM, Dmitry Vyukov wrote: > On Mon, Jan 18, 2021 at 4:05 PM Dmitry Vyukov <dvyukov@google.com> wrote: >> >> On Mon, Jan 18, 2021 at 3:53 PM Tobias Klauser <tklauser@distanz.ch> wrote: >>>>> On Thu, Jan 14, 2021 at 5:57 AM Palmer Dabbelt <palmerdabbelt@google.com> wrote: >>>>>> >>>>>> On Fri, 25 Dec 2020 09:13:23 PST (-0800), dvyukov@google.com wrote: >>>>>>> On Fri, Dec 25, 2020 at 5:58 PM Andreas Schwab <schwab@linux-m68k.org> wrote: >>>>>>>> >>>>>>>> On Dez 25 2020, Dmitry Vyukov wrote: >>>>>>>> >>>>>>>>> qemu-system-riscv64 \ >>>>>>>>> -machine virt -bios default -smp 1 -m 2G \ >>>>>>>>> -device virtio-blk-device,drive=hd0 \ >>>>>>>>> -drive file=buildroot-riscv64.ext4,if=none,format=raw,id=hd0 \ >>>>>>>>> -kernel arch/riscv/boot/Image \ >>>>>>>>> -nographic \ >>>>>>>>> -device virtio-rng-device,rng=rng0 -object >>>>>>>>> rng-random,filename=/dev/urandom,id=rng0 \ >>>>>>>>> -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device >>>>>>>>> virtio-net-device,netdev=net0 \ >>>>>>>>> -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic >>>>>>>>> panic_on_warn=1 panic=86400" >>>>>>>> >>>>>>>> Do you get more output with earlycon=sbi? 
>>>>>>> >>>>>>> Hi Andreas, >>>>>>> >>>>>>> For defconfig+kvm_guest.config+ scripts/config -e KASAN -e >>>>>>> KASAN_INLINE it actually gave me more output: >>>>>>> >>>>>>> >>>>>>> OpenSBI v0.7 >>>>>>> ____ _____ ____ _____ >>>>>>> / __ \ / ____| _ \_ _| >>>>>>> | | | |_ __ ___ _ __ | (___ | |_) || | >>>>>>> | | | | '_ \ / _ \ '_ \ \___ \| _ < | | >>>>>>> | |__| | |_) | __/ | | |____) | |_) || |_ >>>>>>> \____/| .__/ \___|_| |_|_____/|____/_____| >>>>>>> | | >>>>>>> |_| >>>>>>> >>>>>>> Platform Name : QEMU Virt Machine >>>>>>> Platform HART Features : RV64ACDFIMSU >>>>>>> Current Hart : 0 >>>>>>> Firmware Base : 0x80000000 >>>>>>> Firmware Size : 132 KB >>>>>>> Runtime SBI Version : 0.2 >>>>>>> >>>>>>> MIDELEG : 0x0000000000000222 >>>>>>> MEDELEG : 0x000000000000b109 >>>>>>> PMP0 : 0x0000000080000000-0x000000008003ffff (A) >>>>>>> PMP1 : 0x0000000000000000-0xffffffffffffffff (A,R,W,X) >>>>>>> [ 0.000000] Linux version 5.10.0-01370-g71c5f03154ac >>>>>>> (dvyukov@dvyukov-desk.muc.corp.google.com) (riscv64-linux-gnu-gcc >>>>>>> (Debian 10.2.0-9) 10.2.0, GNU ld (GNU Binutils for Debian) 2.35.1) #17 >>>>>>> SMP Fri Dec 25 18:10:12 CET 2020 >>>>>>> [ 0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000 >>>>>>> [ 0.000000] earlycon: sbi0 at I/O port 0x0 (options '') >>>>>>> [ 0.000000] printk: bootconsole [sbi0] enabled >>>>>>> [ 0.000000] efi: UEFI not found. 
>>>>>>> [ 0.000000] Zone ranges: >>>>>>> [ 0.000000] DMA32 [mem 0x0000000080200000-0x00000000ffffffff] >>>>>>> [ 0.000000] Normal empty >>>>>>> [ 0.000000] Movable zone start for each node >>>>>>> [ 0.000000] Early memory node ranges >>>>>>> [ 0.000000] node 0: [mem 0x0000000080200000-0x00000000ffffffff] >>>>>>> [ 0.000000] Initmem setup node 0 [mem 0x0000000080200000-0x00000000ffffffff] >>>>>>> [ 0.000000] SBI specification v0.2 detected >>>>>>> [ 0.000000] SBI implementation ID=0x1 Version=0x7 >>>>>>> [ 0.000000] SBI v0.2 TIME extension detected >>>>>>> [ 0.000000] SBI v0.2 IPI extension detected >>>>>>> [ 0.000000] SBI v0.2 RFENCE extension detected >>>>>>> [ 0.000000] software IO TLB: mapped [mem >>>>>>> 0x00000000fa3f9000-0x00000000fe3f9000] (64MB) >>>>>>> [ 0.000000] Unable to handle kernel paging request at virtual >>>>>>> address dfffffc810040000 >>>>>>> [ 0.000000] Oops [#1] >>>>>>> [ 0.000000] Modules linked in: >>>>>>> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted >>>>>>> 5.10.0-01370-g71c5f03154ac #17 >>>>>>> [ 0.000000] epc: ffffffe00042e3e4 ra : ffffffe000c0462c sp : ffffffe001603ea0 >>>>>>> [ 0.000000] gp : ffffffe0016e3c60 tp : ffffffe00160cd40 t0 : >>>>>>> dfffffc810040000 >>>>>>> [ 0.000000] t1 : ffffffe000e0a838 t2 : 0000000000000000 s0 : >>>>>>> ffffffe001603f50 >>>>>>> [ 0.000000] s1 : ffffffe0016e50a8 a0 : dfffffc810040000 a1 : >>>>>>> 0000000000000000 >>>>>>> [ 0.000000] a2 : 000000000ffc0000 a3 : dfffffc820000000 a4 : >>>>>>> 0000000000000000 >>>>>>> [ 0.000000] a5 : 000000003e8c6001 a6 : ffffffe000e0a820 a7 : >>>>>>> 0000000000000900 >>>>>>> [ 0.000000] s2 : dfffffc820000000 s3 : dfffffc800000000 s4 : >>>>>>> 0000000000000001 >>>>>>> [ 0.000000] s5 : ffffffe0016e5108 s6 : fffffffffffff000 s7 : >>>>>>> dfffffc810040000 >>>>>>> [ 0.000000] s8 : 0000000000000080 s9 : ffffffffffffffff s10: >>>>>>> ffffffe07a119000 >>>>>>> [ 0.000000] s11: 000000000000ffc0 t3 : ffffffe0016eb908 t4 : >>>>>>> 0000000000000001 >>>>>>> [ 0.000000] t5 : 
ffffffc4001c150a t6 : ffffffe001603be8 >>>>>>> [ 0.000000] status: 0000000000000100 badaddr: dfffffc810040000 >>>>>>> cause: 000000000000000f >>>>>>> [ 0.000000] random: get_random_bytes called from >>>>>>> oops_exit+0x30/0x58 with crng_init=0 >>>>>>> [ 0.000000] ---[ end trace 0000000000000000 ]--- >>>>>>> [ 0.000000] Kernel panic - not syncing: Fatal exception >>>>>>> [ 0.000000] ---[ end Kernel panic - not syncing: Fatal exception ]--- >>>>>>> >>>>>>> >>>>>>> But I first tried with a the kernel image I had in the dir, I think it >>>>>>> was this config (no KASAN): >>>>>>> https://gist.githubusercontent.com/dvyukov/b2b62beccf80493781ab03b41430e616/raw/62e673cff08a8a41656d2871b8a37f74b00f509f/gistfile1.txt >>>>>>> >>>>>>> and earlycon=sbi did not change anything (no output after OpenSBI). >>>>>>> So potentially there are 2 different problems. >>>>>> >>>>>> Thanks for reporting this. Looks like I'd forgotten to add a kasan config to >>>>>> my tests. There's one in there now, and it's passing as of the fix that Nylon >>>>>> posted. >>>>> >>>>> I can boot the KASAN kernel now on riscv/fixes. 
>>>>> >>>>> Next problem: I've got only to: >>>>> >>>>> [ 90.498967][ T1] Run /sbin/init as init process >>>>> [ 91.164353][ T4022] init[4022]: unhandled signal 11 code 0x1 at >>>>> 0x0000000000000bb0 in busybox[10000+d7000] >>>>> [ 91.179640][ T4022] CPU: 1 PID: 4022 Comm: init Not tainted >>>>> 5.11.0-rc2-00012-g0983834a8393 #19 >>>>> [ 91.180853][ T4022] epc: 0000000000000bb0 ra : 0000003fccab09d0 sp >>>>> : 0000003fffa8c7b0 >>>>> [ 91.181861][ T4022] gp : 00000000000e8d70 tp : 0000003fccaaf820 t0 >>>>> : 000000000000001e >>>>> [ 91.182810][ T4022] t1 : 0000003fccab0bfc t2 : 000000000000000a s0 >>>>> : 0000003fffa8c850 >>>>> [ 91.183749][ T4022] s1 : 0000003fccab1070 a0 : 0000003fccab1070 a1 >>>>> : 0000003fffa8c8c8 >>>>> [ 91.184689][ T4022] a2 : 0000000000000001 a3 : 0000000000000020 a4 >>>>> : 0000000000000000 >>>>> [ 91.185620][ T4022] a5 : 0000000000000000 a6 : 0000003fcc9c4260 a7 >>>>> : fffffffffffffffe >>>>> [ 91.186566][ T4022] s2 : 0000000000000000 s3 : 0000003fffa8c8c8 s4 >>>>> : 0000003fccab1000 >>>>> [ 91.187500][ T4022] s5 : 0000003fccab1078 s6 : 0000003fffa8c8d0 s7 >>>>> : 0000000000000010 >>>>> [ 91.189672][ T4022] s8 : 0000000000000016 s9 : 0000000000000000 >>>>> s10: 0000003fffa8c8c8 >>>>> [ 91.190637][ T4022] s11: 0000000000000000 t3 : 0000000000000bb0 t4 >>>>> : 0000000000000000 >>>>> [ 91.191568][ T4022] t5 : 0000003fffa8c360 t6 : 0000000000000000 >>>>> [ 91.192389][ T4022] status: 8000000000004020 badaddr: >>>>> 0000000000000bb0 cause: 000000000000000c >>>>> [ 91.201573][ T1] Kernel panic - not syncing: Attempted to kill >>>>> init! 
exitcode=0x0000000b >>>>> [ 91.202906][ T1] CPU: 0 PID: 1 Comm: init Not tainted >>>>> 5.11.0-rc2-00012-g0983834a8393 #19 >>>>> [ 91.204139][ T1] Call Trace: >>>>> [ 91.204849][ T1] [<ffffffe0000095c0>] walk_stackframe+0x0/0x1d0 >>>>> [ 91.206124][ T1] [<ffffffe00458b2d8>] show_stack+0x3a/0x46 >>>>> [ 91.207240][ T1] [<ffffffe0045a5b72>] dump_stack+0x11c/0x180 >>>>> [ 91.208732][ T1] [<ffffffe00458b6a0>] panic+0x20a/0x5cc >>>>> [ 91.209890][ T1] [<ffffffe00002eea4>] do_exit+0x1846/0x1874 >>>>> [ 91.211052][ T1] [<ffffffe00002efdc>] do_group_exit+0xa0/0x192 >>>>> [ 91.212224][ T1] [<ffffffe000047d30>] get_signal+0x2d6/0x13dc >>>>> [ 91.213390][ T1] [<ffffffe000007eb0>] do_notify_resume+0xa8/0x912 >>>>> [ 91.214567][ T1] [<ffffffe00000559c>] ret_from_exception+0x0/0x14 >>>>> >>>>> The image is buildroot on 2020.11.x built with this script: >>>>> https://gist.githubusercontent.com/dvyukov/1a9a01ca2189e35175a021820c95b04d/raw/5c01d755e83f4eab0d56aa7dc84af3b2d5e80423/gistfile1.txt >>>>> >>>>> Readelf for init shows the following (is it that [10000+d7000] address >>>>> is not .text at all?): >>>>> >>>>> $ riscv64-linux-gnu-readelf --sections image/bin/busybox >>>>> There are 27 section headers, starting at offset 0xd7f20: >>>>> >>>>> Section Headers: >>>>> [Nr] Name Type Address Offset >>>>> Size EntSize Flags Link Info Align >>>>> [ 0] NULL 0000000000000000 00000000 >>>>> 0000000000000000 0000000000000000 0 0 0 >>>>> [ 1] .interp PROGBITS 0000000000010238 00000238 >>>>> 0000000000000021 0000000000000000 A 0 0 1 >>>>> [ 2] .note.ABI-tag NOTE 000000000001025c 0000025c >>>>> 0000000000000020 0000000000000000 A 0 0 4 >>>>> [ 3] .hash HASH 0000000000010280 00000280 >>>>> 00000000000009cc 0000000000000004 A 5 0 8 >>>>> [ 4] .gnu.hash GNU_HASH 0000000000010c50 00000c50 >>>>> 0000000000000ac8 0000000000000000 A 5 0 8 >>>>> [ 5] .dynsym DYNSYM 0000000000011718 00001718 >>>>> 00000000000021f0 0000000000000018 A 6 1 8 >>>>> [ 6] .dynstr STRTAB 0000000000013908 00003908 >>>>> 
0000000000000c66 0000000000000000 A 0 0 1 >>>>> [ 7] .gnu.version VERSYM 000000000001456e 0000456e >>>>> 00000000000002d4 0000000000000002 A 5 0 2 >>>>> [ 8] .gnu.version_r VERNEED 0000000000014848 00004848 >>>>> 0000000000000050 0000000000000000 A 6 2 8 >>>>> [ 9] .rela.dyn RELA 0000000000014898 00004898 >>>>> 00000000000000c0 0000000000000018 A 5 0 8 >>>>> [10] .rela.plt RELA 0000000000014958 00004958 >>>>> 00000000000020a0 0000000000000018 AI 5 22 8 >>>>> [11] .plt PROGBITS 0000000000016a00 00006a00 >>>>> 00000000000015e0 0000000000000010 AX 0 0 16 >>>>> [12] .text PROGBITS 0000000000017fe0 00007fe0 >>>>> 00000000000a3668 0000000000000000 AX 0 0 4 >>>>> [13] .rodata PROGBITS 00000000000bb648 000ab648 >>>>> 000000000002b076 0000000000000000 A 0 0 8 >>>>> [14] .sdata2 PROGBITS 00000000000e66c0 000d66c0 >>>>> 0000000000000163 0000000000000000 A 0 0 8 >>>>> [15] .eh_frame_hdr PROGBITS 00000000000e6824 000d6824 >>>>> 0000000000000014 0000000000000000 A 0 0 4 >>>>> [16] .eh_frame PROGBITS 00000000000e6838 000d6838 >>>>> 000000000000002c 0000000000000000 A 0 0 8 >>>>> [17] .preinit_array PREINIT_ARRAY 00000000000e7df8 000d6df8 >>>>> 0000000000000008 0000000000000008 WA 0 0 1 >>>>> [18] .init_array INIT_ARRAY 00000000000e7e00 000d6e00 >>>>> 0000000000000008 0000000000000008 WA 0 0 8 >>>>> [19] .fini_array FINI_ARRAY 00000000000e7e08 000d6e08 >>>>> 0000000000000008 0000000000000008 WA 0 0 8 >>>>> [20] .dynamic DYNAMIC 00000000000e7e10 000d6e10 >>>>> 00000000000001f0 0000000000000010 WA 6 0 8 >>>>> [21] .data PROGBITS 00000000000e8000 000d7000 >>>>> 0000000000000240 0000000000000000 WA 0 0 8 >>>>> [22] .got PROGBITS 00000000000e8240 000d7240 >>>>> 0000000000000af8 0000000000000008 WA 0 0 8 >>>>> [23] .sdata PROGBITS 00000000000e8d38 000d7d38 >>>>> 0000000000000101 0000000000000000 WA 0 0 8 >>>>> [24] .sbss NOBITS 00000000000e8e40 000d7e39 >>>>> 000000000000017f 0000000000000000 WA 0 0 8 >>>>> [25] .bss NOBITS 00000000000e8fc0 000d7e39 >>>>> 00000000000005b0 
0000000000000000 WA 0 0 8
>>>>> [26] .shstrtab STRTAB 0000000000000000 000d7e39
>>>>> 00000000000000e6 0000000000000000 0 0 1
>>>>>
>>>>>
>>>>> Before I spend more time on this, am I doing anything obviously wrong?
>>>>> Is it a known issue? Are there any fresh working recipes?
>>>>
>>>> Humm.. I tried to use 2020.05 which Tobias used here:
>>>> https://github.com/google/syzkaller/blob/master/docs/linux/setup_linux-host_qemu-vm_riscv64-kernel.md#image
>>>> But there is no make qemu_riscv64_virt_defconfig target... though I
>>>> remember I tested these instructions at the time...
>>>
>>> Weird. `make qemu_riscv64_virt_defconfig` works here on buildroot
>>> 2020.05, 2020.11.1 and on latest master.
>>>
>>> Do you see these in your configs/ directory?
>>>
>>> $ ls -l configs/qemu_riscv*
>>> -rw-rw-r-- 1 tklauser tklauser 673 Jan 18 15:51 configs/qemu_riscv32_virt_defconfig
>>> -rw-rw-r-- 1 tklauser tklauser 682 Jan 18 15:51 configs/qemu_riscv64_virt_defconfig
>>
>> Oh, turned out I previously checked out 2011.05 somehow...
>> Yes, 2020.05 has qemu_riscv64_virt_defconfig and I am building it now.
>> 2020.11 has the config, but init crashes (see above).
>
>
> 2020.05 is a bit better, but still failed in several ways.
> First, a number of user-space services including sshd still crashed.
> Second, kernel also crashed a bit later.
> And 2020.11 seems to regress even more.
> It's with the same kernel from the previous email (I did not rebuild it).
>
>
> 2020.05 buildroot:
> [ 90.381218][ T1] devtmpfs: mounted
> [ 90.534531][ T1] Freeing unused kernel memory: 2328K
> [ 90.537085][ T1] Run /sbin/init as init process
> [ 91.754610][ T4022] EXT4-fs (vda): re-mounted. Opts: (null). Quota
> mode: none.
> Starting syslogd: OK > Starting klogd: OK > Running sysctl: OK > Populating /dev using udev: [ 99.413418][ T4051] udevd[4051]: > starting version 3.2.9 > [ 100.480500][ T4052] udevd[4052]: starting eudev-3.2.9 > [ 101.904876][ T4052] udevd[4052]: unhandled signal 11 code 0x1 at > 0x0000000000000bb0 in udevd[10000+35000] > [ 101.911401][ T4052] CPU: 1 PID: 4052 Comm: udevd Not tainted > 5.11.0-rc2-00012-g0983834a8393 #19 > [ 101.913136][ T4052] epc: 0000000000000bb0 ra : 0000003ff5921872 sp > : 0000003fffb0c3a0 > [ 101.914593][ T4052] gp : 000000000004f908 tp : 0000003ff552b720 t0 > : 0000003ff5943160 > [ 101.915740][ T4052] t1 : 0000003ff5921bec t2 : 000000000004f450 s0 > : 0000003fffb0c440 > [ 101.916872][ T4052] s1 : 0000003ff5922000 a0 : 0000003ff5922000 a1 > : 0000003fffb0c460 > [ 101.949318][ T4052] a2 : 0000000000000001 a3 : 0000000000000002 a4 > : 0000000000000002 > [ 101.950529][ T4052] a5 : 000000000000000f a6 : 0000000000000007 a7 > : 0000000000000016 > [ 101.951653][ T4052] s2 : 0000000000000001 s3 : 0000003fffb0c460 s4 > : 0000003ff5922030 > [ 101.952771][ T4052] s5 : 0000003ff5922010 s6 : 0000000000000000 s7 > : 0000000000000000 > [ 101.953878][ T4052] s8 : 0000003ff5922004 s9 : 0000003ff5922010 > s10: 0000003ff5922008 > [ 101.955016][ T4052] s11: 0000003ff5922038 t3 : 0000000000000bb0 t4 > : 0000000000000002 > [ 101.956122][ T4052] t5 : 0000000000000002 t6 : 0000000000003d40 > [ 101.957072][ T4052] status: 8000000000004020 badaddr: > 0000000000000bb0 cause: 000000000000000c > [ 154.349233][ T4055] udevadm[4055]: unhandled signal 11 code 0x1 at > 0x0000000000000bb0 in udevadm[10000+38000] > [ 154.351201][ T4055] CPU: 0 PID: 4055 Comm: udevadm Not tainted > 5.11.0-rc2-00012-g0983834a8393 #19 > [ 154.352227][ T4055] epc: 0000000000000bb0 ra : 0000003ff2cd3872 sp > : 0000003fffe26a50 > [ 154.353136][ T4055] gp : 0000000000052808 tp : 0000003ff28dd720 t0 > : 0000003ff2cf5160 > [ 154.354047][ T4055] t1 : 0000003ff2cd3bec t2 : 0000000000052570 s0 > : 
0000003fffe26af0 > [ 154.354957][ T4055] s1 : 0000003ff2cd4000 a0 : 0000003ff2cd4000 a1 > : 0000003fffe26b10 > [ 154.355860][ T4055] a2 : 000000000003d790 a3 : 0000000000000002 a4 > : 0000000000000002 > [ 154.356739][ T4055] a5 : 000000000000000f a6 : ffffffffffffffff a7 > : 0000000000000000 > [ 154.366998][ T4055] s2 : 0000000000000001 s3 : 0000003fffe26b10 s4 > : 0000003ff2cd4030 > [ 154.372223][ T4055] s5 : 0000003ff2cd4010 s6 : 0000000000000068 s7 > : 000000000003d000 > [ 154.373192][ T4055] s8 : 0000003ff2cd4004 s9 : 0000003ff2cd4010 > s10: 0000003ff2cd4008 > [ 154.374114][ T4055] s11: 0000003ff2cd4038 t3 : 0000000000000bb0 t4 > : 0000000000000002 > [ 154.375023][ T4055] t5 : 0000000000000002 t6 : 0000000000003d40 > [ 154.375793][ T4055] status: 0000000000004020 badaddr: > 0000000000000bb0 cause: 000000000000000c > Segmentation fault > udevadm settle failed > done > Saving random seed: OK > Starting network: [ 160.769276][ T4073] 8021q: adding VLAN 0 to HW > filter on device eth0 > udhcpc: started, v1.31.1 > [ 161.642968][ T4074] udhcpc[4074]: unhandled signal 11 code 0x1 at > 0x0000000000000bb0 in busybox[10000+d6000] > [ 161.645275][ T4074] CPU: 0 PID: 4074 Comm: udhcpc Not tainted > 5.11.0-rc2-00012-g0983834a8393 #19 > [ 161.646515][ T4074] epc: 0000000000000bb0 ra : 0000003fd4d43872 sp > : 0000003fffedf5c0 > [ 161.661669][ T4074] gp : 00000000000e7c90 tp : 0000003fd4d42820 t0 > : 0000003fd4d65160 > [ 161.662875][ T4074] t1 : 0000003fd4d43bec t2 : 00000000000e7960 s0 > : 0000003fffedf660 > [ 161.663979][ T4074] s1 : 0000003fd4d44000 a0 : 0000003fd4d44000 a1 > : 0000003fffedf690 > [ 161.665110][ T4074] a2 : 0000000000000019 a3 : 0000000000000002 a4 > : 0000000000000002 > [ 161.666351][ T4074] a5 : 000000000000000f a6 : fefefefefefefeff a7 > : 0000000000000040 > [ 161.668642][ T4074] s2 : 0000000000000001 s3 : 0000003fffedf690 s4 > : 0000003fd4d44030 > [ 161.669785][ T4074] s5 : 0000003fd4d44010 s6 : 00000000149d82c3 s7 > : 00000000000000fe > [ 161.670921][ 
T4074] s8 : 0000003fd4d44004 s9 : 0000003fd4d44010 > s10: 0000003fd4d44008 > [ 161.672150][ T4074] s11: 0000003fd4d44038 t3 : 0000000000000bb0 t4 > : 0000000000000002 > [ 161.673355][ T4074] t5 : 0000000000000002 t6 : 0000000000003d40 > [ 161.674303][ T4074] status: 8000000000004020 badaddr: > 0000000000000bb0 cause: 000000000000000c > FAIL > Starting dhcpcd... > [ 162.771471][ T4077] dhcpcd[4077]: unhandled signal 11 code 0x1 at > 0x0000000000000bb0 in dhcpcd[10000+39000] > [ 162.773414][ T4077] CPU: 0 PID: 4077 Comm: dhcpcd Not tainted > 5.11.0-rc2-00012-g0983834a8393 #19 > [ 162.774462][ T4077] epc: 0000000000000bb0 ra : 0000003fe6d12872 sp > : 0000003fff8527e0 > [ 162.775366][ T4077] gp : 000000000004adb8 tp : 0000003fe6d11250 t0 > : 0000003fe6d34160 > [ 162.776274][ T4077] t1 : 0000003fe6d12bec t2 : 0000000000049a00 s0 > : 0000003fff852880 > [ 162.777167][ T4077] s1 : 0000003fe6d13000 a0 : 0000003fe6d13000 a1 > : 0000003fff8528a0 > [ 162.779363][ T4077] a2 : 0000000000000004 a3 : 0000000000000002 a4 > : 0000000000000002 > [ 162.780279][ T4077] a5 : 000000000000000f a6 : 7efefefefefefeff a7 > : fffffffffffff000 > [ 162.781194][ T4077] s2 : 0000000000000001 s3 : 0000003fff8528a0 s4 > : 0000003fe6d13030 > [ 162.782106][ T4077] s5 : 0000003fe6d13010 s6 : 0000000000000000 s7 > : 0000000000000000 > [ 162.783015][ T4077] s8 : 0000003fe6d13004 s9 : 0000003fe6d13010 > s10: 0000003fe6d13008 > [ 162.783940][ T4077] s11: 0000003fe6d13038 t3 : 0000000000000bb0 t4 > : 0000000000000002 > [ 162.784853][ T4077] t5 : 0000000000000002 t6 : 0000000000003d40 > [ 162.785618][ T4077] status: 8000000000006020 badaddr: > 0000000000000bb0 cause: 000000000000000c > Segmentation fault > [ 164.074891][ T4079] ssh-keygen[4079]: unhandled signal 11 code 0x1 > at 0x0000000000000bb0 in ssh-keygen[2ac3c68000+63000] > [ 164.076916][ T4079] CPU: 1 PID: 4079 Comm: ssh-keygen Not tainted > 5.11.0-rc2-00012-g0983834a8393 #19 > [ 164.096635][ T4079] epc: 0000000000000bb0 ra : 0000003ff6899872 sp > : 
0000003fffed1330 > [ 164.099233][ T4079] gp : 0000002ac3ccd448 tp : 0000003ff6435cd0 t0 > : 0000003ff6897000 > [ 164.100457][ T4079] t1 : 0000003ff6899bec t2 : 0000003ff6891940 s0 > : 0000003fffed13d0 > [ 164.101578][ T4079] s1 : 0000003ff689a000 a0 : 0000003ff689a000 a1 > : 0000003fffed13f8 > [ 164.102914][ T4079] a2 : 0000000000000000 a3 : 0000000000000001 a4 > : 0000000000000001 > [ 164.104058][ T4079] a5 : 000000000000000f a6 : 0000000000000000 a7 > : 00000000000000ac > [ 164.105150][ T4079] s2 : 0000000000000000 s3 : 0000003fffed13f8 s4 > : 0000003ff689a020 > [ 164.106241][ T4079] s5 : 0000003ff689a000 s6 : 0000003fd1861830 s7 > : ffffffffffffffff > [ 164.113694][ T4079] s8 : 0000003ff689a004 s9 : 0000003ff689a010 > s10: 0000003ff689a008 > [ 164.114869][ T4079] s11: 0000003ff689a028 t3 : 0000000000000bb0 t4 > : 0000000000000002 > [ 164.115972][ T4079] t5 : 0000000000000002 t6 : 0000000000003d40 > [ 164.128360][ T4079] status: 8000000000004020 badaddr: > 0000000000000bb0 cause: 000000000000000c > Segmentation fault > Starting sshd: [ 164.872315][ T4080] sshd[4080]: unhandled signal 11 > code 0x1 at 0x0000000000000bb0 in sshd[2ac7ea7000+a4000] > [ 164.874297][ T4080] CPU: 1 PID: 4080 Comm: sshd Not tainted > 5.11.0-rc2-00012-g0983834a8393 #19 > [ 164.875331][ T4080] epc: 0000000000000bb0 ra : 0000003ff2222872 sp > : 0000003fffbea300 > [ 164.876230][ T4080] gp : 0000002ac7f4f9d0 tp : 0000003ff1dbecd0 t0 > : 0000003ff2220000 > [ 164.877146][ T4080] t1 : 0000003ff2222bec t2 : 0000003ff221a940 s0 > : 0000003fffbea3a0 > [ 164.892174][ T4080] s1 : 0000003ff2223000 a0 : 0000003ff2223000 a1 > : 0000003fffbea3c8 > [ 164.893137][ T4080] a2 : 0000000000000000 a3 : 0000000000000001 a4 > : 0000000000000001 > [ 164.894065][ T4080] a5 : 000000000000000f a6 : 0000000000000000 a7 > : 00000000000000ac > [ 164.895013][ T4080] s2 : 0000000000000000 s3 : 0000003fffbea3c8 s4 > : 0000003ff2223020 > [ 164.895947][ T4080] s5 : 0000003ff2223000 s6 : 0000003fd1861830 s7 > : 
ffffffffffffffff > [ 164.896881][ T4080] s8 : 0000003ff2223004 s9 : 0000003ff2223010 > s10: 0000003ff2223008 > [ 164.905684][ T4080] s11: 0000003ff2223028 t3 : 0000000000000bb0 t4 > : 0000000000000002 > [ 164.906679][ T4080] t5 : 0000000000000002 t6 : 0000000000003d40 > [ 164.908565][ T4080] status: 8000000000004020 badaddr: > 0000000000000bb0 cause: 000000000000000c > Segmentation fault > OK > syzkaller > syzkaller login: [ 167.973016][ T4082] ------------[ cut here ]------------ > [ 167.975887][ T4082] virt_to_phys used for non-linear address: > 0000000059ffc026 (0xffffffd0158d105e) > [ 167.979939][ T4082] WARNING: CPU: 0 PID: 4082 at > arch/riscv/mm/physaddr.c:16 __virt_to_phys+0x74/0x78 > [ 167.988658][ T4082] Modules linked in: > [ 167.989781][ T4082] CPU: 0 PID: 4082 Comm: getty Not tainted > 5.11.0-rc2-00012-g0983834a8393 #19 > [ 167.991063][ T4082] epc: ffffffe000011164 ra : ffffffe000011164 sp > : ffffffe01354fb10 > [ 167.992243][ T4082] gp : ffffffe006234420 tp : ffffffe009c8ad80 t0 > : ffffffe006cafb67 > [ 167.993384][ T4082] t1 : 0000000000000001 t2 : 0000000000000000 s0 > : ffffffe01354fb40 > [ 167.994531][ T4082] s1 : fffffff0158d105e a0 : 000000000000004f a1 > : 00000000000f0000 > [ 167.995690][ T4082] a2 : 0000000000000002 a3 : ffffffe0000d1a30 a4 > : 763e2d90a60ec500 > [ 167.996803][ T4082] a5 : 763e2d90a60ec500 a6 : 0000000000f00000 a7 > : ffffffe00009481c > [ 167.999690][ T4082] s2 : ffffffd0158d105e s3 : 0000001fffffffff s4 > : 0000000000000001 > [ 168.000898][ T4082] s5 : ffffffd0158d105f s6 : ffffffd0158d3260 s7 > : 0000003fff81eac8 > [ 168.002093][ T4082] s8 : ffffffd0158d105e s9 : 0000000000000001 > s10: 0000000000000000 > [ 168.003226][ T4082] s11: 0000000000000000 t3 : 763e2d90a60ec500 t4 > : ffffffc4026a9efd > [ 168.004361][ T4082] t5 : ffffffc4026a9eff t6 : ffffffe01354f7f8 > [ 168.005328][ T4082] status: 0000000000000120 badaddr: > 0000000000000000 cause: 0000000000000003 > [ 168.006756][ T4082] Kernel panic - not syncing: panic_on_warn 
set ...
> [ 168.008056][ T4082] CPU: 0 PID: 4082 Comm: getty Not tainted
> 5.11.0-rc2-00012-g0983834a8393 #19
> [ 168.009301][ T4082] Call Trace:
> [ 168.009969][ T4082] [<ffffffe0000095c0>] walk_stackframe+0x0/0x1d0
> [ 168.011166][ T4082] [<ffffffe00458b2d8>] show_stack+0x3a/0x46
> [ 168.012215][ T4082] [<ffffffe0045a5b72>] dump_stack+0x11c/0x180
> [ 168.013262][ T4082] [<ffffffe00458b6a0>] panic+0x20a/0x5cc
> [ 168.014264][ T4082] [<ffffffe000024210>] __warn+0x110/0x20a
> [ 168.015285][ T4082] [<ffffffe001759424>] report_bug+0x156/0x200
> [ 168.016324][ T4082] [<ffffffe0000093f6>] do_trap_break+0xa6/0x152
> [ 168.017431][ T4082] [<ffffffe00000559c>] ret_from_exception+0x0/0x14
> [ 168.018560][ T4082] [<ffffffe0018c97bc>] n_tty_read+0x908/0x115a
> [ 168.020124][ T4082] SMP: stopping secondary CPUs
> [ 168.022087][ T4082] Rebooting in 86400 seconds..
>

I was fixing KASAN support for my sv48 patchset so I took a look at your issue: I built a kernel on top of the branch riscv/fixes using https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config and Buildroot 2020.11. I have the warnings regarding the use of __virt_to_phys on wrong addresses (but that's normal since this function is used in virt_addr_valid) but not the segfaults you describe.

Hope that helps,

Alex

^ permalink raw reply [flat|nested] 55+ messages in thread
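Editorial aside, not part of the thread: the readelf question in the quoted message ("is it that [10000+d7000] address is not .text at all?") can be checked mechanically. The following is a minimal sketch in Python, assuming the "unhandled signal" line has first been rejoined from its email-wrapped form. The names classify_fault, FAULT_RE and SCAUSE are hypothetical helpers introduced here for illustration and do not come from the kernel or syzkaller. The cause-code table is a small subset of the RISC-V scause exception codes, not a complete list.

```python
import re

# Subset of RISC-V scause exception codes (assumption: only the values
# that appear in the logs quoted in this thread are listed).
SCAUSE = {
    0x3: "breakpoint",
    0xC: "instruction page fault",
    0xD: "load page fault",
    0xF: "store/AMO page fault",
}

# Matches lines such as:
#   init[4022]: unhandled signal 11 code 0x1 at 0x...0bb0 in busybox[10000+d7000]
FAULT_RE = re.compile(
    r"(?P<comm>[\w.-]+)\[(?P<pid>\d+)\]: unhandled signal (?P<sig>\d+) "
    r"code (?P<code>0x[0-9a-f]+) at (?P<addr>0x[0-9a-f]+) "
    r"in (?P<image>[\w.-]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def classify_fault(line):
    """Return fault details, or None if the line is not a fault report."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    addr = int(m["addr"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    return {
        "comm": m["comm"],
        "image": m["image"],
        "addr": addr,
        # True only if the faulting address lies inside [base, base + size).
        "inside_image": base <= addr < base + size,
    }


if __name__ == "__main__":
    sample = (
        "[ 91.164353][ T4022] init[4022]: unhandled signal 11 code 0x1 at "
        "0x0000000000000bb0 in busybox[10000+d7000]"
    )
    print(classify_fault(sample))
    print(SCAUSE[0xC])
```

For the init crash quoted above, the faulting address 0xbb0 lies below the reported image base 0x10000, and the register dump shows epc equal to badaddr with cause 0xc. Under the assumptions above, that reads as an instruction fetch from an unmapped low address rather than a fault inside busybox's own .text.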
* Re: riscv+KASAN does not boot 2021-01-29 7:45 ` Alex Ghiti (?) @ 2021-01-29 8:11 ` Dmitry Vyukov 2021-02-16 11:17 ` Dmitry Vyukov -1 siblings, 1 reply; 55+ messages in thread From: Dmitry Vyukov @ 2021-01-29 8:11 UTC (permalink / raw) To: Alex Ghiti Cc: Albert Ou, Bjorn Topel, Palmer Dabbelt, LKML, nylon7, syzkaller, Andreas Schwab, Paul Walmsley, Tobias Klauser, linux-riscv On Fri, Jan 29, 2021 at 8:45 AM Alex Ghiti <alex@ghiti.fr> wrote: > > Hi Dmitry, > > On 1/18/21 10:43 AM, Dmitry Vyukov wrote: > > On Mon, Jan 18, 2021 at 4:05 PM Dmitry Vyukov <dvyukov@google.com> wrote: > >> > >> On Mon, Jan 18, 2021 at 3:53 PM Tobias Klauser <tklauser@distanz.ch> wrote: > >>>>> On Thu, Jan 14, 2021 at 5:57 AM Palmer Dabbelt <palmerdabbelt@google.com> wrote: > >>>>>> > >>>>>> On Fri, 25 Dec 2020 09:13:23 PST (-0800), dvyukov@google.com wrote: > >>>>>>> On Fri, Dec 25, 2020 at 5:58 PM Andreas Schwab <schwab@linux-m68k.org> wrote: > >>>>>>>> > >>>>>>>> On Dez 25 2020, Dmitry Vyukov wrote: > >>>>>>>> > >>>>>>>>> qemu-system-riscv64 \ > >>>>>>>>> -machine virt -bios default -smp 1 -m 2G \ > >>>>>>>>> -device virtio-blk-device,drive=hd0 \ > >>>>>>>>> -drive file=buildroot-riscv64.ext4,if=none,format=raw,id=hd0 \ > >>>>>>>>> -kernel arch/riscv/boot/Image \ > >>>>>>>>> -nographic \ > >>>>>>>>> -device virtio-rng-device,rng=rng0 -object > >>>>>>>>> rng-random,filename=/dev/urandom,id=rng0 \ > >>>>>>>>> -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device > >>>>>>>>> virtio-net-device,netdev=net0 \ > >>>>>>>>> -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic > >>>>>>>>> panic_on_warn=1 panic=86400" > >>>>>>>> > >>>>>>>> Do you get more output with earlycon=sbi? 
> >>>>>>> > >>>>>>> Hi Andreas, > >>>>>>> > >>>>>>> For defconfig+kvm_guest.config+ scripts/config -e KASAN -e > >>>>>>> KASAN_INLINE it actually gave me more output: > >>>>>>> > >>>>>>> > >>>>>>> OpenSBI v0.7 > >>>>>>> ____ _____ ____ _____ > >>>>>>> / __ \ / ____| _ \_ _| > >>>>>>> | | | |_ __ ___ _ __ | (___ | |_) || | > >>>>>>> | | | | '_ \ / _ \ '_ \ \___ \| _ < | | > >>>>>>> | |__| | |_) | __/ | | |____) | |_) || |_ > >>>>>>> \____/| .__/ \___|_| |_|_____/|____/_____| > >>>>>>> | | > >>>>>>> |_| > >>>>>>> > >>>>>>> Platform Name : QEMU Virt Machine > >>>>>>> Platform HART Features : RV64ACDFIMSU > >>>>>>> Current Hart : 0 > >>>>>>> Firmware Base : 0x80000000 > >>>>>>> Firmware Size : 132 KB > >>>>>>> Runtime SBI Version : 0.2 > >>>>>>> > >>>>>>> MIDELEG : 0x0000000000000222 > >>>>>>> MEDELEG : 0x000000000000b109 > >>>>>>> PMP0 : 0x0000000080000000-0x000000008003ffff (A) > >>>>>>> PMP1 : 0x0000000000000000-0xffffffffffffffff (A,R,W,X) > >>>>>>> [ 0.000000] Linux version 5.10.0-01370-g71c5f03154ac > >>>>>>> (dvyukov@dvyukov-desk.muc.corp.google.com) (riscv64-linux-gnu-gcc > >>>>>>> (Debian 10.2.0-9) 10.2.0, GNU ld (GNU Binutils for Debian) 2.35.1) #17 > >>>>>>> SMP Fri Dec 25 18:10:12 CET 2020 > >>>>>>> [ 0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000 > >>>>>>> [ 0.000000] earlycon: sbi0 at I/O port 0x0 (options '') > >>>>>>> [ 0.000000] printk: bootconsole [sbi0] enabled > >>>>>>> [ 0.000000] efi: UEFI not found. 
> >>>>>>> [ 0.000000] Zone ranges: > >>>>>>> [ 0.000000] DMA32 [mem 0x0000000080200000-0x00000000ffffffff] > >>>>>>> [ 0.000000] Normal empty > >>>>>>> [ 0.000000] Movable zone start for each node > >>>>>>> [ 0.000000] Early memory node ranges > >>>>>>> [ 0.000000] node 0: [mem 0x0000000080200000-0x00000000ffffffff] > >>>>>>> [ 0.000000] Initmem setup node 0 [mem 0x0000000080200000-0x00000000ffffffff] > >>>>>>> [ 0.000000] SBI specification v0.2 detected > >>>>>>> [ 0.000000] SBI implementation ID=0x1 Version=0x7 > >>>>>>> [ 0.000000] SBI v0.2 TIME extension detected > >>>>>>> [ 0.000000] SBI v0.2 IPI extension detected > >>>>>>> [ 0.000000] SBI v0.2 RFENCE extension detected > >>>>>>> [ 0.000000] software IO TLB: mapped [mem > >>>>>>> 0x00000000fa3f9000-0x00000000fe3f9000] (64MB) > >>>>>>> [ 0.000000] Unable to handle kernel paging request at virtual > >>>>>>> address dfffffc810040000 > >>>>>>> [ 0.000000] Oops [#1] > >>>>>>> [ 0.000000] Modules linked in: > >>>>>>> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted > >>>>>>> 5.10.0-01370-g71c5f03154ac #17 > >>>>>>> [ 0.000000] epc: ffffffe00042e3e4 ra : ffffffe000c0462c sp : ffffffe001603ea0 > >>>>>>> [ 0.000000] gp : ffffffe0016e3c60 tp : ffffffe00160cd40 t0 : > >>>>>>> dfffffc810040000 > >>>>>>> [ 0.000000] t1 : ffffffe000e0a838 t2 : 0000000000000000 s0 : > >>>>>>> ffffffe001603f50 > >>>>>>> [ 0.000000] s1 : ffffffe0016e50a8 a0 : dfffffc810040000 a1 : > >>>>>>> 0000000000000000 > >>>>>>> [ 0.000000] a2 : 000000000ffc0000 a3 : dfffffc820000000 a4 : > >>>>>>> 0000000000000000 > >>>>>>> [ 0.000000] a5 : 000000003e8c6001 a6 : ffffffe000e0a820 a7 : > >>>>>>> 0000000000000900 > >>>>>>> [ 0.000000] s2 : dfffffc820000000 s3 : dfffffc800000000 s4 : > >>>>>>> 0000000000000001 > >>>>>>> [ 0.000000] s5 : ffffffe0016e5108 s6 : fffffffffffff000 s7 : > >>>>>>> dfffffc810040000 > >>>>>>> [ 0.000000] s8 : 0000000000000080 s9 : ffffffffffffffff s10: > >>>>>>> ffffffe07a119000 > >>>>>>> [ 0.000000] s11: 000000000000ffc0 t3 : 
ffffffe0016eb908 t4 :
> >>>>>>> 0000000000000001
> >>>>>>> [ 0.000000] t5 : ffffffc4001c150a t6 : ffffffe001603be8
> >>>>>>> [ 0.000000] status: 0000000000000100 badaddr: dfffffc810040000
> >>>>>>> cause: 000000000000000f
> >>>>>>> [ 0.000000] random: get_random_bytes called from
> >>>>>>> oops_exit+0x30/0x58 with crng_init=0
> >>>>>>> [ 0.000000] ---[ end trace 0000000000000000 ]---
> >>>>>>> [ 0.000000] Kernel panic - not syncing: Fatal exception
> >>>>>>> [ 0.000000] ---[ end Kernel panic - not syncing: Fatal exception ]---
> >>>>>>>
> >>>>>>>
> >>>>>>> But I first tried with the kernel image I had in the dir, I think it
> >>>>>>> was this config (no KASAN):
> >>>>>>> https://gist.githubusercontent.com/dvyukov/b2b62beccf80493781ab03b41430e616/raw/62e673cff08a8a41656d2871b8a37f74b00f509f/gistfile1.txt
> >>>>>>>
> >>>>>>> and earlycon=sbi did not change anything (no output after OpenSBI).
> >>>>>>> So potentially there are 2 different problems.
> >>>>>>
> >>>>>> Thanks for reporting this. Looks like I'd forgotten to add a kasan config to
> >>>>>> my tests. There's one in there now, and it's passing as of the fix that Nylon
> >>>>>> posted.
> >>>>>
> >>>>> I can boot the KASAN kernel now on riscv/fixes.
> >>>>> > >>>>> Next problem: I've got only to: > >>>>> > >>>>> [ 90.498967][ T1] Run /sbin/init as init process > >>>>> [ 91.164353][ T4022] init[4022]: unhandled signal 11 code 0x1 at > >>>>> 0x0000000000000bb0 in busybox[10000+d7000] > >>>>> [ 91.179640][ T4022] CPU: 1 PID: 4022 Comm: init Not tainted > >>>>> 5.11.0-rc2-00012-g0983834a8393 #19 > >>>>> [ 91.180853][ T4022] epc: 0000000000000bb0 ra : 0000003fccab09d0 sp > >>>>> : 0000003fffa8c7b0 > >>>>> [ 91.181861][ T4022] gp : 00000000000e8d70 tp : 0000003fccaaf820 t0 > >>>>> : 000000000000001e > >>>>> [ 91.182810][ T4022] t1 : 0000003fccab0bfc t2 : 000000000000000a s0 > >>>>> : 0000003fffa8c850 > >>>>> [ 91.183749][ T4022] s1 : 0000003fccab1070 a0 : 0000003fccab1070 a1 > >>>>> : 0000003fffa8c8c8 > >>>>> [ 91.184689][ T4022] a2 : 0000000000000001 a3 : 0000000000000020 a4 > >>>>> : 0000000000000000 > >>>>> [ 91.185620][ T4022] a5 : 0000000000000000 a6 : 0000003fcc9c4260 a7 > >>>>> : fffffffffffffffe > >>>>> [ 91.186566][ T4022] s2 : 0000000000000000 s3 : 0000003fffa8c8c8 s4 > >>>>> : 0000003fccab1000 > >>>>> [ 91.187500][ T4022] s5 : 0000003fccab1078 s6 : 0000003fffa8c8d0 s7 > >>>>> : 0000000000000010 > >>>>> [ 91.189672][ T4022] s8 : 0000000000000016 s9 : 0000000000000000 > >>>>> s10: 0000003fffa8c8c8 > >>>>> [ 91.190637][ T4022] s11: 0000000000000000 t3 : 0000000000000bb0 t4 > >>>>> : 0000000000000000 > >>>>> [ 91.191568][ T4022] t5 : 0000003fffa8c360 t6 : 0000000000000000 > >>>>> [ 91.192389][ T4022] status: 8000000000004020 badaddr: > >>>>> 0000000000000bb0 cause: 000000000000000c > >>>>> [ 91.201573][ T1] Kernel panic - not syncing: Attempted to kill > >>>>> init! 
exitcode=0x0000000b > >>>>> [ 91.202906][ T1] CPU: 0 PID: 1 Comm: init Not tainted > >>>>> 5.11.0-rc2-00012-g0983834a8393 #19 > >>>>> [ 91.204139][ T1] Call Trace: > >>>>> [ 91.204849][ T1] [<ffffffe0000095c0>] walk_stackframe+0x0/0x1d0 > >>>>> [ 91.206124][ T1] [<ffffffe00458b2d8>] show_stack+0x3a/0x46 > >>>>> [ 91.207240][ T1] [<ffffffe0045a5b72>] dump_stack+0x11c/0x180 > >>>>> [ 91.208732][ T1] [<ffffffe00458b6a0>] panic+0x20a/0x5cc > >>>>> [ 91.209890][ T1] [<ffffffe00002eea4>] do_exit+0x1846/0x1874 > >>>>> [ 91.211052][ T1] [<ffffffe00002efdc>] do_group_exit+0xa0/0x192 > >>>>> [ 91.212224][ T1] [<ffffffe000047d30>] get_signal+0x2d6/0x13dc > >>>>> [ 91.213390][ T1] [<ffffffe000007eb0>] do_notify_resume+0xa8/0x912 > >>>>> [ 91.214567][ T1] [<ffffffe00000559c>] ret_from_exception+0x0/0x14 > >>>>> > >>>>> The image is buildroot on 2020.11.x built with this script: > >>>>> https://gist.githubusercontent.com/dvyukov/1a9a01ca2189e35175a021820c95b04d/raw/5c01d755e83f4eab0d56aa7dc84af3b2d5e80423/gistfile1.txt > >>>>> > >>>>> Readelf for init shows the following (is it that [10000+d7000] address > >>>>> is not .text at all?): > >>>>> > >>>>> $ riscv64-linux-gnu-readelf --sections image/bin/busybox > >>>>> There are 27 section headers, starting at offset 0xd7f20: > >>>>> > >>>>> Section Headers: > >>>>> [Nr] Name Type Address Offset > >>>>> Size EntSize Flags Link Info Align > >>>>> [ 0] NULL 0000000000000000 00000000 > >>>>> 0000000000000000 0000000000000000 0 0 0 > >>>>> [ 1] .interp PROGBITS 0000000000010238 00000238 > >>>>> 0000000000000021 0000000000000000 A 0 0 1 > >>>>> [ 2] .note.ABI-tag NOTE 000000000001025c 0000025c > >>>>> 0000000000000020 0000000000000000 A 0 0 4 > >>>>> [ 3] .hash HASH 0000000000010280 00000280 > >>>>> 00000000000009cc 0000000000000004 A 5 0 8 > >>>>> [ 4] .gnu.hash GNU_HASH 0000000000010c50 00000c50 > >>>>> 0000000000000ac8 0000000000000000 A 5 0 8 > >>>>> [ 5] .dynsym DYNSYM 0000000000011718 00001718 > >>>>> 00000000000021f0 
0000000000000018 A 6 1 8 > >>>>> [ 6] .dynstr STRTAB 0000000000013908 00003908 > >>>>> 0000000000000c66 0000000000000000 A 0 0 1 > >>>>> [ 7] .gnu.version VERSYM 000000000001456e 0000456e > >>>>> 00000000000002d4 0000000000000002 A 5 0 2 > >>>>> [ 8] .gnu.version_r VERNEED 0000000000014848 00004848 > >>>>> 0000000000000050 0000000000000000 A 6 2 8 > >>>>> [ 9] .rela.dyn RELA 0000000000014898 00004898 > >>>>> 00000000000000c0 0000000000000018 A 5 0 8 > >>>>> [10] .rela.plt RELA 0000000000014958 00004958 > >>>>> 00000000000020a0 0000000000000018 AI 5 22 8 > >>>>> [11] .plt PROGBITS 0000000000016a00 00006a00 > >>>>> 00000000000015e0 0000000000000010 AX 0 0 16 > >>>>> [12] .text PROGBITS 0000000000017fe0 00007fe0 > >>>>> 00000000000a3668 0000000000000000 AX 0 0 4 > >>>>> [13] .rodata PROGBITS 00000000000bb648 000ab648 > >>>>> 000000000002b076 0000000000000000 A 0 0 8 > >>>>> [14] .sdata2 PROGBITS 00000000000e66c0 000d66c0 > >>>>> 0000000000000163 0000000000000000 A 0 0 8 > >>>>> [15] .eh_frame_hdr PROGBITS 00000000000e6824 000d6824 > >>>>> 0000000000000014 0000000000000000 A 0 0 4 > >>>>> [16] .eh_frame PROGBITS 00000000000e6838 000d6838 > >>>>> 000000000000002c 0000000000000000 A 0 0 8 > >>>>> [17] .preinit_array PREINIT_ARRAY 00000000000e7df8 000d6df8 > >>>>> 0000000000000008 0000000000000008 WA 0 0 1 > >>>>> [18] .init_array INIT_ARRAY 00000000000e7e00 000d6e00 > >>>>> 0000000000000008 0000000000000008 WA 0 0 8 > >>>>> [19] .fini_array FINI_ARRAY 00000000000e7e08 000d6e08 > >>>>> 0000000000000008 0000000000000008 WA 0 0 8 > >>>>> [20] .dynamic DYNAMIC 00000000000e7e10 000d6e10 > >>>>> 00000000000001f0 0000000000000010 WA 6 0 8 > >>>>> [21] .data PROGBITS 00000000000e8000 000d7000 > >>>>> 0000000000000240 0000000000000000 WA 0 0 8 > >>>>> [22] .got PROGBITS 00000000000e8240 000d7240 > >>>>> 0000000000000af8 0000000000000008 WA 0 0 8 > >>>>> [23] .sdata PROGBITS 00000000000e8d38 000d7d38 > >>>>> 0000000000000101 0000000000000000 WA 0 0 8 > >>>>> [24] .sbss NOBITS 
00000000000e8e40 000d7e39
> >>>>> 000000000000017f 0000000000000000 WA 0 0 8
> >>>>> [25] .bss NOBITS 00000000000e8fc0 000d7e39
> >>>>> 00000000000005b0 0000000000000000 WA 0 0 8
> >>>>> [26] .shstrtab STRTAB 0000000000000000 000d7e39
> >>>>> 00000000000000e6 0000000000000000 0 0 1
> >>>>>
> >>>>>
> >>>>> Before I spend more time on this, am I doing anything obviously wrong?
> >>>>> Is it a known issue? Are there any fresh working recipes?
> >>>>
> >>>> Humm.. I tried to use 2020.05 which Tobias used here:
> >>>> https://github.com/google/syzkaller/blob/master/docs/linux/setup_linux-host_qemu-vm_riscv64-kernel.md#image
> >>>> But there is no make qemu_riscv64_virt_defconfig target... though I
> >>>> remember I tested these instructions at the time...
> >>>
> >>> Weird. `make qemu_riscv64_virt_defconfig` works here on buildroot
> >>> 2020.05, 2020.11.1 and on latest master.
> >>>
> >>> Do you see these in your configs/ directory?
> >>>
> >>> $ ls -l configs/qemu_riscv*
> >>> -rw-rw-r-- 1 tklauser tklauser 673 Jan 18 15:51 configs/qemu_riscv32_virt_defconfig
> >>> -rw-rw-r-- 1 tklauser tklauser 682 Jan 18 15:51 configs/qemu_riscv64_virt_defconfig
> >>
> >> Oh, turned out I previously checked out 2011.05 somehow...
> >> Yes, 2020.05 has qemu_riscv64_virt_defconfig and I am building it now.
> >> 2020.11 has the config, but init crashes (see above).
> >
> >
> > 2020.05 is a bit better, but still failed in several ways.
> > First, a number of user-space services including sshd still crashed.
> > Second, kernel also crashed a bit later.
> > And 2020.11 seems to regress even more.
> > It's with the same kernel from the previous email (I did not rebuild it).
> >
> >
> > 2020.05 buildroot:
> > [ 90.381218][ T1] devtmpfs: mounted
> > [ 90.534531][ T1] Freeing unused kernel memory: 2328K
> > [ 90.537085][ T1] Run /sbin/init as init process
> > [ 91.754610][ T4022] EXT4-fs (vda): re-mounted. Opts: (null). Quota
> > mode: none.
> > Starting syslogd: OK > > Starting klogd: OK > > Running sysctl: OK > > Populating /dev using udev: [ 99.413418][ T4051] udevd[4051]: > > starting version 3.2.9 > > [ 100.480500][ T4052] udevd[4052]: starting eudev-3.2.9 > > [ 101.904876][ T4052] udevd[4052]: unhandled signal 11 code 0x1 at > > 0x0000000000000bb0 in udevd[10000+35000] > > [ 101.911401][ T4052] CPU: 1 PID: 4052 Comm: udevd Not tainted > > 5.11.0-rc2-00012-g0983834a8393 #19 > > [ 101.913136][ T4052] epc: 0000000000000bb0 ra : 0000003ff5921872 sp > > : 0000003fffb0c3a0 > > [ 101.914593][ T4052] gp : 000000000004f908 tp : 0000003ff552b720 t0 > > : 0000003ff5943160 > > [ 101.915740][ T4052] t1 : 0000003ff5921bec t2 : 000000000004f450 s0 > > : 0000003fffb0c440 > > [ 101.916872][ T4052] s1 : 0000003ff5922000 a0 : 0000003ff5922000 a1 > > : 0000003fffb0c460 > > [ 101.949318][ T4052] a2 : 0000000000000001 a3 : 0000000000000002 a4 > > : 0000000000000002 > > [ 101.950529][ T4052] a5 : 000000000000000f a6 : 0000000000000007 a7 > > : 0000000000000016 > > [ 101.951653][ T4052] s2 : 0000000000000001 s3 : 0000003fffb0c460 s4 > > : 0000003ff5922030 > > [ 101.952771][ T4052] s5 : 0000003ff5922010 s6 : 0000000000000000 s7 > > : 0000000000000000 > > [ 101.953878][ T4052] s8 : 0000003ff5922004 s9 : 0000003ff5922010 > > s10: 0000003ff5922008 > > [ 101.955016][ T4052] s11: 0000003ff5922038 t3 : 0000000000000bb0 t4 > > : 0000000000000002 > > [ 101.956122][ T4052] t5 : 0000000000000002 t6 : 0000000000003d40 > > [ 101.957072][ T4052] status: 8000000000004020 badaddr: > > 0000000000000bb0 cause: 000000000000000c > > [ 154.349233][ T4055] udevadm[4055]: unhandled signal 11 code 0x1 at > > 0x0000000000000bb0 in udevadm[10000+38000] > > [ 154.351201][ T4055] CPU: 0 PID: 4055 Comm: udevadm Not tainted > > 5.11.0-rc2-00012-g0983834a8393 #19 > > [ 154.352227][ T4055] epc: 0000000000000bb0 ra : 0000003ff2cd3872 sp > > : 0000003fffe26a50 > > [ 154.353136][ T4055] gp : 0000000000052808 tp : 0000003ff28dd720 t0 > > : 
0000003ff2cf5160 > > [ 154.354047][ T4055] t1 : 0000003ff2cd3bec t2 : 0000000000052570 s0 > > : 0000003fffe26af0 > > [ 154.354957][ T4055] s1 : 0000003ff2cd4000 a0 : 0000003ff2cd4000 a1 > > : 0000003fffe26b10 > > [ 154.355860][ T4055] a2 : 000000000003d790 a3 : 0000000000000002 a4 > > : 0000000000000002 > > [ 154.356739][ T4055] a5 : 000000000000000f a6 : ffffffffffffffff a7 > > : 0000000000000000 > > [ 154.366998][ T4055] s2 : 0000000000000001 s3 : 0000003fffe26b10 s4 > > : 0000003ff2cd4030 > > [ 154.372223][ T4055] s5 : 0000003ff2cd4010 s6 : 0000000000000068 s7 > > : 000000000003d000 > > [ 154.373192][ T4055] s8 : 0000003ff2cd4004 s9 : 0000003ff2cd4010 > > s10: 0000003ff2cd4008 > > [ 154.374114][ T4055] s11: 0000003ff2cd4038 t3 : 0000000000000bb0 t4 > > : 0000000000000002 > > [ 154.375023][ T4055] t5 : 0000000000000002 t6 : 0000000000003d40 > > [ 154.375793][ T4055] status: 0000000000004020 badaddr: > > 0000000000000bb0 cause: 000000000000000c > > Segmentation fault > > udevadm settle failed > > done > > Saving random seed: OK > > Starting network: [ 160.769276][ T4073] 8021q: adding VLAN 0 to HW > > filter on device eth0 > > udhcpc: started, v1.31.1 > > [ 161.642968][ T4074] udhcpc[4074]: unhandled signal 11 code 0x1 at > > 0x0000000000000bb0 in busybox[10000+d6000] > > [ 161.645275][ T4074] CPU: 0 PID: 4074 Comm: udhcpc Not tainted > > 5.11.0-rc2-00012-g0983834a8393 #19 > > [ 161.646515][ T4074] epc: 0000000000000bb0 ra : 0000003fd4d43872 sp > > : 0000003fffedf5c0 > > [ 161.661669][ T4074] gp : 00000000000e7c90 tp : 0000003fd4d42820 t0 > > : 0000003fd4d65160 > > [ 161.662875][ T4074] t1 : 0000003fd4d43bec t2 : 00000000000e7960 s0 > > : 0000003fffedf660 > > [ 161.663979][ T4074] s1 : 0000003fd4d44000 a0 : 0000003fd4d44000 a1 > > : 0000003fffedf690 > > [ 161.665110][ T4074] a2 : 0000000000000019 a3 : 0000000000000002 a4 > > : 0000000000000002 > > [ 161.666351][ T4074] a5 : 000000000000000f a6 : fefefefefefefeff a7 > > : 0000000000000040 > > [ 161.668642][ T4074] 
s2 : 0000000000000001 s3 : 0000003fffedf690 s4 > > : 0000003fd4d44030 > > [ 161.669785][ T4074] s5 : 0000003fd4d44010 s6 : 00000000149d82c3 s7 > > : 00000000000000fe > > [ 161.670921][ T4074] s8 : 0000003fd4d44004 s9 : 0000003fd4d44010 > > s10: 0000003fd4d44008 > > [ 161.672150][ T4074] s11: 0000003fd4d44038 t3 : 0000000000000bb0 t4 > > : 0000000000000002 > > [ 161.673355][ T4074] t5 : 0000000000000002 t6 : 0000000000003d40 > > [ 161.674303][ T4074] status: 8000000000004020 badaddr: > > 0000000000000bb0 cause: 000000000000000c > > FAIL > > Starting dhcpcd... > > [ 162.771471][ T4077] dhcpcd[4077]: unhandled signal 11 code 0x1 at > > 0x0000000000000bb0 in dhcpcd[10000+39000] > > [ 162.773414][ T4077] CPU: 0 PID: 4077 Comm: dhcpcd Not tainted > > 5.11.0-rc2-00012-g0983834a8393 #19 > > [ 162.774462][ T4077] epc: 0000000000000bb0 ra : 0000003fe6d12872 sp > > : 0000003fff8527e0 > > [ 162.775366][ T4077] gp : 000000000004adb8 tp : 0000003fe6d11250 t0 > > : 0000003fe6d34160 > > [ 162.776274][ T4077] t1 : 0000003fe6d12bec t2 : 0000000000049a00 s0 > > : 0000003fff852880 > > [ 162.777167][ T4077] s1 : 0000003fe6d13000 a0 : 0000003fe6d13000 a1 > > : 0000003fff8528a0 > > [ 162.779363][ T4077] a2 : 0000000000000004 a3 : 0000000000000002 a4 > > : 0000000000000002 > > [ 162.780279][ T4077] a5 : 000000000000000f a6 : 7efefefefefefeff a7 > > : fffffffffffff000 > > [ 162.781194][ T4077] s2 : 0000000000000001 s3 : 0000003fff8528a0 s4 > > : 0000003fe6d13030 > > [ 162.782106][ T4077] s5 : 0000003fe6d13010 s6 : 0000000000000000 s7 > > : 0000000000000000 > > [ 162.783015][ T4077] s8 : 0000003fe6d13004 s9 : 0000003fe6d13010 > > s10: 0000003fe6d13008 > > [ 162.783940][ T4077] s11: 0000003fe6d13038 t3 : 0000000000000bb0 t4 > > : 0000000000000002 > > [ 162.784853][ T4077] t5 : 0000000000000002 t6 : 0000000000003d40 > > [ 162.785618][ T4077] status: 8000000000006020 badaddr: > > 0000000000000bb0 cause: 000000000000000c > > Segmentation fault > > [ 164.074891][ T4079] ssh-keygen[4079]: 
unhandled signal 11 code 0x1 > > at 0x0000000000000bb0 in ssh-keygen[2ac3c68000+63000] > > [ 164.076916][ T4079] CPU: 1 PID: 4079 Comm: ssh-keygen Not tainted > > 5.11.0-rc2-00012-g0983834a8393 #19 > > [ 164.096635][ T4079] epc: 0000000000000bb0 ra : 0000003ff6899872 sp > > : 0000003fffed1330 > > [ 164.099233][ T4079] gp : 0000002ac3ccd448 tp : 0000003ff6435cd0 t0 > > : 0000003ff6897000 > > [ 164.100457][ T4079] t1 : 0000003ff6899bec t2 : 0000003ff6891940 s0 > > : 0000003fffed13d0 > > [ 164.101578][ T4079] s1 : 0000003ff689a000 a0 : 0000003ff689a000 a1 > > : 0000003fffed13f8 > > [ 164.102914][ T4079] a2 : 0000000000000000 a3 : 0000000000000001 a4 > > : 0000000000000001 > > [ 164.104058][ T4079] a5 : 000000000000000f a6 : 0000000000000000 a7 > > : 00000000000000ac > > [ 164.105150][ T4079] s2 : 0000000000000000 s3 : 0000003fffed13f8 s4 > > : 0000003ff689a020 > > [ 164.106241][ T4079] s5 : 0000003ff689a000 s6 : 0000003fd1861830 s7 > > : ffffffffffffffff > > [ 164.113694][ T4079] s8 : 0000003ff689a004 s9 : 0000003ff689a010 > > s10: 0000003ff689a008 > > [ 164.114869][ T4079] s11: 0000003ff689a028 t3 : 0000000000000bb0 t4 > > : 0000000000000002 > > [ 164.115972][ T4079] t5 : 0000000000000002 t6 : 0000000000003d40 > > [ 164.128360][ T4079] status: 8000000000004020 badaddr: > > 0000000000000bb0 cause: 000000000000000c > > Segmentation fault > > Starting sshd: [ 164.872315][ T4080] sshd[4080]: unhandled signal 11 > > code 0x1 at 0x0000000000000bb0 in sshd[2ac7ea7000+a4000] > > [ 164.874297][ T4080] CPU: 1 PID: 4080 Comm: sshd Not tainted > > 5.11.0-rc2-00012-g0983834a8393 #19 > > [ 164.875331][ T4080] epc: 0000000000000bb0 ra : 0000003ff2222872 sp > > : 0000003fffbea300 > > [ 164.876230][ T4080] gp : 0000002ac7f4f9d0 tp : 0000003ff1dbecd0 t0 > > : 0000003ff2220000 > > [ 164.877146][ T4080] t1 : 0000003ff2222bec t2 : 0000003ff221a940 s0 > > : 0000003fffbea3a0 > > [ 164.892174][ T4080] s1 : 0000003ff2223000 a0 : 0000003ff2223000 a1 > > : 0000003fffbea3c8 > > [ 164.893137][ 
T4080] a2 : 0000000000000000 a3 : 0000000000000001 a4 > > : 0000000000000001 > > [ 164.894065][ T4080] a5 : 000000000000000f a6 : 0000000000000000 a7 > > : 00000000000000ac > > [ 164.895013][ T4080] s2 : 0000000000000000 s3 : 0000003fffbea3c8 s4 > > : 0000003ff2223020 > > [ 164.895947][ T4080] s5 : 0000003ff2223000 s6 : 0000003fd1861830 s7 > > : ffffffffffffffff > > [ 164.896881][ T4080] s8 : 0000003ff2223004 s9 : 0000003ff2223010 > > s10: 0000003ff2223008 > > [ 164.905684][ T4080] s11: 0000003ff2223028 t3 : 0000000000000bb0 t4 > > : 0000000000000002 > > [ 164.906679][ T4080] t5 : 0000000000000002 t6 : 0000000000003d40 > > [ 164.908565][ T4080] status: 8000000000004020 badaddr: > > 0000000000000bb0 cause: 000000000000000c > > Segmentation fault > > OK > > syzkaller > > syzkaller login: [ 167.973016][ T4082] ------------[ cut here ]------------ > > [ 167.975887][ T4082] virt_to_phys used for non-linear address: > > 0000000059ffc026 (0xffffffd0158d105e) > > [ 167.979939][ T4082] WARNING: CPU: 0 PID: 4082 at > > arch/riscv/mm/physaddr.c:16 __virt_to_phys+0x74/0x78 > > [ 167.988658][ T4082] Modules linked in: > > [ 167.989781][ T4082] CPU: 0 PID: 4082 Comm: getty Not tainted > > 5.11.0-rc2-00012-g0983834a8393 #19 > > [ 167.991063][ T4082] epc: ffffffe000011164 ra : ffffffe000011164 sp > > : ffffffe01354fb10 > > [ 167.992243][ T4082] gp : ffffffe006234420 tp : ffffffe009c8ad80 t0 > > : ffffffe006cafb67 > > [ 167.993384][ T4082] t1 : 0000000000000001 t2 : 0000000000000000 s0 > > : ffffffe01354fb40 > > [ 167.994531][ T4082] s1 : fffffff0158d105e a0 : 000000000000004f a1 > > : 00000000000f0000 > > [ 167.995690][ T4082] a2 : 0000000000000002 a3 : ffffffe0000d1a30 a4 > > : 763e2d90a60ec500 > > [ 167.996803][ T4082] a5 : 763e2d90a60ec500 a6 : 0000000000f00000 a7 > > : ffffffe00009481c > > [ 167.999690][ T4082] s2 : ffffffd0158d105e s3 : 0000001fffffffff s4 > > : 0000000000000001 > > [ 168.000898][ T4082] s5 : ffffffd0158d105f s6 : ffffffd0158d3260 s7 > > : 0000003fff81eac8 > 
> [ 168.002093][ T4082] s8 : ffffffd0158d105e s9 : 0000000000000001 > > s10: 0000000000000000 > > [ 168.003226][ T4082] s11: 0000000000000000 t3 : 763e2d90a60ec500 t4 > > : ffffffc4026a9efd > > [ 168.004361][ T4082] t5 : ffffffc4026a9eff t6 : ffffffe01354f7f8 > > [ 168.005328][ T4082] status: 0000000000000120 badaddr: > > 0000000000000000 cause: 0000000000000003 > > [ 168.006756][ T4082] Kernel panic - not syncing: panic_on_warn set ... > > [ 168.008056][ T4082] CPU: 0 PID: 4082 Comm: getty Not tainted > > 5.11.0-rc2-00012-g0983834a8393 #19 > > [ 168.009301][ T4082] Call Trace: > > [ 168.009969][ T4082] [<ffffffe0000095c0>] walk_stackframe+0x0/0x1d0 > > [ 168.011166][ T4082] [<ffffffe00458b2d8>] show_stack+0x3a/0x46 > > [ 168.012215][ T4082] [<ffffffe0045a5b72>] dump_stack+0x11c/0x180 > > [ 168.013262][ T4082] [<ffffffe00458b6a0>] panic+0x20a/0x5cc > > [ 168.014264][ T4082] [<ffffffe000024210>] __warn+0x110/0x20a > > [ 168.015285][ T4082] [<ffffffe001759424>] report_bug+0x156/0x200 > > [ 168.016324][ T4082] [<ffffffe0000093f6>] do_trap_break+0xa6/0x152 > > [ 168.017431][ T4082] [<ffffffe00000559c>] ret_from_exception+0x0/0x14 > > [ 168.018560][ T4082] [<ffffffe0018c97bc>] n_tty_read+0x908/0x115a > > [ 168.020124][ T4082] SMP: stopping secondary CPUs > > [ 168.022087][ T4082] Rebooting in 86400 seconds.. > > > > I was fixing KASAN support for my sv48 patchset so I took a look at your > issue: I built a kernel on top of the branch riscv/fixes using > https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config > and Buildroot 2020.11. I have the warnings regarding the use of > __virt_to_phys on wrong addresses (but that's normal since this function > is used in virt_addr_valid) but not the segfaults you describe. Hi Alex, Let me try to rebuild buildroot image. Maybe there was something wrong with my build, though, I did 'make clean' before doing. But at the same time it worked back in June... 
Re WARNINGs, they indicate kernel bugs. I am working on setting up a
syzbot instance on riscv. If there is a WARNING during boot, the
kernel will be marked as broken and no further testing will happen.
Is it a misuse of WARN_ON? If so, could anybody please remove it or
replace it with pr_err?

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: riscv+KASAN does not boot 2021-01-29 8:11 ` Dmitry Vyukov @ 2021-02-16 11:17 ` Dmitry Vyukov 0 siblings, 0 replies; 55+ messages in thread
From: Dmitry Vyukov @ 2021-02-16 11:17 UTC (permalink / raw)
To: Alex Ghiti
Cc: Tobias Klauser, Albert Ou, Bjorn Topel, Palmer Dabbelt, LKML, nylon7, syzkaller, Andreas Schwab, Paul Walmsley, linux-riscv

On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> > I was fixing KASAN support for my sv48 patchset so I took a look at your
> > issue: I built a kernel on top of the branch riscv/fixes using
> > https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config
> > and Buildroot 2020.11. I have the warnings regarding the use of
> > __virt_to_phys on wrong addresses (but that's normal since this function
> > is used in virt_addr_valid) but not the segfaults you describe.
>
> Hi Alex,
>
> Let me try to rebuild buildroot image. Maybe there was something wrong
> with my build, though, I did 'make clean' before doing. But at the
> same time it worked back in June...
>
> Re WARNINGs, they indicate kernel bugs. I am working on setting up a
> syzbot instance on riscv. If there a WARNING during boot then the
> kernel will be marked as broken. No further testing will happen.
> Is it a mis-use of WARN_ON? If so, could anybody please remove it or
> replace it with pr_err.

Hi,

I've localized one issue with riscv/KASAN: KASAN breaks the VDSO, which
I think is the root cause of the weird faults I saw earlier. The
following patch fixes it.
Could somebody please upstream this fix? I don't know how to add/run
tests for this.

Thanks

diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
index 0cfd6da784f84..cf3a383c1799d 100644
--- a/arch/riscv/kernel/vdso/Makefile
+++ b/arch/riscv/kernel/vdso/Makefile
@@ -35,6 +35,7 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
 # Disable gcov profiling for VDSO code
 GCOV_PROFILE := n
 KCOV_INSTRUMENT := n
+KASAN_SANITIZE := n

 # Force dependency
 $(obj)/vdso.o: $(obj)/vdso.so

^ permalink raw reply related [flat|nested] 55+ messages in thread
* Re: riscv+KASAN does not boot 2021-02-16 11:17 ` Dmitry Vyukov @ 2021-02-16 11:25 ` Dmitry Vyukov -1 siblings, 0 replies; 55+ messages in thread
From: Dmitry Vyukov @ 2021-02-16 11:25 UTC (permalink / raw)
To: Alex Ghiti
Cc: Tobias Klauser, Albert Ou, Bjorn Topel, Palmer Dabbelt, LKML, nylon7, syzkaller, Andreas Schwab, Paul Walmsley, linux-riscv

On Tue, Feb 16, 2021 at 12:17 PM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> > > I was fixing KASAN support for my sv48 patchset so I took a look at your
> > > issue: I built a kernel on top of the branch riscv/fixes using
> > > https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config
> > > and Buildroot 2020.11. I have the warnings regarding the use of
> > > __virt_to_phys on wrong addresses (but that's normal since this function
> > > is used in virt_addr_valid) but not the segfaults you describe.
> >
> > Hi Alex,
> >
> > Let me try to rebuild buildroot image. Maybe there was something wrong
> > with my build, though, I did 'make clean' before doing. But at the
> > same time it worked back in June...
> >
> > Re WARNINGs, they indicate kernel bugs. I am working on setting up a
> > syzbot instance on riscv. If there a WARNING during boot then the
> > kernel will be marked as broken. No further testing will happen.
> > Is it a mis-use of WARN_ON? If so, could anybody please remove it or
> > replace it with pr_err.
>
>
> Hi,
>
> I've localized one issue with riscv/KASAN:
> KASAN breaks VDSO and that's I think the root cause of weird faults I
> saw earlier. The following patch fixes it.
> Could somebody please upstream this fix? I don't know how to add/run
> tests for this.
> Thanks
>
> diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
> index 0cfd6da784f84..cf3a383c1799d 100644
> --- a/arch/riscv/kernel/vdso/Makefile
> +++ b/arch/riscv/kernel/vdso/Makefile
> @@ -35,6 +35,7 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
>  # Disable gcov profiling for VDSO code
>  GCOV_PROFILE := n
>  KCOV_INSTRUMENT := n
> +KASAN_SANITIZE := n
>
>  # Force dependency
>  $(obj)/vdso.o: $(obj)/vdso.so

The second issue I am seeing seems to be related to text segment size.
I checked out v5.11 and used this config:
https://gist.github.com/dvyukov/6af25474d455437577a84213b0cc9178

Then I tried to boot it using:
QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3)
$ qemu-system-riscv64 -machine virt -smp 2 -m 4G ...

It shows no output from the kernel whatsoever, even though I have
earlycon enabled and output shows up very early with other configs.
The kernel boots fine with defconfig and other smaller configs.

If I enable KASAN_OUTLINE and CC_OPTIMIZE_FOR_SIZE, then this config
also boots fine. Both of these options significantly reduce kernel
size. However, I can also boot the kernel without these two options if
I disable a whole lot of subsystem configs. This makes me think that
there is an issue related to kernel size somewhere in the
qemu/bootloader/kernel bootstrap code.
Does it make sense to you? Can somebody reproduce what I am seeing?

Thanks

^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: riscv+KASAN does not boot 2021-02-16 11:25 ` Dmitry Vyukov @ 2021-02-16 13:45 ` Dmitry Vyukov -1 siblings, 0 replies; 55+ messages in thread
From: Dmitry Vyukov @ 2021-02-16 13:45 UTC (permalink / raw)
To: Alex Ghiti
Cc: Tobias Klauser, Albert Ou, Bjorn Topel, Palmer Dabbelt, LKML, nylon7, syzkaller, Andreas Schwab, Paul Walmsley, linux-riscv

On Tue, Feb 16, 2021 at 12:25 PM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Tue, Feb 16, 2021 at 12:17 PM Dmitry Vyukov <dvyukov@google.com> wrote:
> >
> > On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> > > > I was fixing KASAN support for my sv48 patchset so I took a look at your
> > > > issue: I built a kernel on top of the branch riscv/fixes using
> > > > https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config
> > > > and Buildroot 2020.11. I have the warnings regarding the use of
> > > > __virt_to_phys on wrong addresses (but that's normal since this function
> > > > is used in virt_addr_valid) but not the segfaults you describe.
> > >
> > > Hi Alex,
> > >
> > > Let me try to rebuild buildroot image. Maybe there was something wrong
> > > with my build, though, I did 'make clean' before doing. But at the
> > > same time it worked back in June...
> > >
> > > Re WARNINGs, they indicate kernel bugs. I am working on setting up a
> > > syzbot instance on riscv. If there a WARNING during boot then the
> > > kernel will be marked as broken. No further testing will happen.
> > > Is it a mis-use of WARN_ON? If so, could anybody please remove it or
> > > replace it with pr_err.
> >
> >
> > Hi,
> >
> > I've localized one issue with riscv/KASAN:
> > KASAN breaks VDSO and that's I think the root cause of weird faults I
> > saw earlier. The following patch fixes it.
> > Could somebody please upstream this fix? I don't know how to add/run
> > tests for this.
> > Thanks
> >
> > diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
> > index 0cfd6da784f84..cf3a383c1799d 100644
> > --- a/arch/riscv/kernel/vdso/Makefile
> > +++ b/arch/riscv/kernel/vdso/Makefile
> > @@ -35,6 +35,7 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
> >  # Disable gcov profiling for VDSO code
> >  GCOV_PROFILE := n
> >  KCOV_INSTRUMENT := n
> > +KASAN_SANITIZE := n
> >
> >  # Force dependency
> >  $(obj)/vdso.o: $(obj)/vdso.so
> >
>
> Second issue I am seeing seems to be related to text segment size.
> I check out v5.11 and use this config:
> https://gist.github.com/dvyukov/6af25474d455437577a84213b0cc9178
>
> Then trying to boot it using:
> QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3)
> $ qemu-system-riscv64 -machine virt -smp 2 -m 4G ...
>
> It shows no output from the kernel whatsoever, even though I have
> earlycon and output shows very early with other configs.
> Kernel boots fine with defconfig and other smaller configs.
>
> If I enable KASAN_OUTLINE and CC_OPTIMIZE_FOR_SIZE, then this config
> also boots fine. Both of these options significantly reduce kernel
> size. However, I can also boot the kernel without these 2 configs, if
> I disable a whole lot of subsystem configs. This makes me think that
> there is an issue related to kernel size somewhere in
> qemu/bootloader/kernel bootstrap code.
> Does it make sense to you? Can somebody reproduce what I am seeing?

I am debugging the next issue with VDSO. clock_gettime is broken in
some weird way. syzkaller has this function:

    static uint64 current_time_ms(void)
    {
        struct timespec ts;
        if (clock_gettime(CLOCK_MONOTONIC, &ts))
        //if (syscall(SYS_clock_gettime, CLOCK_MONOTONIC, &ts))
            fail("clock_gettime failed");
        return (uint64)ts.tv_sec * 1000 + (uint64)ts.tv_nsec / 1000000;
    }

When using clock_gettime, it produces nonsense that breaks all
timeouts (in particular, monotonic time goes backwards):

pid=4343 now=836038064151457975
pid=4343 now=836038064151457975
pid=4343 now=836038064151457970
pid=4343 now=836038064151457971

When I tested it by calling the real syscall, it worked as expected:

pid=4876 now=2493379
pid=4876 now=2493392
pid=4876 now=2493395
pid=4876 now=2493409
pid=4876 now=2493414

Is it a known issue? Any ideas?

^ permalink raw reply [flat|nested] 55+ messages in thread
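A minimal sketch of the comparison described in the message above, assuming a Linux userspace with a libc that exposes syscall(): it samples CLOCK_MONOTONIC once through the libc clock_gettime wrapper (which normally goes through the VDSO) and once through the raw system call, and counts how often consecutive samples go backwards. The names now_ms_libc, now_ms_syscall and count_backward_steps are made up for illustration and are not part of syzkaller.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: milliseconds via the libc wrapper (usually the VDSO path). */
uint64_t now_ms_libc(void)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_MONOTONIC, &ts))
		return 0;
	return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
}

/* Hypothetical helper: same computation, but forcing a real system call. */
uint64_t now_ms_syscall(void)
{
	struct timespec ts;

	if (syscall(SYS_clock_gettime, CLOCK_MONOTONIC, &ts))
		return 0;
	return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
}

/* Count how many samples are smaller than the sample taken just before. */
int count_backward_steps(uint64_t (*now)(void), int samples)
{
	uint64_t prev = now();
	int backwards = 0;

	for (int i = 1; i < samples; i++) {
		uint64_t cur = now();

		if (cur < prev)
			backwards++;
		prev = cur;
	}
	return backwards;
}
```

On a healthy system one would expect count_backward_steps(now_ms_libc, 1000) and count_backward_steps(now_ms_syscall, 1000) to both return 0. If only the libc variant reports backward steps, that points at the VDSO path rather than at the kernel's clock itself.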
* Re: riscv+KASAN does not boot 2021-02-16 11:25 ` Dmitry Vyukov @ 2021-02-16 20:42 ` Alex Ghiti -1 siblings, 0 replies; 55+ messages in thread
From: Alex Ghiti @ 2021-02-16 20:42 UTC (permalink / raw)
To: Dmitry Vyukov
Cc: Albert Ou, Bjorn Topel, Palmer Dabbelt, LKML, nylon7, syzkaller, Andreas Schwab, Paul Walmsley, Tobias Klauser, linux-riscv

Hi Dmitry,

On 2/16/21 at 6:25 AM, Dmitry Vyukov wrote:
> On Tue, Feb 16, 2021 at 12:17 PM Dmitry Vyukov <dvyukov@google.com> wrote:
>>
>> On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>>>> I was fixing KASAN support for my sv48 patchset so I took a look at your
>>>> issue: I built a kernel on top of the branch riscv/fixes using
>>>> https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config
>>>> and Buildroot 2020.11. I have the warnings regarding the use of
>>>> __virt_to_phys on wrong addresses (but that's normal since this function
>>>> is used in virt_addr_valid) but not the segfaults you describe.
>>>
>>> Hi Alex,
>>>
>>> Let me try to rebuild buildroot image. Maybe there was something wrong
>>> with my build, though, I did 'make clean' before doing. But at the
>>> same time it worked back in June...
>>>
>>> Re WARNINGs, they indicate kernel bugs. I am working on setting up a
>>> syzbot instance on riscv. If there a WARNING during boot then the
>>> kernel will be marked as broken. No further testing will happen.
>>> Is it a mis-use of WARN_ON? If so, could anybody please remove it or
>>> replace it with pr_err.
>>
>>
>> Hi,
>>
>> I've localized one issue with riscv/KASAN:
>> KASAN breaks VDSO and that's I think the root cause of weird faults I
>> saw earlier. The following patch fixes it.
>> Could somebody please upstream this fix? I don't know how to add/run
>> tests for this.
>> Thanks
>>
>> diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
>> index 0cfd6da784f84..cf3a383c1799d 100644
>> --- a/arch/riscv/kernel/vdso/Makefile
>> +++ b/arch/riscv/kernel/vdso/Makefile
>> @@ -35,6 +35,7 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
>>  # Disable gcov profiling for VDSO code
>>  GCOV_PROFILE := n
>>  KCOV_INSTRUMENT := n
>> +KASAN_SANITIZE := n
>>
>>  # Force dependency
>>  $(obj)/vdso.o: $(obj)/vdso.so

What's weird is that I don't have any issue without this patch with the
following config, whereas it indeed seems to be required for KASAN. But
looking at the segfaults you got earlier, the segfault address is 0xbb0
and the cause is an instruction page fault: this address is the PLT base
address in vdso.so, and an instruction page fault would mean that
someone tried to jump to this address, which is weird. At first sight,
that does not seem related to your patch above, but clearly I may be
wrong.

Tobias, did you observe the same segfaults as Dmitry?

>
>
>
> Second issue I am seeing seems to be related to text segment size.
> I check out v5.11 and use this config:
> https://gist.github.com/dvyukov/6af25474d455437577a84213b0cc9178

This config gave my laptop a hard time! I was finally able to boot
correctly to userspace, but then I realized I had used my sv48 branch...
Either I fixed your issue along the way or I can't reproduce it; I'll
give it a try tomorrow.

>
> Then trying to boot it using:
> QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3)
> $ qemu-system-riscv64 -machine virt -smp 2 -m 4G ...
>
> It shows no output from the kernel whatsoever, even though I have
> earlycon and output shows very early with other configs.
> Kernel boots fine with defconfig and other smaller configs.
>
> If I enable KASAN_OUTLINE and CC_OPTIMIZE_FOR_SIZE, then this config
> also boots fine. Both of these options significantly reduce kernel
> size. However, I can also boot the kernel without these 2 configs, if
> I disable a whole lot of subsystem configs. This makes me think that
> there is an issue related to kernel size somewhere in
> qemu/bootloader/kernel bootstrap code.
> Does it make sense to you? Can somebody reproduce what I am seeing?
>

I don't have an answer to your question yet, but at least you know I'm
working on it. I'll keep you posted.

Thanks for taking the time to set up syzkaller.

Alex

> Thanks
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
>

^ permalink raw reply [flat|nested] 55+ messages in thread
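For reference, a small sketch of how the "cause:" values in the register dumps can be read, assuming the synchronous exception codes from the RISC-V privileged specification on a 64-bit hart: 0xc (12) is the instruction page fault discussed above, and 0x3 is the breakpoint trap seen with the WARNING. The function names riscv_exception_name and riscv_is_page_fault are hypothetical and do not correspond to any kernel API.

```c
#include <stdint.h>

/* Hypothetical helper: describe an scause value (synchronous exceptions only). */
const char *riscv_exception_name(uint64_t cause)
{
	/* On RV64 the top bit marks an interrupt rather than an exception. */
	if (cause >> 63)
		return "interrupt";

	switch (cause) {
	case 0:  return "instruction address misaligned";
	case 1:  return "instruction access fault";
	case 2:  return "illegal instruction";
	case 3:  return "breakpoint";
	case 4:  return "load address misaligned";
	case 5:  return "load access fault";
	case 6:  return "store/AMO address misaligned";
	case 7:  return "store/AMO access fault";
	case 8:  return "environment call from U-mode";
	case 9:  return "environment call from S-mode";
	case 12: return "instruction page fault";
	case 13: return "load page fault";
	case 15: return "store/AMO page fault";
	default: return "reserved or unknown";
	}
}

/* Hypothetical helper: true for the three page-fault exception codes. */
int riscv_is_page_fault(uint64_t cause)
{
	return cause == 12 || cause == 13 || cause == 15;
}
```

With this mapping, a dump showing badaddr 0xbb0 together with cause 0xc reads as an attempt to fetch an instruction from 0xbb0, which matches the interpretation given in the message above.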
* Re: riscv+KASAN does not boot 2021-02-16 20:42 ` Alex Ghiti @ 2021-02-17 4:42 ` Dmitry Vyukov -1 siblings, 0 replies; 55+ messages in thread From: Dmitry Vyukov @ 2021-02-17 4:42 UTC (permalink / raw) To: Alex Ghiti Cc: Albert Ou, Bjorn Topel, Palmer Dabbelt, LKML, nylon7, syzkaller, Andreas Schwab, Paul Walmsley, Tobias Klauser, linux-riscv On Tue, Feb 16, 2021 at 9:42 PM Alex Ghiti <alex@ghiti.fr> wrote: > > Hi Dmitry, > > Le 2/16/21 à 6:25 AM, Dmitry Vyukov a écrit : > > On Tue, Feb 16, 2021 at 12:17 PM Dmitry Vyukov <dvyukov@google.com> wrote: > >> > >> On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov <dvyukov@google.com> wrote: > >>>> I was fixing KASAN support for my sv48 patchset so I took a look at your > >>>> issue: I built a kernel on top of the branch riscv/fixes using > >>>> https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config > >>>> and Buildroot 2020.11. I have the warnings regarding the use of > >>>> __virt_to_phys on wrong addresses (but that's normal since this function > >>>> is used in virt_addr_valid) but not the segfaults you describe. > >>> > >>> Hi Alex, > >>> > >>> Let me try to rebuild buildroot image. Maybe there was something wrong > >>> with my build, though, I did 'make clean' before doing. But at the > >>> same time it worked back in June... > >>> > >>> Re WARNINGs, they indicate kernel bugs. I am working on setting up a > >>> syzbot instance on riscv. If there a WARNING during boot then the > >>> kernel will be marked as broken. No further testing will happen. > >>> Is it a mis-use of WARN_ON? If so, could anybody please remove it or > >>> replace it with pr_err. > >> > >> > >> Hi, > >> > >> I've localized one issue with riscv/KASAN: > >> KASAN breaks VDSO and that's I think the root cause of weird faults I > >> saw earlier. The following patch fixes it. > >> Could somebody please upstream this fix? I don't know how to add/run > >> tests for this. 
> >> Thanks
> >>
> >> diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
> >> index 0cfd6da784f84..cf3a383c1799d 100644
> >> --- a/arch/riscv/kernel/vdso/Makefile
> >> +++ b/arch/riscv/kernel/vdso/Makefile
> >> @@ -35,6 +35,7 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
> >> # Disable gcov profiling for VDSO code
> >> GCOV_PROFILE := n
> >> KCOV_INSTRUMENT := n
> >> +KASAN_SANITIZE := n
> >>
> >> # Force dependency
> >> $(obj)/vdso.o: $(obj)/vdso.so
>
> What's weird is that I don't have any issue without this patch with the
> following config whereas it indeed seems required for KASAN. But when
> looking at the segfaults you got earlier, the segfault address is 0xbb0
> and the cause is an instruction page fault: this address is the PLT base
> address in vdso.so and an instruction page fault would mean that someone
> tried to jump at this address, which is weird. At first sight, that does
> not seem related to your patch above, but clearly I may be wrong.
>
> Tobias, did you observe the same segfaults as Dmitry ?

I noticed that not all buildroot images use VDSO, it seems to be
dependent on libc settings (at least I think I changed it in the
past).
I also booted an image completely successfully including dhcpd/sshd
start, but then my executable crashed in clock_gettime. The executable
was built on linux/amd64 host with "riscv64-linux-gnu-gcc -static"
(10.2.1).

> > Second issue I am seeing seems to be related to text segment size.
> > I check out v5.11 and use this config:
> > https://gist.github.com/dvyukov/6af25474d455437577a84213b0cc9178
>
> This config gave my laptop a hard time ! Finally I was able to boot
> correctly to userspace, but I realized I used my sv48 branch...Either I
> fixed your issue along the way or I can't reproduce it, I'll give it a
> try tomorrow.

Where is your branch? I could also test in my setup on your branch.
> > Then trying to boot it using: > > QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3) > > $ qemu-system-riscv64 -machine virt -smp 2 -m 4G ... > > > > It shows no output from the kernel whatsoever, even though I have > > earlycon and output shows very early with other configs. > > Kernel boots fine with defconfig and other smaller configs. > > > > If I enable KASAN_OUTLINE and CC_OPTIMIZE_FOR_SIZE, then this config > > also boots fine. Both of these options significantly reduce kernel > > size. However, I can also boot the kernel without these 2 configs, if > > I disable a whole lot of subsystem configs. This makes me think that > > there is an issue related to kernel size somewhere in > > qemu/bootloader/kernel bootstrap code. > > Does it make sense to you? Can somebody reproduce what I am seeing? > > > I did not bring any answer to your question, but at least you know I'm > working on it, I'll keep you posted. > > Thanks for taking the time to setup syzkaller. > > Alex > > > Thanks > > > > _______________________________________________ > > linux-riscv mailing list > > linux-riscv@lists.infradead.org > > http://lists.infradead.org/mailman/listinfo/linux-riscv > > ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: riscv+KASAN does not boot 2021-02-17 4:42 ` Dmitry Vyukov @ 2021-02-17 16:36 ` Alex Ghiti -1 siblings, 0 replies; 55+ messages in thread From: Alex Ghiti @ 2021-02-17 16:36 UTC (permalink / raw) To: Dmitry Vyukov Cc: Albert Ou, Bjorn Topel, Palmer Dabbelt, LKML, nylon7, syzkaller, Andreas Schwab, Paul Walmsley, Tobias Klauser, linux-riscv Le 2/16/21 à 11:42 PM, Dmitry Vyukov a écrit : > On Tue, Feb 16, 2021 at 9:42 PM Alex Ghiti <alex@ghiti.fr> wrote: >> >> Hi Dmitry, >> >> Le 2/16/21 à 6:25 AM, Dmitry Vyukov a écrit : >>> On Tue, Feb 16, 2021 at 12:17 PM Dmitry Vyukov <dvyukov@google.com> wrote: >>>> >>>> On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov <dvyukov@google.com> wrote: >>>>>> I was fixing KASAN support for my sv48 patchset so I took a look at your >>>>>> issue: I built a kernel on top of the branch riscv/fixes using >>>>>> https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config >>>>>> and Buildroot 2020.11. I have the warnings regarding the use of >>>>>> __virt_to_phys on wrong addresses (but that's normal since this function >>>>>> is used in virt_addr_valid) but not the segfaults you describe. >>>>> >>>>> Hi Alex, >>>>> >>>>> Let me try to rebuild buildroot image. Maybe there was something wrong >>>>> with my build, though, I did 'make clean' before doing. But at the >>>>> same time it worked back in June... >>>>> >>>>> Re WARNINGs, they indicate kernel bugs. I am working on setting up a >>>>> syzbot instance on riscv. If there a WARNING during boot then the >>>>> kernel will be marked as broken. No further testing will happen. >>>>> Is it a mis-use of WARN_ON? If so, could anybody please remove it or >>>>> replace it with pr_err. >>>> >>>> >>>> Hi, >>>> >>>> I've localized one issue with riscv/KASAN: >>>> KASAN breaks VDSO and that's I think the root cause of weird faults I >>>> saw earlier. The following patch fixes it. >>>> Could somebody please upstream this fix? 
I don't know how to add/run >>>> tests for this. >>>> Thanks >>>> >>>> diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile >>>> index 0cfd6da784f84..cf3a383c1799d 100644 >>>> --- a/arch/riscv/kernel/vdso/Makefile >>>> +++ b/arch/riscv/kernel/vdso/Makefile >>>> @@ -35,6 +35,7 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os >>>> # Disable gcov profiling for VDSO code >>>> GCOV_PROFILE := n >>>> KCOV_INSTRUMENT := n >>>> +KASAN_SANITIZE := n >>>> >>>> # Force dependency >>>> $(obj)/vdso.o: $(obj)/vdso.so >> >> What's weird is that I don't have any issue without this patch with the >> following config whereas it indeed seems required for KASAN. But when >> looking at the segfaults you got earlier, the segfault address is 0xbb0 >> and the cause is an instruction page fault: this address is the PLT base >> address in vdso.so and an instruction page fault would mean that someone >> tried to jump at this address, which is weird. At first sight, that does >> not seem related to your patch above, but clearly I may be wrong. >> >> Tobias, did you observe the same segfaults as Dmitry ? > > > I noticed that not all buildroot images use VDSO, it seems to be > dependent on libc settings (at least I think I changed it in the > past). Ok, I used uClibc but then when using glibc, I have the same segfaults, only when KASAN is enabled. And your patch fixes the problem. I will try to take a look later to better understand the problem. > I also booted an image completely successfully including dhcpd/sshd > start, but then my executable crashed in clock_gettime. The executable > was build on linux/amd64 host with "riscv64-linux-gnu-gcc -static" > (10.2.1). > > >>> Second issue I am seeing seems to be related to text segment size. >>> I check out v5.11 and use this config: >>> https://gist.github.com/dvyukov/6af25474d455437577a84213b0cc9178 >> >> This config gave my laptop a hard time ! 
Finally I was able to boot >> correctly to userspace, but I realized I used my sv48 branch...Either I >> fixed your issue along the way or I can't reproduce it, I'll give it a >> try tomorrow. > > Where is your branch? I could also test in my setup on your branch. > You can find my branch int/alex/riscv_kernel_end_of_address_space_v2 here: https://github.com/AlexGhiti/riscv-linux.git Thanks, > >>> Then trying to boot it using: >>> QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3) >>> $ qemu-system-riscv64 -machine virt -smp 2 -m 4G ... >>> >>> It shows no output from the kernel whatsoever, even though I have >>> earlycon and output shows very early with other configs. >>> Kernel boots fine with defconfig and other smaller configs. >>> >>> If I enable KASAN_OUTLINE and CC_OPTIMIZE_FOR_SIZE, then this config >>> also boots fine. Both of these options significantly reduce kernel >>> size. However, I can also boot the kernel without these 2 configs, if >>> I disable a whole lot of subsystem configs. This makes me think that >>> there is an issue related to kernel size somewhere in >>> qemu/bootloader/kernel bootstrap code. >>> Does it make sense to you? Can somebody reproduce what I am seeing? > >> >> I did not bring any answer to your question, but at least you know I'm >> working on it, I'll keep you posted. >> >> Thanks for taking the time to setup syzkaller. >> >> Alex >> >>> Thanks >>> >>> _______________________________________________ >>> linux-riscv mailing list >>> linux-riscv@lists.infradead.org >>> http://lists.infradead.org/mailman/listinfo/linux-riscv >>> > > _______________________________________________ > linux-riscv mailing list > linux-riscv@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-riscv > ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: riscv+KASAN does not boot 2021-02-17 16:36 ` Alex Ghiti @ 2021-02-17 17:34 ` Dmitry Vyukov -1 siblings, 0 replies; 55+ messages in thread From: Dmitry Vyukov @ 2021-02-17 17:34 UTC (permalink / raw) To: Alex Ghiti Cc: Albert Ou, Bjorn Topel, Palmer Dabbelt, LKML, nylon7, syzkaller, Andreas Schwab, Paul Walmsley, Tobias Klauser, linux-riscv On Wed, Feb 17, 2021 at 5:36 PM Alex Ghiti <alex@ghiti.fr> wrote: > > Le 2/16/21 à 11:42 PM, Dmitry Vyukov a écrit : > > On Tue, Feb 16, 2021 at 9:42 PM Alex Ghiti <alex@ghiti.fr> wrote: > >> > >> Hi Dmitry, > >> > >> Le 2/16/21 à 6:25 AM, Dmitry Vyukov a écrit : > >>> On Tue, Feb 16, 2021 at 12:17 PM Dmitry Vyukov <dvyukov@google.com> wrote: > >>>> > >>>> On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov <dvyukov@google.com> wrote: > >>>>>> I was fixing KASAN support for my sv48 patchset so I took a look at your > >>>>>> issue: I built a kernel on top of the branch riscv/fixes using > >>>>>> https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config > >>>>>> and Buildroot 2020.11. I have the warnings regarding the use of > >>>>>> __virt_to_phys on wrong addresses (but that's normal since this function > >>>>>> is used in virt_addr_valid) but not the segfaults you describe. > >>>>> > >>>>> Hi Alex, > >>>>> > >>>>> Let me try to rebuild buildroot image. Maybe there was something wrong > >>>>> with my build, though, I did 'make clean' before doing. But at the > >>>>> same time it worked back in June... > >>>>> > >>>>> Re WARNINGs, they indicate kernel bugs. I am working on setting up a > >>>>> syzbot instance on riscv. If there a WARNING during boot then the > >>>>> kernel will be marked as broken. No further testing will happen. > >>>>> Is it a mis-use of WARN_ON? If so, could anybody please remove it or > >>>>> replace it with pr_err. 
> >>>> > >>>> > >>>> Hi, > >>>> > >>>> I've localized one issue with riscv/KASAN: > >>>> KASAN breaks VDSO and that's I think the root cause of weird faults I > >>>> saw earlier. The following patch fixes it. > >>>> Could somebody please upstream this fix? I don't know how to add/run > >>>> tests for this. > >>>> Thanks > >>>> > >>>> diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile > >>>> index 0cfd6da784f84..cf3a383c1799d 100644 > >>>> --- a/arch/riscv/kernel/vdso/Makefile > >>>> +++ b/arch/riscv/kernel/vdso/Makefile > >>>> @@ -35,6 +35,7 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os > >>>> # Disable gcov profiling for VDSO code > >>>> GCOV_PROFILE := n > >>>> KCOV_INSTRUMENT := n > >>>> +KASAN_SANITIZE := n > >>>> > >>>> # Force dependency > >>>> $(obj)/vdso.o: $(obj)/vdso.so > >> > >> What's weird is that I don't have any issue without this patch with the > >> following config whereas it indeed seems required for KASAN. But when > >> looking at the segfaults you got earlier, the segfault address is 0xbb0 > >> and the cause is an instruction page fault: this address is the PLT base > >> address in vdso.so and an instruction page fault would mean that someone > >> tried to jump at this address, which is weird. At first sight, that does > >> not seem related to your patch above, but clearly I may be wrong. > >> > >> Tobias, did you observe the same segfaults as Dmitry ? > > > > > > I noticed that not all buildroot images use VDSO, it seems to be > > dependent on libc settings (at least I think I changed it in the > > past). > > Ok, I used uClibc but then when using glibc, I have the same segfaults, > only when KASAN is enabled. And your patch fixes the problem. I will try > to take a look later to better understand the problem. > > > I also booted an image completely successfully including dhcpd/sshd > > start, but then my executable crashed in clock_gettime. 
The executable
> > was build on linux/amd64 host with "riscv64-linux-gnu-gcc -static"
> > (10.2.1).
> >
> >
> >>> Second issue I am seeing seems to be related to text segment size.
> >>> I check out v5.11 and use this config:
> >>> https://gist.github.com/dvyukov/6af25474d455437577a84213b0cc9178
> >>
> >> This config gave my laptop a hard time ! Finally I was able to boot
> >> correctly to userspace, but I realized I used my sv48 branch...Either I
> >> fixed your issue along the way or I can't reproduce it, I'll give it a
> >> try tomorrow.
> >
> > Where is your branch? I could also test in my setup on your branch.
> >
>
> You can find my branch int/alex/riscv_kernel_end_of_address_space_v2
> here: https://github.com/AlexGhiti/riscv-linux.git

No, it does not work for me.

Source is on b61ab6c98de021398cd7734ea5fc3655e51e70f2 (HEAD,
int/alex/riscv_kernel_end_of_address_space_v2)

Config is https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt

riscv64-linux-gnu-gcc -v
gcc version 10.2.1 20210110 (Debian 10.2.1-6+build1)

qemu-system-riscv64 --version
QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3)

qemu-system-riscv64 \
    -machine virt -smp 2 -m 2G \
    -device virtio-blk-device,drive=hd0 \
    -drive file=image-riscv64,if=none,format=raw,id=hd0 \
    -kernel arch/riscv/boot/Image \
    -nographic \
    -device virtio-rng-device,rng=rng0 -object rng-random,filename=/dev/urandom,id=rng0 \
    -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device virtio-net-device,netdev=net0 \
    -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic panic_on_warn=1 panic=86400 earlycon"

OpenSBI v0.8
   ____                    _____ ____ _____
  / __ \                  / ____|  _ \_   _|
 | |  | |_ __   ___ _ __ | (___ | |_) || |
 | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
 | |__| | |_) |  __/ | | |____) | |_) || |_
  \____/| .__/ \___|_| |_|_____/|____/_____|
        | |
        |_|

Platform Name : riscv-virtio,qemu
Platform Features : timer,mfdeleg
Platform HART Count : 2
Boot HART ID : 1
Boot HART ISA : rv64imafdcsu
BOOT HART Features : pmp,scounteren,mcounteren,time
BOOT HART PMP Count : 16
Firmware Base : 0x80000000
Firmware Size : 104 KB
Runtime SBI Version : 0.2
MIDELEG : 0x0000000000000222
MEDELEG : 0x000000000000b109
PMP0 : 0x0000000080000000-0x000000008001ffff (A)
no output after this
PMP1 : 0x0000000000000000-0xffffffffffffffff (A,R,W,X)

> Thanks,
>
> >
> >>> Then trying to boot it using:
> >>> QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3)
> >>> $ qemu-system-riscv64 -machine virt -smp 2 -m 4G ...
> >>>
> >>> It shows no output from the kernel whatsoever, even though I have
> >>> earlycon and output shows very early with other configs.
> >>> Kernel boots fine with defconfig and other smaller configs.
> >>>
> >>> If I enable KASAN_OUTLINE and CC_OPTIMIZE_FOR_SIZE, then this config
> >>> also boots fine. Both of these options significantly reduce kernel
> >>> size. However, I can also boot the kernel without these 2 configs, if
> >>> I disable a whole lot of subsystem configs. This makes me think that
> >>> there is an issue related to kernel size somewhere in
> >>> qemu/bootloader/kernel bootstrap code.
> >>> Does it make sense to you? Can somebody reproduce what I am seeing?
> > >>
> >> I did not bring any answer to your question, but at least you know I'm
> >> working on it, I'll keep you posted.
> >>
> >> Thanks for taking the time to setup syzkaller.
> >>
> >> Alex
> >>
> >>> Thanks
> >>>
> >>> _______________________________________________
> >>> linux-riscv mailing list
> >>> linux-riscv@lists.infradead.org
> >>> http://lists.infradead.org/mailman/listinfo/linux-riscv
> >>>
> >
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv
> >
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: riscv+KASAN does not boot 2021-02-17 17:34 ` Dmitry Vyukov @ 2021-02-18 7:54 ` Alex Ghiti -1 siblings, 0 replies; 55+ messages in thread From: Alex Ghiti @ 2021-02-18 7:54 UTC (permalink / raw) To: Dmitry Vyukov Cc: Albert Ou, Bjorn Topel, Palmer Dabbelt, LKML, nylon7, syzkaller, Andreas Schwab, Paul Walmsley, Tobias Klauser, linux-riscv Hi Dmitry, > On Wed, Feb 17, 2021 at 5:36 PM Alex Ghiti <alex@ghiti.fr> wrote: >> >> Le 2/16/21 à 11:42 PM, Dmitry Vyukov a écrit : >>> On Tue, Feb 16, 2021 at 9:42 PM Alex Ghiti <alex@ghiti.fr> wrote: >>>> >>>> Hi Dmitry, >>>> >>>> Le 2/16/21 à 6:25 AM, Dmitry Vyukov a écrit : >>>>> On Tue, Feb 16, 2021 at 12:17 PM Dmitry Vyukov <dvyukov@google.com> wrote: >>>>>> >>>>>> On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov <dvyukov@google.com> wrote: >>>>>>>> I was fixing KASAN support for my sv48 patchset so I took a look at your >>>>>>>> issue: I built a kernel on top of the branch riscv/fixes using >>>>>>>> https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config >>>>>>>> and Buildroot 2020.11. I have the warnings regarding the use of >>>>>>>> __virt_to_phys on wrong addresses (but that's normal since this function >>>>>>>> is used in virt_addr_valid) but not the segfaults you describe. >>>>>>> >>>>>>> Hi Alex, >>>>>>> >>>>>>> Let me try to rebuild buildroot image. Maybe there was something wrong >>>>>>> with my build, though, I did 'make clean' before doing. But at the >>>>>>> same time it worked back in June... >>>>>>> >>>>>>> Re WARNINGs, they indicate kernel bugs. I am working on setting up a >>>>>>> syzbot instance on riscv. If there a WARNING during boot then the >>>>>>> kernel will be marked as broken. No further testing will happen. >>>>>>> Is it a mis-use of WARN_ON? If so, could anybody please remove it or >>>>>>> replace it with pr_err. 
>>>>>> >>>>>> >>>>>> Hi, >>>>>> >>>>>> I've localized one issue with riscv/KASAN: >>>>>> KASAN breaks VDSO and that's I think the root cause of weird faults I >>>>>> saw earlier. The following patch fixes it. >>>>>> Could somebody please upstream this fix? I don't know how to add/run >>>>>> tests for this. >>>>>> Thanks >>>>>> >>>>>> diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile >>>>>> index 0cfd6da784f84..cf3a383c1799d 100644 >>>>>> --- a/arch/riscv/kernel/vdso/Makefile >>>>>> +++ b/arch/riscv/kernel/vdso/Makefile >>>>>> @@ -35,6 +35,7 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os >>>>>> # Disable gcov profiling for VDSO code >>>>>> GCOV_PROFILE := n >>>>>> KCOV_INSTRUMENT := n >>>>>> +KASAN_SANITIZE := n >>>>>> >>>>>> # Force dependency >>>>>> $(obj)/vdso.o: $(obj)/vdso.so >>>> >>>> What's weird is that I don't have any issue without this patch with the >>>> following config whereas it indeed seems required for KASAN. But when >>>> looking at the segfaults you got earlier, the segfault address is 0xbb0 >>>> and the cause is an instruction page fault: this address is the PLT base >>>> address in vdso.so and an instruction page fault would mean that someone >>>> tried to jump at this address, which is weird. At first sight, that does >>>> not seem related to your patch above, but clearly I may be wrong. >>>> >>>> Tobias, did you observe the same segfaults as Dmitry ? >>> >>> >>> I noticed that not all buildroot images use VDSO, it seems to be >>> dependent on libc settings (at least I think I changed it in the >>> past). >> >> Ok, I used uClibc but then when using glibc, I have the same segfaults, >> only when KASAN is enabled. And your patch fixes the problem. I will try >> to take a look later to better understand the problem. >> >>> I also booted an image completely successfully including dhcpd/sshd >>> start, but then my executable crashed in clock_gettime. 
The executable >>> was build on linux/amd64 host with "riscv64-linux-gnu-gcc -static" >>> (10.2.1). >>> >>> >>>>> Second issue I am seeing seems to be related to text segment size. >>>>> I check out v5.11 and use this config: >>>>> https://gist.github.com/dvyukov/6af25474d455437577a84213b0cc9178 >>>> >>>> This config gave my laptop a hard time ! Finally I was able to boot >>>> correctly to userspace, but I realized I used my sv48 branch...Either I >>>> fixed your issue along the way or I can't reproduce it, I'll give it a >>>> try tomorrow. >>> >>> Where is your branch? I could also test in my setup on your branch. >>> >> >> You can find my branch int/alex/riscv_kernel_end_of_address_space_v2 >> here: https://github.com/AlexGhiti/riscv-linux.git > > No, it does not work for me. > > Source is on b61ab6c98de021398cd7734ea5fc3655e51e70f2 (HEAD, > int/alex/riscv_kernel_end_of_address_space_v2) > Config is https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt > > riscv64-linux-gnu-gcc -v > gcc version 10.2.1 20210110 (Debian 10.2.1-6+build1) > > qemu-system-riscv64 --version > QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3) > > qemu-system-riscv64 \ > -machine virt -smp 2 -m 2G \ > -device virtio-blk-device,drive=hd0 \ > -drive file=image-riscv64,if=none,format=raw,id=hd0 \ > -kernel arch/riscv/boot/Image \ > -nographic \ > -device virtio-rng-device,rng=rng0 -object > rng-random,filename=/dev/urandom,id=rng0 \ > -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device > virtio-net-device,netdev=net0 \ > -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic > panic_on_warn=1 panic=86400 earlycon" It still works for me but I had to disable CONFIG_DEBUG_INFO_BTF (I don't think that changes anything at runtime). But your above command line does not work for me as it appears you do not load any firmware, if I add -bios images/fw_jump.elf, it works. 
But then I don't know where your OpenSBI output below comes from...

And regarding your issue with calling clock_gettime 'directly' compared to using the syscall, I have the same consistent output from both calls.

I have an older gcc (9.3.0) and the same qemu. I think what is missing here is your buildroot config, so that we have the exact same environment: could you post your buildroot config as well?

Thanks,

>
> OpenSBI v0.8
>    ____                    _____ ____ _____
>   / __ \                  / ____|  _ \_   _|
>  | |  | |_ __   ___ _ __ | (___ | |_) || |
>  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
>  | |__| | |_) |  __/ | | |____) | |_) || |_
>   \____/| .__/ \___|_| |_|_____/|____/_____|
>         | |
>         |_|
>
> Platform Name : riscv-virtio,qemu
> Platform Features : timer,mfdeleg
> Platform HART Count : 2
> Boot HART ID : 1
> Boot HART ISA : rv64imafdcsu
> BOOT HART Features : pmp,scounteren,mcounteren,time
> BOOT HART PMP Count : 16
> Firmware Base : 0x80000000
> Firmware Size : 104 KB
> Runtime SBI Version : 0.2
>
> MIDELEG : 0x0000000000000222
> MEDELEG : 0x000000000000b109
> PMP0 : 0x0000000080000000-0x000000008001ffff (A)OpenSBI v0.6
>
>
> no output after this
> PMP1 : 0x0000000000000000-0xffffffffffffffff (A,R,W,X)
>
>
>
>> Thanks,
>>
>>>
>>>>> Then trying to boot it using:
>>>>> QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3)
>>>>> $ qemu-system-riscv64 -machine virt -smp 2 -m 4G ...
>>>>>
>>>>> It shows no output from the kernel whatsoever, even though I have
>>>>> earlycon and output shows very early with other configs.
>>>>> Kernel boots fine with defconfig and other smaller configs.
>>>>>
>>>>> If I enable KASAN_OUTLINE and CC_OPTIMIZE_FOR_SIZE, then this config
>>>>> also boots fine. Both of these options significantly reduce kernel
>>>>> size. However, I can also boot the kernel without these 2 configs, if
>>>>> I disable a whole lot of subsystem configs. This makes me think that
>>>>> there is an issue related to kernel size somewhere in
>>>>> qemu/bootloader/kernel bootstrap code.
>>>>> Does it make sense to you? Can somebody reproduce what I am seeing? > >>>> >>>> I did not bring any answer to your question, but at least you know I'm >>>> working on it, I'll keep you posted. >>>> >>>> Thanks for taking the time to setup syzkaller. >>>> >>>> Alex >>>> >>>>> Thanks >>>>> >>>>> _______________________________________________ >>>>> linux-riscv mailing list >>>>> linux-riscv@lists.infradead.org >>>>> http://lists.infradead.org/mailman/listinfo/linux-riscv >>>>> >>> >>> _______________________________________________ >>> linux-riscv mailing list >>> linux-riscv@lists.infradead.org >>> http://lists.infradead.org/mailman/listinfo/linux-riscv >>> > > _______________________________________________ > linux-riscv mailing list > linux-riscv@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-riscv > ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: riscv+KASAN does not boot 2021-02-18 7:54 ` Alex Ghiti @ 2021-02-18 11:36 ` Dmitry Vyukov -1 siblings, 0 replies; 55+ messages in thread From: Dmitry Vyukov @ 2021-02-18 11:36 UTC (permalink / raw) To: Alex Ghiti Cc: Albert Ou, Bjorn Topel, Palmer Dabbelt, LKML, nylon7, syzkaller, Andreas Schwab, Paul Walmsley, Tobias Klauser, linux-riscv On Thu, Feb 18, 2021 at 8:54 AM Alex Ghiti <alex@ghiti.fr> wrote: > > Hi Dmitry, > > > On Wed, Feb 17, 2021 at 5:36 PM Alex Ghiti <alex@ghiti.fr> wrote: > >> > >> Le 2/16/21 à 11:42 PM, Dmitry Vyukov a écrit : > >>> On Tue, Feb 16, 2021 at 9:42 PM Alex Ghiti <alex@ghiti.fr> wrote: > >>>> > >>>> Hi Dmitry, > >>>> > >>>> Le 2/16/21 à 6:25 AM, Dmitry Vyukov a écrit : > >>>>> On Tue, Feb 16, 2021 at 12:17 PM Dmitry Vyukov <dvyukov@google.com> wrote: > >>>>>> > >>>>>> On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov <dvyukov@google.com> wrote: > >>>>>>>> I was fixing KASAN support for my sv48 patchset so I took a look at your > >>>>>>>> issue: I built a kernel on top of the branch riscv/fixes using > >>>>>>>> https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config > >>>>>>>> and Buildroot 2020.11. I have the warnings regarding the use of > >>>>>>>> __virt_to_phys on wrong addresses (but that's normal since this function > >>>>>>>> is used in virt_addr_valid) but not the segfaults you describe. > >>>>>>> > >>>>>>> Hi Alex, > >>>>>>> > >>>>>>> Let me try to rebuild buildroot image. Maybe there was something wrong > >>>>>>> with my build, though, I did 'make clean' before doing. But at the > >>>>>>> same time it worked back in June... > >>>>>>> > >>>>>>> Re WARNINGs, they indicate kernel bugs. I am working on setting up a > >>>>>>> syzbot instance on riscv. If there a WARNING during boot then the > >>>>>>> kernel will be marked as broken. No further testing will happen. > >>>>>>> Is it a mis-use of WARN_ON? 
If so, could anybody please remove it or > >>>>>>> replace it with pr_err. > >>>>>> > >>>>>> > >>>>>> Hi, > >>>>>> > >>>>>> I've localized one issue with riscv/KASAN: > >>>>>> KASAN breaks VDSO and that's I think the root cause of weird faults I > >>>>>> saw earlier. The following patch fixes it. > >>>>>> Could somebody please upstream this fix? I don't know how to add/run > >>>>>> tests for this. > >>>>>> Thanks > >>>>>> > >>>>>> diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile > >>>>>> index 0cfd6da784f84..cf3a383c1799d 100644 > >>>>>> --- a/arch/riscv/kernel/vdso/Makefile > >>>>>> +++ b/arch/riscv/kernel/vdso/Makefile > >>>>>> @@ -35,6 +35,7 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os > >>>>>> # Disable gcov profiling for VDSO code > >>>>>> GCOV_PROFILE := n > >>>>>> KCOV_INSTRUMENT := n > >>>>>> +KASAN_SANITIZE := n > >>>>>> > >>>>>> # Force dependency > >>>>>> $(obj)/vdso.o: $(obj)/vdso.so > >>>> > >>>> What's weird is that I don't have any issue without this patch with the > >>>> following config whereas it indeed seems required for KASAN. But when > >>>> looking at the segfaults you got earlier, the segfault address is 0xbb0 > >>>> and the cause is an instruction page fault: this address is the PLT base > >>>> address in vdso.so and an instruction page fault would mean that someone > >>>> tried to jump at this address, which is weird. At first sight, that does > >>>> not seem related to your patch above, but clearly I may be wrong. > >>>> > >>>> Tobias, did you observe the same segfaults as Dmitry ? > >>> > >>> > >>> I noticed that not all buildroot images use VDSO, it seems to be > >>> dependent on libc settings (at least I think I changed it in the > >>> past). > >> > >> Ok, I used uClibc but then when using glibc, I have the same segfaults, > >> only when KASAN is enabled. And your patch fixes the problem. I will try > >> to take a look later to better understand the problem. 
> >> > >>> I also booted an image completely successfully including dhcpd/sshd > >>> start, but then my executable crashed in clock_gettime. The executable > >>> was build on linux/amd64 host with "riscv64-linux-gnu-gcc -static" > >>> (10.2.1). > >>> > >>> > >>>>> Second issue I am seeing seems to be related to text segment size. > >>>>> I check out v5.11 and use this config: > >>>>> https://gist.github.com/dvyukov/6af25474d455437577a84213b0cc9178 > >>>> > >>>> This config gave my laptop a hard time ! Finally I was able to boot > >>>> correctly to userspace, but I realized I used my sv48 branch...Either I > >>>> fixed your issue along the way or I can't reproduce it, I'll give it a > >>>> try tomorrow. > >>> > >>> Where is your branch? I could also test in my setup on your branch. > >>> > >> > >> You can find my branch int/alex/riscv_kernel_end_of_address_space_v2 > >> here: https://github.com/AlexGhiti/riscv-linux.git > > > > No, it does not work for me. > > > > Source is on b61ab6c98de021398cd7734ea5fc3655e51e70f2 (HEAD, > > int/alex/riscv_kernel_end_of_address_space_v2) > > Config is https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt > > > > riscv64-linux-gnu-gcc -v > > gcc version 10.2.1 20210110 (Debian 10.2.1-6+build1) > > > > qemu-system-riscv64 --version > > QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3) > > > > qemu-system-riscv64 \ > > -machine virt -smp 2 -m 2G \ > > -device virtio-blk-device,drive=hd0 \ > > -drive file=image-riscv64,if=none,format=raw,id=hd0 \ > > -kernel arch/riscv/boot/Image \ > > -nographic \ > > -device virtio-rng-device,rng=rng0 -object > > rng-random,filename=/dev/urandom,id=rng0 \ > > -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device > > virtio-net-device,netdev=net0 \ > > -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic > > panic_on_warn=1 panic=86400 earlycon" > > It still works for me but I had to disable 
CONFIG_DEBUG_INFO_BTF (I
> don't think that changes anything at runtime). But your above command
> line does not work for me as it appears you do not load any firmware, if
> I add -bios images/fw_jump.elf, it works. But then I don't know where
> your opensbi output below comes from...
>
> And regarding your issue with calling clock_gettime 'directly' compared
> to using the syscall, I have the same consistent output from both calls.
>
> I have an older gcc (9.3.0) and the same qemu. I think what is missing
> here is your buildroot config, so that we have the exact same
> environment: could you post your buildroot config as well ?

I don't think the image is relevant because I don't even get to kernel
code. If the kernel complains about no init later, that's fine.
Re bios, this version of qemu already has OpenSBI bios built in, you
can pass -bios default, but that's, well, the default :)
Here are more reproducible repro instructions that capture gcc and
qemu. I think the gcc version may be relevant, as I suspect
code size.

curl https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt > $KERNEL_SRC/.config
docker pull gcr.io/syzkaller/syzbot
docker run -it -v $KERNEL_SRC:/kernel gcr.io/syzkaller/syzbot
cd /kernel
make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- olddefconfig
make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu-
qemu-system-riscv64 -machine virt -smp 2 -m 4G -kernel arch/riscv/boot/Image -nographic -append "earlycon earlyprintk=serial console=ttyS0"
[this does not boot, only OpenSBI output]
scripts/config -d KASAN_INLINE -e KASAN_OUTLINE -d CC_OPTIMIZE_FOR_PERFORMANCE -e CC_OPTIMIZE_FOR_SIZE
make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu-
qemu-system-riscv64 -machine virt -smp 2 -m 4G -kernel arch/riscv/boot/Image -nographic -append "earlycon earlyprintk=serial console=ttyS0"
[this boots fine, at least up to starting the init process]
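Since code size is the suspect here, one quick way to compare the two builds is to look at how large the resulting image is. The snippet below is only a minimal sketch and not part of the repro above: image_size_mib is a hypothetical helper, and the on-disk size of arch/riscv/boot/Image is a rough proxy at best, since it may not account for zero-initialized data that the kernel still has to map at boot.

```python
import os

MIB = 1024 * 1024


def image_size_mib(path):
    """Return the on-disk size of a kernel image file in MiB.

    This is a rough proxy for the kernel footprint only; it is an
    illustrative helper, not something the kernel build provides.
    """
    return os.path.getsize(path) / MIB
```

Calling image_size_mib("arch/riscv/boot/Image") after each of the two builds (KASAN_INLINE with CC_OPTIMIZE_FOR_PERFORMANCE, then KASAN_OUTLINE with CC_OPTIMIZE_FOR_SIZE) would show how much the image shrinks between the non-booting and the booting configuration.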
* Re: riscv+KASAN does not boot 2021-02-18 11:36 ` Dmitry Vyukov @ 2021-02-19 17:01 ` Alex Ghiti -1 siblings, 0 replies; 55+ messages in thread From: Alex Ghiti @ 2021-02-19 17:01 UTC (permalink / raw) To: Dmitry Vyukov Cc: Albert Ou, Bjorn Topel, Palmer Dabbelt, LKML, nylon7, syzkaller, Andreas Schwab, Paul Walmsley, Tobias Klauser, linux-riscv Hi Dmitry, Le 2/18/21 à 6:36 AM, Dmitry Vyukov a écrit : > On Thu, Feb 18, 2021 at 8:54 AM Alex Ghiti <alex@ghiti.fr> wrote: >> >> Hi Dmitry, >> >>> On Wed, Feb 17, 2021 at 5:36 PM Alex Ghiti <alex@ghiti.fr> wrote: >>>> >>>> Le 2/16/21 à 11:42 PM, Dmitry Vyukov a écrit : >>>>> On Tue, Feb 16, 2021 at 9:42 PM Alex Ghiti <alex@ghiti.fr> wrote: >>>>>> >>>>>> Hi Dmitry, >>>>>> >>>>>> Le 2/16/21 à 6:25 AM, Dmitry Vyukov a écrit : >>>>>>> On Tue, Feb 16, 2021 at 12:17 PM Dmitry Vyukov <dvyukov@google.com> wrote: >>>>>>>> >>>>>>>> On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov <dvyukov@google.com> wrote: >>>>>>>>>> I was fixing KASAN support for my sv48 patchset so I took a look at your >>>>>>>>>> issue: I built a kernel on top of the branch riscv/fixes using >>>>>>>>>> https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config >>>>>>>>>> and Buildroot 2020.11. I have the warnings regarding the use of >>>>>>>>>> __virt_to_phys on wrong addresses (but that's normal since this function >>>>>>>>>> is used in virt_addr_valid) but not the segfaults you describe. >>>>>>>>> >>>>>>>>> Hi Alex, >>>>>>>>> >>>>>>>>> Let me try to rebuild buildroot image. Maybe there was something wrong >>>>>>>>> with my build, though, I did 'make clean' before doing. But at the >>>>>>>>> same time it worked back in June... >>>>>>>>> >>>>>>>>> Re WARNINGs, they indicate kernel bugs. I am working on setting up a >>>>>>>>> syzbot instance on riscv. If there a WARNING during boot then the >>>>>>>>> kernel will be marked as broken. No further testing will happen. 
>>>>>>>>> Is it a mis-use of WARN_ON? If so, could anybody please remove it or >>>>>>>>> replace it with pr_err. >>>>>>>> >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I've localized one issue with riscv/KASAN: >>>>>>>> KASAN breaks VDSO and that's I think the root cause of weird faults I >>>>>>>> saw earlier. The following patch fixes it. >>>>>>>> Could somebody please upstream this fix? I don't know how to add/run >>>>>>>> tests for this. >>>>>>>> Thanks >>>>>>>> >>>>>>>> diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile >>>>>>>> index 0cfd6da784f84..cf3a383c1799d 100644 >>>>>>>> --- a/arch/riscv/kernel/vdso/Makefile >>>>>>>> +++ b/arch/riscv/kernel/vdso/Makefile >>>>>>>> @@ -35,6 +35,7 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os >>>>>>>> # Disable gcov profiling for VDSO code >>>>>>>> GCOV_PROFILE := n >>>>>>>> KCOV_INSTRUMENT := n >>>>>>>> +KASAN_SANITIZE := n >>>>>>>> >>>>>>>> # Force dependency >>>>>>>> $(obj)/vdso.o: $(obj)/vdso.so >>>>>> >>>>>> What's weird is that I don't have any issue without this patch with the >>>>>> following config whereas it indeed seems required for KASAN. But when >>>>>> looking at the segfaults you got earlier, the segfault address is 0xbb0 >>>>>> and the cause is an instruction page fault: this address is the PLT base >>>>>> address in vdso.so and an instruction page fault would mean that someone >>>>>> tried to jump at this address, which is weird. At first sight, that does >>>>>> not seem related to your patch above, but clearly I may be wrong. >>>>>> >>>>>> Tobias, did you observe the same segfaults as Dmitry ? >>>>> >>>>> >>>>> I noticed that not all buildroot images use VDSO, it seems to be >>>>> dependent on libc settings (at least I think I changed it in the >>>>> past). >>>> >>>> Ok, I used uClibc but then when using glibc, I have the same segfaults, >>>> only when KASAN is enabled. And your patch fixes the problem. 
I will try >>>> to take a look later to better understand the problem. >>>> >>>>> I also booted an image completely successfully including dhcpd/sshd >>>>> start, but then my executable crashed in clock_gettime. The executable >>>>> was build on linux/amd64 host with "riscv64-linux-gnu-gcc -static" >>>>> (10.2.1). >>>>> >>>>> >>>>>>> Second issue I am seeing seems to be related to text segment size. >>>>>>> I check out v5.11 and use this config: >>>>>>> https://gist.github.com/dvyukov/6af25474d455437577a84213b0cc9178 >>>>>> >>>>>> This config gave my laptop a hard time ! Finally I was able to boot >>>>>> correctly to userspace, but I realized I used my sv48 branch...Either I >>>>>> fixed your issue along the way or I can't reproduce it, I'll give it a >>>>>> try tomorrow. >>>>> >>>>> Where is your branch? I could also test in my setup on your branch. >>>>> >>>> >>>> You can find my branch int/alex/riscv_kernel_end_of_address_space_v2 >>>> here: https://github.com/AlexGhiti/riscv-linux.git >>> >>> No, it does not work for me. 
>>> >>> Source is on b61ab6c98de021398cd7734ea5fc3655e51e70f2 (HEAD, >>> int/alex/riscv_kernel_end_of_address_space_v2) >>> Config is https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt >>> >>> riscv64-linux-gnu-gcc -v >>> gcc version 10.2.1 20210110 (Debian 10.2.1-6+build1) >>> >>> qemu-system-riscv64 --version >>> QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3) >>> >>> qemu-system-riscv64 \ >>> -machine virt -smp 2 -m 2G \ >>> -device virtio-blk-device,drive=hd0 \ >>> -drive file=image-riscv64,if=none,format=raw,id=hd0 \ >>> -kernel arch/riscv/boot/Image \ >>> -nographic \ >>> -device virtio-rng-device,rng=rng0 -object >>> rng-random,filename=/dev/urandom,id=rng0 \ >>> -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device >>> virtio-net-device,netdev=net0 \ >>> -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic >>> panic_on_warn=1 panic=86400 earlycon" >> >> It still works for me but I had to disable CONFIG_DEBUG_INFO_BTF (I >> don't think that changes anything at runtime). But your above command >> line does not work for me as it appears you do not load any firmware, if >> I add -bios images/fw_jump.elf, it works. But then I don't know where >> your opensbi output below comes from... >> >> And regarding your issue with calling clock_gettime 'directly' compared >> to using the syscall, I have the same consistent output from both calls. >> >> I have an older gcc (9.3.0) and the same qemu. I think what is missing >> here is your buildroot config, so that we have the exact same >> environment: could you post your buildroot config as well ? > > I don't think the image is relevant because I don't even get to kernel > code. If the kernel will complain about no init later, that's fine. 
> Re bios, this version of qemu already has OpenSBI bios builtin, you
> can pass -bios default, but that's, well, the default :)
> Here are more reproducible repro instructions that capture gcc and
> qemu. I think gcc version may be potentially relevant as I suspect
> code size.
>
>
> curl https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt > $KERNEL_SRC/.config
> docker pull gcr.io/syzkaller/syzbot
> docker run -it -v $KERNEL_SRC:/kernel gcr.io/syzkaller/syzbot
> cd /kernel
> make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- olddefconfig
> make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu-
> qemu-system-riscv64 -machine virt -smp 2 -m 4G -kernel
> arch/riscv/boot/Image -nographic -append "earlycon earlyprintk=serial
> console=ttyS0"
> [this does not, only OpenSBI output]
>

Indeed, the issue was code size; please find the fix below. I will send a
proper patch once I have made sure the fix is the right one, but I'm pretty
confident: there's no reason to limit the mapping size to 128MB when we
have a whole pgdir.

diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 9b0592b11a9f..ff2495707edb 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -287,7 +287,7 @@ pgd_t swapper_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
 pgd_t trampoline_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
 pte_t fixmap_pte[PTRS_PER_PTE] __page_aligned_bss;

-#define MAX_EARLY_MAPPING_SIZE SZ_128M
+#define MAX_EARLY_MAPPING_SIZE PGDIR_SIZE

 pgd_t early_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);

--
2.20.1

Thanks,

Alex

> scripts/config -d KASAN_INLINE -e KASAN_OUTLINE -d
> CC_OPTIMIZE_FOR_PERFORMANCE -e CC_OPTIMIZE_FOR_SIZE
> make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu-
> qemu-system-riscv64 -machine virt -smp 2 -m 4G -kernel
> arch/riscv/boot/Image -nographic -append "earlycon earlyprintk=serial
> console=ttyS0"
> [this boots fine, at least at to starting init process]
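To put numbers on the change from SZ_128M to PGDIR_SIZE, here is a small worked calculation. It is a simplified model under stated assumptions, not the kernel's actual code: it assumes the Sv39 layout with 4 KiB pages (9 address bits per page-table level), the helper names are hypothetical, and the 150 MiB kernel size is purely illustrative and not a measurement from this thread.

```python
# Assumed Sv39 geometry with 4 KiB pages (an assumption, not stated in the thread).
PAGE_SHIFT = 12
BITS_PER_LEVEL = 9

PMD_SIZE = 1 << (PAGE_SHIFT + BITS_PER_LEVEL)        # 2 MiB per PMD entry
PGDIR_SIZE = 1 << (PAGE_SHIFT + 2 * BITS_PER_LEVEL)  # 1 GiB per PGD entry
SZ_128M = 128 << 20                                  # the old early mapping limit


def fits_early_mapping(kernel_size, max_early_mapping_size):
    """Simplified check: does a kernel of kernel_size bytes fit under the limit?"""
    return kernel_size <= max_early_mapping_size


def pmd_entries_needed(size):
    """Number of 2 MiB PMD-level mappings needed to cover size bytes."""
    return -(-size // PMD_SIZE)  # ceiling division


# Hypothetical kernel footprint, for illustration only.
hypothetical_kernel = 150 << 20

old_ok = fits_early_mapping(hypothetical_kernel, SZ_128M)     # False
new_ok = fits_early_mapping(hypothetical_kernel, PGDIR_SIZE)  # True
```

Under these assumptions the old limit covers 64 PMD entries while a full pgdir entry covers 512, so a large KASAN_INLINE build that outgrows 128 MiB still fits comfortably once the limit is raised to PGDIR_SIZE.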
* Re: riscv+KASAN does not boot
  2021-02-19 17:01 ` Alex Ghiti
@ 2021-02-19 18:53 ` Dmitry Vyukov
  -1 siblings, 0 replies; 55+ messages in thread
From: Dmitry Vyukov @ 2021-02-19 18:53 UTC (permalink / raw)
  To: Alex Ghiti
  Cc: Albert Ou, Bjorn Topel, Palmer Dabbelt, LKML, nylon7, syzkaller, Andreas Schwab, Paul Walmsley, Tobias Klauser, linux-riscv

On Fri, Feb 19, 2021 at 6:01 PM Alex Ghiti <alex@ghiti.fr> wrote:
>
> Hi Dmitry,
>
> On 2/18/21 at 6:36 AM, Dmitry Vyukov wrote:
> > On Thu, Feb 18, 2021 at 8:54 AM Alex Ghiti <alex@ghiti.fr> wrote:
> >>
> >> Hi Dmitry,
> >>
> >>> On Wed, Feb 17, 2021 at 5:36 PM Alex Ghiti <alex@ghiti.fr> wrote:
> >>>>
> >>>> On 2/16/21 at 11:42 PM, Dmitry Vyukov wrote:
> >>>>> On Tue, Feb 16, 2021 at 9:42 PM Alex Ghiti <alex@ghiti.fr> wrote:
> >>>>>>
> >>>>>> Hi Dmitry,
> >>>>>>
> >>>>>> On 2/16/21 at 6:25 AM, Dmitry Vyukov wrote:
> >>>>>>> On Tue, Feb 16, 2021 at 12:17 PM Dmitry Vyukov <dvyukov@google.com> wrote:
> >>>>>>>>
> >>>>>>>> On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> >>>>>>>>>> I was fixing KASAN support for my sv48 patchset so I took a look at your
> >>>>>>>>>> issue: I built a kernel on top of the branch riscv/fixes using
> >>>>>>>>>> https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config
> >>>>>>>>>> and Buildroot 2020.11. I have the warnings regarding the use of
> >>>>>>>>>> __virt_to_phys on wrong addresses (but that's normal since this function
> >>>>>>>>>> is used in virt_addr_valid) but not the segfaults you describe.
> >>>>>>>>>
> >>>>>>>>> Hi Alex,
> >>>>>>>>>
> >>>>>>>>> Let me try to rebuild buildroot image. Maybe there was something wrong
> >>>>>>>>> with my build, though, I did 'make clean' before doing. But at the
> >>>>>>>>> same time it worked back in June...
> >>>>>>>>>
> >>>>>>>>> Re WARNINGs, they indicate kernel bugs. I am working on setting up a
> >>>>>>>>> syzbot instance on riscv. If there a WARNING during boot then the
> >>>>>>>>> kernel will be marked as broken. No further testing will happen.
> >>>>>>>>> Is it a mis-use of WARN_ON? If so, could anybody please remove it or
> >>>>>>>>> replace it with pr_err.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> I've localized one issue with riscv/KASAN:
> >>>>>>>> KASAN breaks VDSO and that's I think the root cause of weird faults I
> >>>>>>>> saw earlier. The following patch fixes it.
> >>>>>>>> Could somebody please upstream this fix? I don't know how to add/run
> >>>>>>>> tests for this.
> >>>>>>>> Thanks
> >>>>>>>>
> >>>>>>>> diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
> >>>>>>>> index 0cfd6da784f84..cf3a383c1799d 100644
> >>>>>>>> --- a/arch/riscv/kernel/vdso/Makefile
> >>>>>>>> +++ b/arch/riscv/kernel/vdso/Makefile
> >>>>>>>> @@ -35,6 +35,7 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
> >>>>>>>>  # Disable gcov profiling for VDSO code
> >>>>>>>>  GCOV_PROFILE := n
> >>>>>>>>  KCOV_INSTRUMENT := n
> >>>>>>>> +KASAN_SANITIZE := n
> >>>>>>>>
> >>>>>>>>  # Force dependency
> >>>>>>>>  $(obj)/vdso.o: $(obj)/vdso.so
> >>>>>>
> >>>>>> What's weird is that I don't have any issue without this patch with the
> >>>>>> following config whereas it indeed seems required for KASAN. But when
> >>>>>> looking at the segfaults you got earlier, the segfault address is 0xbb0
> >>>>>> and the cause is an instruction page fault: this address is the PLT base
> >>>>>> address in vdso.so and an instruction page fault would mean that someone
> >>>>>> tried to jump at this address, which is weird. At first sight, that does
> >>>>>> not seem related to your patch above, but clearly I may be wrong.
> >>>>>>
> >>>>>> Tobias, did you observe the same segfaults as Dmitry ?
> >>>>>
> >>>>>
> >>>>> I noticed that not all buildroot images use VDSO, it seems to be
> >>>>> dependent on libc settings (at least I think I changed it in the
> >>>>> past).
> >>>>
> >>>> Ok, I used uClibc but then when using glibc, I have the same segfaults,
> >>>> only when KASAN is enabled. And your patch fixes the problem. I will try
> >>>> to take a look later to better understand the problem.
> >>>>
> >>>>> I also booted an image completely successfully including dhcpd/sshd
> >>>>> start, but then my executable crashed in clock_gettime. The executable
> >>>>> was build on linux/amd64 host with "riscv64-linux-gnu-gcc -static"
> >>>>> (10.2.1).
> >>>>>
> >>>>>
> >>>>>>> Second issue I am seeing seems to be related to text segment size.
> >>>>>>> I check out v5.11 and use this config:
> >>>>>>> https://gist.github.com/dvyukov/6af25474d455437577a84213b0cc9178
> >>>>>>
> >>>>>> This config gave my laptop a hard time ! Finally I was able to boot
> >>>>>> correctly to userspace, but I realized I used my sv48 branch...Either I
> >>>>>> fixed your issue along the way or I can't reproduce it, I'll give it a
> >>>>>> try tomorrow.
> >>>>>
> >>>>> Where is your branch? I could also test in my setup on your branch.
> >>>>>
> >>>>
> >>>> You can find my branch int/alex/riscv_kernel_end_of_address_space_v2
> >>>> here: https://github.com/AlexGhiti/riscv-linux.git
> >>>
> >>> No, it does not work for me.
> >>>
> >>> Source is on b61ab6c98de021398cd7734ea5fc3655e51e70f2 (HEAD,
> >>> int/alex/riscv_kernel_end_of_address_space_v2)
> >>> Config is https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt
> >>>
> >>> riscv64-linux-gnu-gcc -v
> >>> gcc version 10.2.1 20210110 (Debian 10.2.1-6+build1)
> >>>
> >>> qemu-system-riscv64 --version
> >>> QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3)
> >>>
> >>> qemu-system-riscv64 \
> >>> -machine virt -smp 2 -m 2G \
> >>> -device virtio-blk-device,drive=hd0 \
> >>> -drive file=image-riscv64,if=none,format=raw,id=hd0 \
> >>> -kernel arch/riscv/boot/Image \
> >>> -nographic \
> >>> -device virtio-rng-device,rng=rng0 -object
> >>> rng-random,filename=/dev/urandom,id=rng0 \
> >>> -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device
> >>> virtio-net-device,netdev=net0 \
> >>> -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic
> >>> panic_on_warn=1 panic=86400 earlycon"
> >>
> >> It still works for me but I had to disable CONFIG_DEBUG_INFO_BTF (I
> >> don't think that changes anything at runtime). But your above command
> >> line does not work for me as it appears you do not load any firmware, if
> >> I add -bios images/fw_jump.elf, it works. But then I don't know where
> >> your opensbi output below comes from...
> >>
> >> And regarding your issue with calling clock_gettime 'directly' compared
> >> to using the syscall, I have the same consistent output from both calls.
> >>
> >> I have an older gcc (9.3.0) and the same qemu. I think what is missing
> >> here is your buildroot config, so that we have the exact same
> >> environment: could you post your buildroot config as well ?
> >
> > I don't think the image is relevant because I don't even get to kernel
> > code. If the kernel will complain about no init later, that's fine.
> > Re bios, this version of qemu already has OpenSBI bios builtin, you
> > can pass -bios default, but that's, well, the default :)
> > Here are more reproducible repro instructions that capture gcc and
> > qemu. I think gcc version may be potentially relevant as I suspect
> > code size.
> >
> >
> > curl https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt
> >> $KERNEL_SRC/.config
> > docker pull gcr.io/syzkaller/syzbot
> > docker run -it -v $KERNEL_SRC:/kernel gcr.io/syzkaller/syzbot
> > cd /kernel
> > make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- olddefconfig
> > make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu-
> > qemu-system-riscv64 -machine virt -smp 2 -m 4G -kernel
> > arch/riscv/boot/Image -nographic -append "earlycon earlyprintk=serial
> > console=ttyS0"
> > [this does not, only OpenSBI output]
> >
>
> Indeed the issue was code size, please find the fix below. I will send a
> proper patch once I made sure the fix is the right one, but I'm pretty
> confident, there's no reason to limit the mapping size to 128MB whereas
> we have a whole pgdir.

Great that you got to the bottom of this!
Riscv kernels are going to be YUGE!

> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index 9b0592b11a9f..ff2495707edb 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -287,7 +287,7 @@ pgd_t swapper_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
>  pgd_t trampoline_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
>  pte_t fixmap_pte[PTRS_PER_PTE] __page_aligned_bss;
>
> -#define MAX_EARLY_MAPPING_SIZE SZ_128M
> +#define MAX_EARLY_MAPPING_SIZE PGDIR_SIZE
>
>  pgd_t early_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);
>
> --
> 2.20.1
>
> Thanks,
>
> Alex
>
> > scripts/config -d KASAN_INLINE -e KASAN_OUTLINE -d
> > CC_OPTIMIZE_FOR_PERFORMANCE -e CC_OPTIMIZE_FOR_SIZE
> > make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu-
> > qemu-system-riscv64 -machine virt -smp 2 -m 4G -kernel
> > arch/riscv/boot/Image -nographic -append "earlycon earlyprintk=serial
> > console=ttyS0"
> > [this boots fine, at least at to starting init process]
> >
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv
> >

^ permalink raw reply [flat|nested] 55+ messages in thread
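The code-size diagnosis in the message above can be sanity-checked by comparing a built image against the old 128MB limit. What follows is a minimal sketch and was not posted in the thread: the helper names `fits_early_mapping` and `report` are hypothetical, and it assumes the on-disk size of `arch/riscv/boot/Image` is a usable stand-in for what the early mapping has to cover. The file size is at best a lower bound, since zero-initialized data takes no space in the file. The optional second argument lets you try a different limit, for example 1073741824 bytes if one assumes Sv39, where a single pgdir entry covers 1 GiB.

```shell
#!/bin/sh
# Sketch only: fits_early_mapping and report are hypothetical helpers,
# not scripts from the kernel tree or from this thread.

# Old limit named in the diff above (SZ_128M), expressed in bytes.
LIMIT_128M=$((128 * 1024 * 1024))

# fits_early_mapping SIZE_BYTES [LIMIT_BYTES]
# Exit status 0 when SIZE_BYTES does not exceed the limit (default 128 MiB).
fits_early_mapping() {
    size=$1
    limit=${2:-$LIMIT_128M}
    [ "$size" -le "$limit" ]
}

# report IMAGE_PATH
# Print the image's on-disk size and whether it is under the 128 MiB limit.
report() {
    image=$1
    [ -f "$image" ] || return 1
    # Arithmetic expansion strips any padding some wc implementations print.
    size=$(($(wc -c < "$image")))
    if fits_early_mapping "$size"; then
        echo "$image: $size bytes, within $LIMIT_128M"
    else
        echo "$image: $size bytes, exceeds $LIMIT_128M"
    fi
}
```

After sourcing the file from the kernel source directory, `report arch/riscv/boot/Image` prints one line for the image. An image reported as exceeding the limit would be consistent with the hang described above, though this check alone does not prove that the limit is the cause.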
* Re: riscv+KASAN does not boot
  2021-02-19 18:53 ` Dmitry Vyukov
@ 2021-02-19 22:26 ` Palmer Dabbelt
  -1 siblings, 0 replies; 55+ messages in thread
From: Palmer Dabbelt @ 2021-02-19 22:26 UTC (permalink / raw)
  To: dvyukov
  Cc: alex, aou, Bjorn Topel, linux-kernel, nylon7, syzkaller, schwab, Paul Walmsley, tklauser, linux-riscv

On Fri, 19 Feb 2021 10:53:43 PST (-0800), dvyukov@google.com wrote:
> On Fri, Feb 19, 2021 at 6:01 PM Alex Ghiti <alex@ghiti.fr> wrote:
>>
>> Hi Dmitry,
>>
>> On 2/18/21 at 6:36 AM, Dmitry Vyukov wrote:
>> > On Thu, Feb 18, 2021 at 8:54 AM Alex Ghiti <alex@ghiti.fr> wrote:
>> >>
>> >> Hi Dmitry,
>> >>
>> >>> On Wed, Feb 17, 2021 at 5:36 PM Alex Ghiti <alex@ghiti.fr> wrote:
>> >>>>
>> >>>> On 2/16/21 at 11:42 PM, Dmitry Vyukov wrote:
>> >>>>> On Tue, Feb 16, 2021 at 9:42 PM Alex Ghiti <alex@ghiti.fr> wrote:
>> >>>>>>
>> >>>>>> Hi Dmitry,
>> >>>>>>
>> >>>>>> On 2/16/21 at 6:25 AM, Dmitry Vyukov wrote:
>> >>>>>>> On Tue, Feb 16, 2021 at 12:17 PM Dmitry Vyukov <dvyukov@google.com> wrote:
>> >>>>>>>>
>> >>>>>>>> On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>> >>>>>>>>>> I was fixing KASAN support for my sv48 patchset so I took a look at your
>> >>>>>>>>>> issue: I built a kernel on top of the branch riscv/fixes using
>> >>>>>>>>>> https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config
>> >>>>>>>>>> and Buildroot 2020.11. I have the warnings regarding the use of
>> >>>>>>>>>> __virt_to_phys on wrong addresses (but that's normal since this function
>> >>>>>>>>>> is used in virt_addr_valid) but not the segfaults you describe.
>> >>>>>>>>>
>> >>>>>>>>> Hi Alex,
>> >>>>>>>>>
>> >>>>>>>>> Let me try to rebuild buildroot image. Maybe there was something wrong
>> >>>>>>>>> with my build, though, I did 'make clean' before doing. But at the
>> >>>>>>>>> same time it worked back in June...
>> >>>>>>>>>
>> >>>>>>>>> Re WARNINGs, they indicate kernel bugs. I am working on setting up a
>> >>>>>>>>> syzbot instance on riscv. If there a WARNING during boot then the
>> >>>>>>>>> kernel will be marked as broken. No further testing will happen.
>> >>>>>>>>> Is it a mis-use of WARN_ON? If so, could anybody please remove it or
>> >>>>>>>>> replace it with pr_err.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> Hi,
>> >>>>>>>>
>> >>>>>>>> I've localized one issue with riscv/KASAN:
>> >>>>>>>> KASAN breaks VDSO and that's I think the root cause of weird faults I
>> >>>>>>>> saw earlier. The following patch fixes it.
>> >>>>>>>> Could somebody please upstream this fix? I don't know how to add/run
>> >>>>>>>> tests for this.
>> >>>>>>>> Thanks
>> >>>>>>>>
>> >>>>>>>> diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
>> >>>>>>>> index 0cfd6da784f84..cf3a383c1799d 100644
>> >>>>>>>> --- a/arch/riscv/kernel/vdso/Makefile
>> >>>>>>>> +++ b/arch/riscv/kernel/vdso/Makefile
>> >>>>>>>> @@ -35,6 +35,7 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
>> >>>>>>>>  # Disable gcov profiling for VDSO code
>> >>>>>>>>  GCOV_PROFILE := n
>> >>>>>>>>  KCOV_INSTRUMENT := n
>> >>>>>>>> +KASAN_SANITIZE := n
>> >>>>>>>>
>> >>>>>>>>  # Force dependency
>> >>>>>>>>  $(obj)/vdso.o: $(obj)/vdso.so
>> >>>>>>
>> >>>>>> What's weird is that I don't have any issue without this patch with the
>> >>>>>> following config whereas it indeed seems required for KASAN. But when
>> >>>>>> looking at the segfaults you got earlier, the segfault address is 0xbb0
>> >>>>>> and the cause is an instruction page fault: this address is the PLT base
>> >>>>>> address in vdso.so and an instruction page fault would mean that someone
>> >>>>>> tried to jump at this address, which is weird. At first sight, that does
>> >>>>>> not seem related to your patch above, but clearly I may be wrong.
>> >>>>>>
>> >>>>>> Tobias, did you observe the same segfaults as Dmitry ?
>> >>>>>
>> >>>>>
>> >>>>> I noticed that not all buildroot images use VDSO, it seems to be
>> >>>>> dependent on libc settings (at least I think I changed it in the
>> >>>>> past).
>> >>>>
>> >>>> Ok, I used uClibc but then when using glibc, I have the same segfaults,
>> >>>> only when KASAN is enabled. And your patch fixes the problem. I will try
>> >>>> to take a look later to better understand the problem.
>> >>>>
>> >>>>> I also booted an image completely successfully including dhcpd/sshd
>> >>>>> start, but then my executable crashed in clock_gettime. The executable
>> >>>>> was build on linux/amd64 host with "riscv64-linux-gnu-gcc -static"
>> >>>>> (10.2.1).
>> >>>>>
>> >>>>>
>> >>>>>>> Second issue I am seeing seems to be related to text segment size.
>> >>>>>>> I check out v5.11 and use this config:
>> >>>>>>> https://gist.github.com/dvyukov/6af25474d455437577a84213b0cc9178
>> >>>>>>
>> >>>>>> This config gave my laptop a hard time ! Finally I was able to boot
>> >>>>>> correctly to userspace, but I realized I used my sv48 branch...Either I
>> >>>>>> fixed your issue along the way or I can't reproduce it, I'll give it a
>> >>>>>> try tomorrow.
>> >>>>>
>> >>>>> Where is your branch? I could also test in my setup on your branch.
>> >>>>>
>> >>>>
>> >>>> You can find my branch int/alex/riscv_kernel_end_of_address_space_v2
>> >>>> here: https://github.com/AlexGhiti/riscv-linux.git
>> >>>
>> >>> No, it does not work for me.
>> >>>
>> >>> Source is on b61ab6c98de021398cd7734ea5fc3655e51e70f2 (HEAD,
>> >>> int/alex/riscv_kernel_end_of_address_space_v2)
>> >>> Config is https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt
>> >>>
>> >>> riscv64-linux-gnu-gcc -v
>> >>> gcc version 10.2.1 20210110 (Debian 10.2.1-6+build1)
>> >>>
>> >>> qemu-system-riscv64 --version
>> >>> QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3)
>> >>>
>> >>> qemu-system-riscv64 \
>> >>> -machine virt -smp 2 -m 2G \
>> >>> -device virtio-blk-device,drive=hd0 \
>> >>> -drive file=image-riscv64,if=none,format=raw,id=hd0 \
>> >>> -kernel arch/riscv/boot/Image \
>> >>> -nographic \
>> >>> -device virtio-rng-device,rng=rng0 -object
>> >>> rng-random,filename=/dev/urandom,id=rng0 \
>> >>> -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device
>> >>> virtio-net-device,netdev=net0 \
>> >>> -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic
>> >>> panic_on_warn=1 panic=86400 earlycon"
>> >>
>> >> It still works for me but I had to disable CONFIG_DEBUG_INFO_BTF (I
>> >> don't think that changes anything at runtime). But your above command
>> >> line does not work for me as it appears you do not load any firmware, if
>> >> I add -bios images/fw_jump.elf, it works. But then I don't know where
>> >> your opensbi output below comes from...
>> >>
>> >> And regarding your issue with calling clock_gettime 'directly' compared
>> >> to using the syscall, I have the same consistent output from both calls.
>> >>
>> >> I have an older gcc (9.3.0) and the same qemu. I think what is missing
>> >> here is your buildroot config, so that we have the exact same
>> >> environment: could you post your buildroot config as well ?
>> >
>> > I don't think the image is relevant because I don't even get to kernel
>> > code. If the kernel will complain about no init later, that's fine.
>> > Re bios, this version of qemu already has OpenSBI bios builtin, you
>> > can pass -bios default, but that's, well, the default :)
>> > Here are more reproducible repro instructions that capture gcc and
>> > qemu. I think gcc version may be potentially relevant as I suspect
>> > code size.
>> >
>> >
>> > curl https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt
>> >> $KERNEL_SRC/.config
>> > docker pull gcr.io/syzkaller/syzbot
>> > docker run -it -v $KERNEL_SRC:/kernel gcr.io/syzkaller/syzbot
>> > cd /kernel
>> > make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- olddefconfig
>> > make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu-
>> > qemu-system-riscv64 -machine virt -smp 2 -m 4G -kernel
>> > arch/riscv/boot/Image -nographic -append "earlycon earlyprintk=serial
>> > console=ttyS0"
>> > [this does not, only OpenSBI output]
>> >
>>
>> Indeed the issue was code size, please find the fix below. I will send a
>> proper patch once I made sure the fix is the right one, but I'm pretty
>> confident, there's no reason to limit the mapping size to 128MB whereas
>> we have a whole pgdir.
>
> Great that you got to the bottom of this!
> Riscv kernels are going to be YUGE!

IIRC I tried that a while ago and it didn't work. It's possible I was just
running into some other bug, but I'm only build-testing allyesconfig, not
boot-testing it. If you've got a setup that does boot I'm happy to take a
patch, though. It'll at least be one step forward.

>
>> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
>> index 9b0592b11a9f..ff2495707edb 100644
>> --- a/arch/riscv/mm/init.c
>> +++ b/arch/riscv/mm/init.c
>> @@ -287,7 +287,7 @@ pgd_t swapper_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
>>  pgd_t trampoline_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
>>  pte_t fixmap_pte[PTRS_PER_PTE] __page_aligned_bss;
>>
>> -#define MAX_EARLY_MAPPING_SIZE SZ_128M
>> +#define MAX_EARLY_MAPPING_SIZE PGDIR_SIZE
>>
>>  pgd_t early_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);
>>
>> --
>> 2.20.1
>>
>> Thanks,
>>
>> Alex
>>
>> > scripts/config -d KASAN_INLINE -e KASAN_OUTLINE -d
>> > CC_OPTIMIZE_FOR_PERFORMANCE -e CC_OPTIMIZE_FOR_SIZE
>> > make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu-
>> > qemu-system-riscv64 -machine virt -smp 2 -m 4G -kernel
>> > arch/riscv/boot/Image -nographic -append "earlycon earlyprintk=serial
>> > console=ttyS0"
>> > [this boots fine, at least at to starting init process]
>> >
>> > _______________________________________________
>> > linux-riscv mailing list
>> > linux-riscv@lists.infradead.org
>> > http://lists.infradead.org/mailman/listinfo/linux-riscv
>> >

^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: riscv+KASAN does not boot @ 2021-02-19 22:26 ` Palmer Dabbelt 0 siblings, 0 replies; 55+ messages in thread From: Palmer Dabbelt @ 2021-02-19 22:26 UTC (permalink / raw) To: dvyukov Cc: aou, alex, Bjorn Topel, linux-kernel, nylon7, syzkaller, schwab, Paul Walmsley, tklauser, linux-riscv On Fri, 19 Feb 2021 10:53:43 PST (-0800), dvyukov@google.com wrote: > On Fri, Feb 19, 2021 at 6:01 PM Alex Ghiti <alex@ghiti.fr> wrote: >> >> Hi Dmitry, >> >> Le 2/18/21 à 6:36 AM, Dmitry Vyukov a écrit : >> > On Thu, Feb 18, 2021 at 8:54 AM Alex Ghiti <alex@ghiti.fr> wrote: >> >> >> >> Hi Dmitry, >> >> >> >>> On Wed, Feb 17, 2021 at 5:36 PM Alex Ghiti <alex@ghiti.fr> wrote: >> >>>> >> >>>> Le 2/16/21 à 11:42 PM, Dmitry Vyukov a écrit : >> >>>>> On Tue, Feb 16, 2021 at 9:42 PM Alex Ghiti <alex@ghiti.fr> wrote: >> >>>>>> >> >>>>>> Hi Dmitry, >> >>>>>> >> >>>>>> Le 2/16/21 à 6:25 AM, Dmitry Vyukov a écrit : >> >>>>>>> On Tue, Feb 16, 2021 at 12:17 PM Dmitry Vyukov <dvyukov@google.com> wrote: >> >>>>>>>> >> >>>>>>>> On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov <dvyukov@google.com> wrote: >> >>>>>>>>>> I was fixing KASAN support for my sv48 patchset so I took a look at your >> >>>>>>>>>> issue: I built a kernel on top of the branch riscv/fixes using >> >>>>>>>>>> https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config >> >>>>>>>>>> and Buildroot 2020.11. I have the warnings regarding the use of >> >>>>>>>>>> __virt_to_phys on wrong addresses (but that's normal since this function >> >>>>>>>>>> is used in virt_addr_valid) but not the segfaults you describe. >> >>>>>>>>> >> >>>>>>>>> Hi Alex, >> >>>>>>>>> >> >>>>>>>>> Let me try to rebuild buildroot image. Maybe there was something wrong >> >>>>>>>>> with my build, though, I did 'make clean' before doing. But at the >> >>>>>>>>> same time it worked back in June... >> >>>>>>>>> >> >>>>>>>>> Re WARNINGs, they indicate kernel bugs. 
I am working on setting up a >> >>>>>>>>> syzbot instance on riscv. If there a WARNING during boot then the >> >>>>>>>>> kernel will be marked as broken. No further testing will happen. >> >>>>>>>>> Is it a mis-use of WARN_ON? If so, could anybody please remove it or >> >>>>>>>>> replace it with pr_err. >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> Hi, >> >>>>>>>> >> >>>>>>>> I've localized one issue with riscv/KASAN: >> >>>>>>>> KASAN breaks VDSO and that's I think the root cause of weird faults I >> >>>>>>>> saw earlier. The following patch fixes it. >> >>>>>>>> Could somebody please upstream this fix? I don't know how to add/run >> >>>>>>>> tests for this. >> >>>>>>>> Thanks >> >>>>>>>> >> >>>>>>>> diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile >> >>>>>>>> index 0cfd6da784f84..cf3a383c1799d 100644 >> >>>>>>>> --- a/arch/riscv/kernel/vdso/Makefile >> >>>>>>>> +++ b/arch/riscv/kernel/vdso/Makefile >> >>>>>>>> @@ -35,6 +35,7 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os >> >>>>>>>> # Disable gcov profiling for VDSO code >> >>>>>>>> GCOV_PROFILE := n >> >>>>>>>> KCOV_INSTRUMENT := n >> >>>>>>>> +KASAN_SANITIZE := n >> >>>>>>>> >> >>>>>>>> # Force dependency >> >>>>>>>> $(obj)/vdso.o: $(obj)/vdso.so >> >>>>>> >> >>>>>> What's weird is that I don't have any issue without this patch with the >> >>>>>> following config whereas it indeed seems required for KASAN. But when >> >>>>>> looking at the segfaults you got earlier, the segfault address is 0xbb0 >> >>>>>> and the cause is an instruction page fault: this address is the PLT base >> >>>>>> address in vdso.so and an instruction page fault would mean that someone >> >>>>>> tried to jump at this address, which is weird. At first sight, that does >> >>>>>> not seem related to your patch above, but clearly I may be wrong. >> >>>>>> >> >>>>>> Tobias, did you observe the same segfaults as Dmitry ? 
>> >>>>> >> >>>>> >> >>>>> I noticed that not all buildroot images use VDSO, it seems to be >> >>>>> dependent on libc settings (at least I think I changed it in the >> >>>>> past). >> >>>> >> >>>> Ok, I used uClibc but then when using glibc, I have the same segfaults, >> >>>> only when KASAN is enabled. And your patch fixes the problem. I will try >> >>>> to take a look later to better understand the problem. >> >>>> >> >>>>> I also booted an image completely successfully including dhcpd/sshd >> >>>>> start, but then my executable crashed in clock_gettime. The executable >> >>>>> was build on linux/amd64 host with "riscv64-linux-gnu-gcc -static" >> >>>>> (10.2.1). >> >>>>> >> >>>>> >> >>>>>>> Second issue I am seeing seems to be related to text segment size. >> >>>>>>> I check out v5.11 and use this config: >> >>>>>>> https://gist.github.com/dvyukov/6af25474d455437577a84213b0cc9178 >> >>>>>> >> >>>>>> This config gave my laptop a hard time ! Finally I was able to boot >> >>>>>> correctly to userspace, but I realized I used my sv48 branch...Either I >> >>>>>> fixed your issue along the way or I can't reproduce it, I'll give it a >> >>>>>> try tomorrow. >> >>>>> >> >>>>> Where is your branch? I could also test in my setup on your branch. >> >>>>> >> >>>> >> >>>> You can find my branch int/alex/riscv_kernel_end_of_address_space_v2 >> >>>> here: https://github.com/AlexGhiti/riscv-linux.git >> >>> >> >>> No, it does not work for me. 
>> >>> >> >>> Source is on b61ab6c98de021398cd7734ea5fc3655e51e70f2 (HEAD, >> >>> int/alex/riscv_kernel_end_of_address_space_v2) >> >>> Config is https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt >> >>> >> >>> riscv64-linux-gnu-gcc -v >> >>> gcc version 10.2.1 20210110 (Debian 10.2.1-6+build1) >> >>> >> >>> qemu-system-riscv64 --version >> >>> QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3) >> >>> >> >>> qemu-system-riscv64 \ >> >>> -machine virt -smp 2 -m 2G \ >> >>> -device virtio-blk-device,drive=hd0 \ >> >>> -drive file=image-riscv64,if=none,format=raw,id=hd0 \ >> >>> -kernel arch/riscv/boot/Image \ >> >>> -nographic \ >> >>> -device virtio-rng-device,rng=rng0 -object >> >>> rng-random,filename=/dev/urandom,id=rng0 \ >> >>> -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device >> >>> virtio-net-device,netdev=net0 \ >> >>> -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic >> >>> panic_on_warn=1 panic=86400 earlycon" >> >> >> >> It still works for me but I had to disable CONFIG_DEBUG_INFO_BTF (I >> >> don't think that changes anything at runtime). But your above command >> >> line does not work for me as it appears you do not load any firmware, if >> >> I add -bios images/fw_jump.elf, it works. But then I don't know where >> >> your opensbi output below comes from... >> >> >> >> And regarding your issue with calling clock_gettime 'directly' compared >> >> to using the syscall, I have the same consistent output from both calls. >> >> >> >> I have an older gcc (9.3.0) and the same qemu. I think what is missing >> >> here is your buildroot config, so that we have the exact same >> >> environment: could you post your buildroot config as well ? >> > >> > I don't think the image is relevant because I don't even get to kernel >> > code. If the kernel will complain about no init later, that's fine. 
>> > Re bios, this version of qemu already has OpenSBI bios builtin, you >> > can pass -bios default, but that's, well, the default :) >> > Here are more reproducible repro instructions that capture gcc and >> > qemu. I think gcc version may be potentially relevant as I suspect >> > code size. >> > >> > >> > curl https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt >> >> $KERNEL_SRC/.config >> > docker pull gcr.io/syzkaller/syzbot >> > docker run -it -v $KERNEL_SRC:/kernel gcr.io/syzkaller/syzbot >> > cd /kernel >> > make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- olddefconfig >> > make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- >> > qemu-system-riscv64 -machine virt -smp 2 -m 4G -kernel >> > arch/riscv/boot/Image -nographic -append "earlycon earlyprintk=serial >> > console=ttyS0" >> > [this does not, only OpenSBI output] >> > >> >> Indeed the issue was code size, please find the fix below. I will send a >> proper patch once I made sure the fix is the right one, but I'm pretty >> confident, there's no reason to limit the mapping size to 128MB whereas >> we have a whole pgdir. > > Great you get to the bottom of this! > Riscv kernels are going to be YUGE! IIRC I tried that a while ago and it didn't work. It's possible I was just running into some other bug, but I'm just build testing allyesconfig as opposed to boot testing it. If you've got a setup that does boot I'm happy to take a patch, though. It'll at least be one step forward. 
> >> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c >> index 9b0592b11a9f..ff2495707edb 100644 >> --- a/arch/riscv/mm/init.c >> +++ b/arch/riscv/mm/init.c >> @@ -287,7 +287,7 @@ pgd_t swapper_pg_dir[PTRS_PER_PGD] __page_aligned_bss; >> pgd_t trampoline_pg_dir[PTRS_PER_PGD] __page_aligned_bss; >> pte_t fixmap_pte[PTRS_PER_PTE] __page_aligned_bss; >> >> -#define MAX_EARLY_MAPPING_SIZE SZ_128M >> +#define MAX_EARLY_MAPPING_SIZE PGDIR_SIZE >> >> pgd_t early_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE); >> >> -- >> 2.20.1 >> >> Thanks, >> >> Alex >> >> > scripts/config -d KASAN_INLINE -e KASAN_OUTLINE -d >> > CC_OPTIMIZE_FOR_PERFORMANCE -e CC_OPTIMIZE_FOR_SIZE >> > make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- >> > qemu-system-riscv64 -machine virt -smp 2 -m 4G -kernel >> > arch/riscv/boot/Image -nographic -append "earlycon earlyprintk=serial >> > console=ttyS0" >> > [this boots fine, at least at to starting init process] >> > >> > _______________________________________________ >> > linux-riscv mailing list >> > linux-riscv@lists.infradead.org >> > http://lists.infradead.org/mailman/listinfo/linux-riscv >> > _______________________________________________ linux-riscv mailing list linux-riscv@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-riscv ^ permalink raw reply [flat|nested] 55+ messages in thread
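For readers following the size argument behind the MAX_EARLY_MAPPING_SIZE change above, here is a minimal sketch of the arithmetic. It assumes Sv39 paging with 4 KiB pages (so one pgdir entry covers 1 GiB) and assumes early boot rejects an image larger than the limit. The 200 MiB image size and the helper name are hypothetical, chosen only for illustration; they are not taken from the thread.

```python
# Sketch only: constants assume Sv39 with 4 KiB pages (an assumption, not from the thread).
PAGE_SHIFT = 12                              # 4 KiB pages
LEVEL_BITS = 9                               # 512 entries per page-table level
PGDIR_SHIFT = PAGE_SHIFT + 2 * LEVEL_BITS    # top level of a 3-level table
PGDIR_SIZE = 1 << PGDIR_SHIFT                # address range covered by one pgdir entry
SZ_128M = 128 << 20                          # the old limit from the diff

# Hypothetical size of a heavily instrumented kernel image.
HYPOTHETICAL_IMAGE = 200 << 20


def fits_early_mapping(load_size: int, max_early_mapping_size: int) -> bool:
    """Return True if an image of load_size bytes is within the early mapping limit."""
    return load_size <= max_early_mapping_size


if __name__ == "__main__":
    for name, limit in (("SZ_128M", SZ_128M), ("PGDIR_SIZE", PGDIR_SIZE)):
        verdict = "fits" if fits_early_mapping(HYPOTHETICAL_IMAGE, limit) else "too large"
        print(f"{name:10s} = {limit >> 20:5d} MiB: image of "
              f"{HYPOTHETICAL_IMAGE >> 20} MiB {verdict}")
```

Under these assumptions, an image that grows past 128 MiB is refused by the old limit but sits comfortably inside a single pgdir entry.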
* Re: riscv+KASAN does not boot 2021-02-19 22:26 ` Palmer Dabbelt @ 2021-03-09 17:11 ` Dmitry Vyukov -1 siblings, 0 replies; 55+ messages in thread From: Dmitry Vyukov @ 2021-03-09 17:11 UTC (permalink / raw) To: Palmer Dabbelt Cc: Alex Ghiti, Albert Ou, Bjorn Topel, LKML, nylon7, syzkaller, Andreas Schwab, Paul Walmsley, Tobias Klauser, linux-riscv On Fri, Feb 19, 2021 at 11:26 PM 'Palmer Dabbelt' via syzkaller <syzkaller@googlegroups.com> wrote: > > On Fri, 19 Feb 2021 10:53:43 PST (-0800), dvyukov@google.com wrote: > > On Fri, Feb 19, 2021 at 6:01 PM Alex Ghiti <alex@ghiti.fr> wrote: > >> > >> Hi Dmitry, > >> > >> Le 2/18/21 à 6:36 AM, Dmitry Vyukov a écrit : > >> > On Thu, Feb 18, 2021 at 8:54 AM Alex Ghiti <alex@ghiti.fr> wrote: > >> >> > >> >> Hi Dmitry, > >> >> > >> >>> On Wed, Feb 17, 2021 at 5:36 PM Alex Ghiti <alex@ghiti.fr> wrote: > >> >>>> > >> >>>> Le 2/16/21 à 11:42 PM, Dmitry Vyukov a écrit : > >> >>>>> On Tue, Feb 16, 2021 at 9:42 PM Alex Ghiti <alex@ghiti.fr> wrote: > >> >>>>>> > >> >>>>>> Hi Dmitry, > >> >>>>>> > >> >>>>>> Le 2/16/21 à 6:25 AM, Dmitry Vyukov a écrit : > >> >>>>>>> On Tue, Feb 16, 2021 at 12:17 PM Dmitry Vyukov <dvyukov@google.com> wrote: > >> >>>>>>>> > >> >>>>>>>> On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov <dvyukov@google.com> wrote: > >> >>>>>>>>>> I was fixing KASAN support for my sv48 patchset so I took a look at your > >> >>>>>>>>>> issue: I built a kernel on top of the branch riscv/fixes using > >> >>>>>>>>>> https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config > >> >>>>>>>>>> and Buildroot 2020.11. I have the warnings regarding the use of > >> >>>>>>>>>> __virt_to_phys on wrong addresses (but that's normal since this function > >> >>>>>>>>>> is used in virt_addr_valid) but not the segfaults you describe. > >> >>>>>>>>> > >> >>>>>>>>> Hi Alex, > >> >>>>>>>>> > >> >>>>>>>>> Let me try to rebuild buildroot image. 
Maybe there was something wrong > >> >>>>>>>>> with my build, though, I did 'make clean' before doing. But at the > >> >>>>>>>>> same time it worked back in June... > >> >>>>>>>>> > >> >>>>>>>>> Re WARNINGs, they indicate kernel bugs. I am working on setting up a > >> >>>>>>>>> syzbot instance on riscv. If there a WARNING during boot then the > >> >>>>>>>>> kernel will be marked as broken. No further testing will happen. > >> >>>>>>>>> Is it a mis-use of WARN_ON? If so, could anybody please remove it or > >> >>>>>>>>> replace it with pr_err. > >> >>>>>>>> > >> >>>>>>>> > >> >>>>>>>> Hi, > >> >>>>>>>> > >> >>>>>>>> I've localized one issue with riscv/KASAN: > >> >>>>>>>> KASAN breaks VDSO and that's I think the root cause of weird faults I > >> >>>>>>>> saw earlier. The following patch fixes it. > >> >>>>>>>> Could somebody please upstream this fix? I don't know how to add/run > >> >>>>>>>> tests for this. > >> >>>>>>>> Thanks > >> >>>>>>>> > >> >>>>>>>> diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile > >> >>>>>>>> index 0cfd6da784f84..cf3a383c1799d 100644 > >> >>>>>>>> --- a/arch/riscv/kernel/vdso/Makefile > >> >>>>>>>> +++ b/arch/riscv/kernel/vdso/Makefile > >> >>>>>>>> @@ -35,6 +35,7 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os > >> >>>>>>>> # Disable gcov profiling for VDSO code > >> >>>>>>>> GCOV_PROFILE := n > >> >>>>>>>> KCOV_INSTRUMENT := n > >> >>>>>>>> +KASAN_SANITIZE := n > >> >>>>>>>> > >> >>>>>>>> # Force dependency > >> >>>>>>>> $(obj)/vdso.o: $(obj)/vdso.so > >> >>>>>> > >> >>>>>> What's weird is that I don't have any issue without this patch with the > >> >>>>>> following config whereas it indeed seems required for KASAN. 
But when > >> >>>>>> looking at the segfaults you got earlier, the segfault address is 0xbb0 > >> >>>>>> and the cause is an instruction page fault: this address is the PLT base > >> >>>>>> address in vdso.so and an instruction page fault would mean that someone > >> >>>>>> tried to jump at this address, which is weird. At first sight, that does > >> >>>>>> not seem related to your patch above, but clearly I may be wrong. > >> >>>>>> > >> >>>>>> Tobias, did you observe the same segfaults as Dmitry ? > >> >>>>> > >> >>>>> > >> >>>>> I noticed that not all buildroot images use VDSO, it seems to be > >> >>>>> dependent on libc settings (at least I think I changed it in the > >> >>>>> past). > >> >>>> > >> >>>> Ok, I used uClibc but then when using glibc, I have the same segfaults, > >> >>>> only when KASAN is enabled. And your patch fixes the problem. I will try > >> >>>> to take a look later to better understand the problem. > >> >>>> > >> >>>>> I also booted an image completely successfully including dhcpd/sshd > >> >>>>> start, but then my executable crashed in clock_gettime. The executable > >> >>>>> was build on linux/amd64 host with "riscv64-linux-gnu-gcc -static" > >> >>>>> (10.2.1). > >> >>>>> > >> >>>>> > >> >>>>>>> Second issue I am seeing seems to be related to text segment size. > >> >>>>>>> I check out v5.11 and use this config: > >> >>>>>>> https://gist.github.com/dvyukov/6af25474d455437577a84213b0cc9178 > >> >>>>>> > >> >>>>>> This config gave my laptop a hard time ! Finally I was able to boot > >> >>>>>> correctly to userspace, but I realized I used my sv48 branch...Either I > >> >>>>>> fixed your issue along the way or I can't reproduce it, I'll give it a > >> >>>>>> try tomorrow. > >> >>>>> > >> >>>>> Where is your branch? I could also test in my setup on your branch. 
> >> >>>>> > >> >>>> > >> >>>> You can find my branch int/alex/riscv_kernel_end_of_address_space_v2 > >> >>>> here: https://github.com/AlexGhiti/riscv-linux.git > >> >>> > >> >>> No, it does not work for me. > >> >>> > >> >>> Source is on b61ab6c98de021398cd7734ea5fc3655e51e70f2 (HEAD, > >> >>> int/alex/riscv_kernel_end_of_address_space_v2) > >> >>> Config is https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt > >> >>> > >> >>> riscv64-linux-gnu-gcc -v > >> >>> gcc version 10.2.1 20210110 (Debian 10.2.1-6+build1) > >> >>> > >> >>> qemu-system-riscv64 --version > >> >>> QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3) > >> >>> > >> >>> qemu-system-riscv64 \ > >> >>> -machine virt -smp 2 -m 2G \ > >> >>> -device virtio-blk-device,drive=hd0 \ > >> >>> -drive file=image-riscv64,if=none,format=raw,id=hd0 \ > >> >>> -kernel arch/riscv/boot/Image \ > >> >>> -nographic \ > >> >>> -device virtio-rng-device,rng=rng0 -object > >> >>> rng-random,filename=/dev/urandom,id=rng0 \ > >> >>> -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device > >> >>> virtio-net-device,netdev=net0 \ > >> >>> -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic > >> >>> panic_on_warn=1 panic=86400 earlycon" > >> >> > >> >> It still works for me but I had to disable CONFIG_DEBUG_INFO_BTF (I > >> >> don't think that changes anything at runtime). But your above command > >> >> line does not work for me as it appears you do not load any firmware, if > >> >> I add -bios images/fw_jump.elf, it works. But then I don't know where > >> >> your opensbi output below comes from... > >> >> > >> >> And regarding your issue with calling clock_gettime 'directly' compared > >> >> to using the syscall, I have the same consistent output from both calls. > >> >> > >> >> I have an older gcc (9.3.0) and the same qemu. 
I think what is missing > >> >> here is your buildroot config, so that we have the exact same > >> >> environment: could you post your buildroot config as well ? > >> > > >> > I don't think the image is relevant because I don't even get to kernel > >> > code. If the kernel will complain about no init later, that's fine. > >> > Re bios, this version of qemu already has OpenSBI bios builtin, you > >> > can pass -bios default, but that's, well, the default :) > >> > Here are more reproducible repro instructions that capture gcc and > >> > qemu. I think gcc version may be potentially relevant as I suspect > >> > code size. > >> > > >> > > >> > curl https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt > >> >> $KERNEL_SRC/.config > >> > docker pull gcr.io/syzkaller/syzbot > >> > docker run -it -v $KERNEL_SRC:/kernel gcr.io/syzkaller/syzbot > >> > cd /kernel > >> > make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- olddefconfig > >> > make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- > >> > qemu-system-riscv64 -machine virt -smp 2 -m 4G -kernel > >> > arch/riscv/boot/Image -nographic -append "earlycon earlyprintk=serial > >> > console=ttyS0" > >> > [this does not, only OpenSBI output] > >> > > >> > >> Indeed the issue was code size, please find the fix below. I will send a > >> proper patch once I made sure the fix is the right one, but I'm pretty > >> confident, there's no reason to limit the mapping size to 128MB whereas > >> we have a whole pgdir. > > > > Great you get to the bottom of this! > > Riscv kernels are going to be YUGE! > > IIRC I tried that a while ago and it didn't work. It's possible I was just > running into some other bug, but I'm just build testing allyesconfig as opposed > to boot testing it. > > If you've got a setup that does boot I'm happy to take a patch, though. It'll > at least be one step forward. OK, it's getting better. 
The next issue is called "512 bytes should be enough for everyone!" :)
https://elixir.bootlin.com/linux/v5.12-rc2/source/include/uapi/asm-generic/setup.h#L5
Most other arches redefine it to something bigger:
https://elixir.bootlin.com/linux/v5.12-rc2/source/arch/s390/include/uapi/asm/setup.h#L10
even arm32 redefines it.
I am not sure the default is even reasonable anymore. Failure mode is
also not nice (silent truncation).

We are trying to pass this:

earlyprintk=serial oops=panic nmi_watchdog=panic panic=86400
net.ifnames=0 sysctl.kernel.hung_task_all_cpu_backtrace=1
ima_policy=tcb kvm-intel.nested=1 nf-conntrack-ftp.ports=20000
nf-conntrack-tftp.ports=20000 nf-conntrack-sip.ports=20000
nf-conntrack-irc.ports=20000 nf-conntrack-sane.ports=20000
vivid.n_devs=16 vivid.multiplanar=1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2
netrom.nr_ndevs=16 rose.rose_ndevs=16 spec_store_bypass_disable=prctl
numa=fake=2 nopcid dummy_hcd.num=8 binder.debug_mask=0
rcupdate.rcu_expedited=1 watchdog_thresh=165
workqueue.watchdog_thresh=420 panic_on_warn=1

The last part gets truncated and we are getting false workqueue watchdog stalls.

Could you please increase it?

^ permalink raw reply [flat|nested] 55+ messages in thread
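The silent truncation described in this message can be sketched as follows. This is only a model: it assumes the kernel copies the command line strlcpy-style into a COMMAND_LINE_SIZE buffer, keeping at most COMMAND_LINE_SIZE - 1 bytes. The function names are hypothetical, and the string is the parameter list from the message alone, without whatever root=/console= arguments a real boot would prepend.

```python
# Model of the silent truncation; function names are illustrative, not kernel APIs.
COMMAND_LINE_SIZE = 512  # asm-generic default referenced in the message

CMDLINE = (
    "earlyprintk=serial oops=panic nmi_watchdog=panic panic=86400 "
    "net.ifnames=0 sysctl.kernel.hung_task_all_cpu_backtrace=1 "
    "ima_policy=tcb kvm-intel.nested=1 nf-conntrack-ftp.ports=20000 "
    "nf-conntrack-tftp.ports=20000 nf-conntrack-sip.ports=20000 "
    "nf-conntrack-irc.ports=20000 nf-conntrack-sane.ports=20000 "
    "vivid.n_devs=16 vivid.multiplanar=1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2 "
    "netrom.nr_ndevs=16 rose.rose_ndevs=16 spec_store_bypass_disable=prctl "
    "numa=fake=2 nopcid dummy_hcd.num=8 binder.debug_mask=0 "
    "rcupdate.rcu_expedited=1 watchdog_thresh=165 "
    "workqueue.watchdog_thresh=420 panic_on_warn=1"
)


def truncate_like_kernel(cmdline: str, size: int = COMMAND_LINE_SIZE) -> str:
    """Keep at most size - 1 bytes, as a strlcpy-style copy would (assumed behaviour)."""
    return cmdline.encode()[: size - 1].decode(errors="ignore")


def lost_params(cmdline: str, size: int = COMMAND_LINE_SIZE) -> list:
    """List parameters that do not survive intact; a half-cut token counts as lost."""
    kept = truncate_like_kernel(cmdline, size).split()
    return [param for param in cmdline.split() if param not in kept]


if __name__ == "__main__":
    print(f"command line length: {len(CMDLINE)} bytes, buffer: {COMMAND_LINE_SIZE}")
    for param in lost_params(CMDLINE):
        print("lost:", param)
```

Running it lists the trailing parameters that never reach the kernel, which matches the symptom reported here: workqueue.watchdog_thresh falls back to its default and the watchdog fires.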
* Re: riscv+KASAN does not boot 2021-03-09 17:11 ` Dmitry Vyukov @ 2021-03-09 19:49 ` Alex Ghiti -1 siblings, 0 replies; 55+ messages in thread From: Alex Ghiti @ 2021-03-09 19:49 UTC (permalink / raw) To: Dmitry Vyukov, Palmer Dabbelt Cc: Albert Ou, Bjorn Topel, LKML, nylon7, syzkaller, Andreas Schwab, Paul Walmsley, Tobias Klauser, linux-riscv Le 3/9/21 à 12:11 PM, Dmitry Vyukov a écrit : > On Fri, Feb 19, 2021 at 11:26 PM 'Palmer Dabbelt' via syzkaller > <syzkaller@googlegroups.com> wrote: >> >> On Fri, 19 Feb 2021 10:53:43 PST (-0800), dvyukov@google.com wrote: >>> On Fri, Feb 19, 2021 at 6:01 PM Alex Ghiti <alex@ghiti.fr> wrote: >>>> >>>> Hi Dmitry, >>>> >>>> Le 2/18/21 à 6:36 AM, Dmitry Vyukov a écrit : >>>>> On Thu, Feb 18, 2021 at 8:54 AM Alex Ghiti <alex@ghiti.fr> wrote: >>>>>> >>>>>> Hi Dmitry, >>>>>> >>>>>>> On Wed, Feb 17, 2021 at 5:36 PM Alex Ghiti <alex@ghiti.fr> wrote: >>>>>>>> >>>>>>>> Le 2/16/21 à 11:42 PM, Dmitry Vyukov a écrit : >>>>>>>>> On Tue, Feb 16, 2021 at 9:42 PM Alex Ghiti <alex@ghiti.fr> wrote: >>>>>>>>>> >>>>>>>>>> Hi Dmitry, >>>>>>>>>> >>>>>>>>>> Le 2/16/21 à 6:25 AM, Dmitry Vyukov a écrit : >>>>>>>>>>> On Tue, Feb 16, 2021 at 12:17 PM Dmitry Vyukov <dvyukov@google.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov <dvyukov@google.com> wrote: >>>>>>>>>>>>>> I was fixing KASAN support for my sv48 patchset so I took a look at your >>>>>>>>>>>>>> issue: I built a kernel on top of the branch riscv/fixes using >>>>>>>>>>>>>> https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config >>>>>>>>>>>>>> and Buildroot 2020.11. I have the warnings regarding the use of >>>>>>>>>>>>>> __virt_to_phys on wrong addresses (but that's normal since this function >>>>>>>>>>>>>> is used in virt_addr_valid) but not the segfaults you describe. >>>>>>>>>>>>> >>>>>>>>>>>>> Hi Alex, >>>>>>>>>>>>> >>>>>>>>>>>>> Let me try to rebuild buildroot image. 
Maybe there was something wrong >>>>>>>>>>>>> with my build, though, I did 'make clean' before doing. But at the >>>>>>>>>>>>> same time it worked back in June... >>>>>>>>>>>>> >>>>>>>>>>>>> Re WARNINGs, they indicate kernel bugs. I am working on setting up a >>>>>>>>>>>>> syzbot instance on riscv. If there a WARNING during boot then the >>>>>>>>>>>>> kernel will be marked as broken. No further testing will happen. >>>>>>>>>>>>> Is it a mis-use of WARN_ON? If so, could anybody please remove it or >>>>>>>>>>>>> replace it with pr_err. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> I've localized one issue with riscv/KASAN: >>>>>>>>>>>> KASAN breaks VDSO and that's I think the root cause of weird faults I >>>>>>>>>>>> saw earlier. The following patch fixes it. >>>>>>>>>>>> Could somebody please upstream this fix? I don't know how to add/run >>>>>>>>>>>> tests for this. >>>>>>>>>>>> Thanks >>>>>>>>>>>> >>>>>>>>>>>> diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile >>>>>>>>>>>> index 0cfd6da784f84..cf3a383c1799d 100644 >>>>>>>>>>>> --- a/arch/riscv/kernel/vdso/Makefile >>>>>>>>>>>> +++ b/arch/riscv/kernel/vdso/Makefile >>>>>>>>>>>> @@ -35,6 +35,7 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os >>>>>>>>>>>> # Disable gcov profiling for VDSO code >>>>>>>>>>>> GCOV_PROFILE := n >>>>>>>>>>>> KCOV_INSTRUMENT := n >>>>>>>>>>>> +KASAN_SANITIZE := n >>>>>>>>>>>> >>>>>>>>>>>> # Force dependency >>>>>>>>>>>> $(obj)/vdso.o: $(obj)/vdso.so >>>>>>>>>> >>>>>>>>>> What's weird is that I don't have any issue without this patch with the >>>>>>>>>> following config whereas it indeed seems required for KASAN. But when >>>>>>>>>> looking at the segfaults you got earlier, the segfault address is 0xbb0 >>>>>>>>>> and the cause is an instruction page fault: this address is the PLT base >>>>>>>>>> address in vdso.so and an instruction page fault would mean that someone >>>>>>>>>> tried to jump at this address, which is weird. 
At first sight, that does >>>>>>>>>> not seem related to your patch above, but clearly I may be wrong. >>>>>>>>>> >>>>>>>>>> Tobias, did you observe the same segfaults as Dmitry ? >>>>>>>>> >>>>>>>>> >>>>>>>>> I noticed that not all buildroot images use VDSO, it seems to be >>>>>>>>> dependent on libc settings (at least I think I changed it in the >>>>>>>>> past). >>>>>>>> >>>>>>>> Ok, I used uClibc but then when using glibc, I have the same segfaults, >>>>>>>> only when KASAN is enabled. And your patch fixes the problem. I will try >>>>>>>> to take a look later to better understand the problem. >>>>>>>> >>>>>>>>> I also booted an image completely successfully including dhcpd/sshd >>>>>>>>> start, but then my executable crashed in clock_gettime. The executable >>>>>>>>> was build on linux/amd64 host with "riscv64-linux-gnu-gcc -static" >>>>>>>>> (10.2.1). >>>>>>>>> >>>>>>>>> >>>>>>>>>>> Second issue I am seeing seems to be related to text segment size. >>>>>>>>>>> I check out v5.11 and use this config: >>>>>>>>>>> https://gist.github.com/dvyukov/6af25474d455437577a84213b0cc9178 >>>>>>>>>> >>>>>>>>>> This config gave my laptop a hard time ! Finally I was able to boot >>>>>>>>>> correctly to userspace, but I realized I used my sv48 branch...Either I >>>>>>>>>> fixed your issue along the way or I can't reproduce it, I'll give it a >>>>>>>>>> try tomorrow. >>>>>>>>> >>>>>>>>> Where is your branch? I could also test in my setup on your branch. >>>>>>>>> >>>>>>>> >>>>>>>> You can find my branch int/alex/riscv_kernel_end_of_address_space_v2 >>>>>>>> here: https://github.com/AlexGhiti/riscv-linux.git >>>>>>> >>>>>>> No, it does not work for me. 
>>>>>>> >>>>>>> Source is on b61ab6c98de021398cd7734ea5fc3655e51e70f2 (HEAD, >>>>>>> int/alex/riscv_kernel_end_of_address_space_v2) >>>>>>> Config is https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt >>>>>>> >>>>>>> riscv64-linux-gnu-gcc -v >>>>>>> gcc version 10.2.1 20210110 (Debian 10.2.1-6+build1) >>>>>>> >>>>>>> qemu-system-riscv64 --version >>>>>>> QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3) >>>>>>> >>>>>>> qemu-system-riscv64 \ >>>>>>> -machine virt -smp 2 -m 2G \ >>>>>>> -device virtio-blk-device,drive=hd0 \ >>>>>>> -drive file=image-riscv64,if=none,format=raw,id=hd0 \ >>>>>>> -kernel arch/riscv/boot/Image \ >>>>>>> -nographic \ >>>>>>> -device virtio-rng-device,rng=rng0 -object >>>>>>> rng-random,filename=/dev/urandom,id=rng0 \ >>>>>>> -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device >>>>>>> virtio-net-device,netdev=net0 \ >>>>>>> -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic >>>>>>> panic_on_warn=1 panic=86400 earlycon" >>>>>> >>>>>> It still works for me but I had to disable CONFIG_DEBUG_INFO_BTF (I >>>>>> don't think that changes anything at runtime). But your above command >>>>>> line does not work for me as it appears you do not load any firmware, if >>>>>> I add -bios images/fw_jump.elf, it works. But then I don't know where >>>>>> your opensbi output below comes from... >>>>>> >>>>>> And regarding your issue with calling clock_gettime 'directly' compared >>>>>> to using the syscall, I have the same consistent output from both calls. >>>>>> >>>>>> I have an older gcc (9.3.0) and the same qemu. I think what is missing >>>>>> here is your buildroot config, so that we have the exact same >>>>>> environment: could you post your buildroot config as well ? >>>>> >>>>> I don't think the image is relevant because I don't even get to kernel >>>>> code. If the kernel will complain about no init later, that's fine. 
>>>>> Re bios, this version of qemu already has OpenSBI bios builtin, you >>>>> can pass -bios default, but that's, well, the default :) >>>>> Here are more reproducible repro instructions that capture gcc and >>>>> qemu. I think gcc version may be potentially relevant as I suspect >>>>> code size. >>>>> >>>>> >>>>> curl https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt >>>>>> $KERNEL_SRC/.config >>>>> docker pull gcr.io/syzkaller/syzbot >>>>> docker run -it -v $KERNEL_SRC:/kernel gcr.io/syzkaller/syzbot >>>>> cd /kernel >>>>> make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- olddefconfig >>>>> make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- >>>>> qemu-system-riscv64 -machine virt -smp 2 -m 4G -kernel >>>>> arch/riscv/boot/Image -nographic -append "earlycon earlyprintk=serial >>>>> console=ttyS0" >>>>> [this does not, only OpenSBI output] >>>>> >>>> >>>> Indeed the issue was code size, please find the fix below. I will send a >>>> proper patch once I made sure the fix is the right one, but I'm pretty >>>> confident, there's no reason to limit the mapping size to 128MB whereas >>>> we have a whole pgdir. >>> >>> Great you get to the bottom of this! >>> Riscv kernels are going to be YUGE! >> >> IIRC I tried that a while ago and it didn't work. It's possible I was just >> running into some other bug, but I'm just build testing allyesconfig as opposed >> to boot testing it. >> >> If you've got a setup that does boot I'm happy to take a patch, though. It'll >> at least be one step forward. > > > > OK, it's getting better. Nice :) > The next issue is called "512 bytes should be enough for everyone!" :) > https://elixir.bootlin.com/linux/v5.12-rc2/source/include/uapi/asm-generic/setup.h#L5 > Most other arches redefine it to something bigger: > https://elixir.bootlin.com/linux/v5.12-rc2/source/arch/s390/include/uapi/asm/setup.h#L10 > even arm32 redefines it. 
> I am not sure the default is even reasonable anymore.

Some archs override this value to 256, but git blame shows this is
(very) old. I agree that 512 as default seems low.

> Failure mode is
> also not nice (silent truncation).

Agreed, maybe we could still have the default value and checks the
terminating null character is somewhere and bugs if not, I'll take a look.

> We are trying to pass this:
>
> earlyprintk=serial oops=panic nmi_watchdog=panic panic=86400
> net.ifnames=0 sysctl.kernel.hung_task_all_cpu_backtrace=1
> ima_policy=tcb kvm-intel.nested=1 nf-conntrack-ftp.ports=20000
> nf-conntrack-tftp.ports=20000 nf-conntrack-sip.ports=20000
> nf-conntrack-irc.ports=20000 nf-conntrack-sane.ports=20000
> vivid.n_devs=16 vivid.multiplanar=1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2
> netrom.nr_ndevs=16 rose.rose_ndevs=16 spec_store_bypass_disable=prctl
> numa=fake=2 nopcid dummy_hcd.num=8 binder.debug_mask=0
> rcupdate.rcu_expedited=1 watchdog_thresh=165
> workqueue.watchdog_thresh=420 panic_on_warn=1
>
> The last part gets truncated and we are getting false workqueue watchdog stalls.
>
> Could you please increase it?

I will propose a patchset that increases the default value and cleans
archs up accordingly too.

Thanks again,

Alex

>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
>
* Re: riscv+KASAN does not boot 2021-03-09 19:49 ` Alex Ghiti @ 2021-03-10 17:25 ` Dmitry Vyukov -1 siblings, 0 replies; 55+ messages in thread From: Dmitry Vyukov @ 2021-03-10 17:25 UTC (permalink / raw) To: Alex Ghiti Cc: Palmer Dabbelt, Albert Ou, Bjorn Topel, LKML, nylon7, syzkaller, Andreas Schwab, Paul Walmsley, Tobias Klauser, linux-riscv On Tue, Mar 9, 2021 at 8:49 PM Alex Ghiti <alex@ghiti.fr> wrote: > > Le 3/9/21 à 12:11 PM, Dmitry Vyukov a écrit : > > On Fri, Feb 19, 2021 at 11:26 PM 'Palmer Dabbelt' via syzkaller > > <syzkaller@googlegroups.com> wrote: > >> > >> On Fri, 19 Feb 2021 10:53:43 PST (-0800), dvyukov@google.com wrote: > >>> On Fri, Feb 19, 2021 at 6:01 PM Alex Ghiti <alex@ghiti.fr> wrote: > >>>> > >>>> Hi Dmitry, > >>>> > >>>> Le 2/18/21 à 6:36 AM, Dmitry Vyukov a écrit : > >>>>> On Thu, Feb 18, 2021 at 8:54 AM Alex Ghiti <alex@ghiti.fr> wrote: > >>>>>> > >>>>>> Hi Dmitry, > >>>>>> > >>>>>>> On Wed, Feb 17, 2021 at 5:36 PM Alex Ghiti <alex@ghiti.fr> wrote: > >>>>>>>> > >>>>>>>> Le 2/16/21 à 11:42 PM, Dmitry Vyukov a écrit : > >>>>>>>>> On Tue, Feb 16, 2021 at 9:42 PM Alex Ghiti <alex@ghiti.fr> wrote: > >>>>>>>>>> > >>>>>>>>>> Hi Dmitry, > >>>>>>>>>> > >>>>>>>>>> Le 2/16/21 à 6:25 AM, Dmitry Vyukov a écrit : > >>>>>>>>>>> On Tue, Feb 16, 2021 at 12:17 PM Dmitry Vyukov <dvyukov@google.com> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>> On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov <dvyukov@google.com> wrote: > >>>>>>>>>>>>>> I was fixing KASAN support for my sv48 patchset so I took a look at your > >>>>>>>>>>>>>> issue: I built a kernel on top of the branch riscv/fixes using > >>>>>>>>>>>>>> https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config > >>>>>>>>>>>>>> and Buildroot 2020.11. 
I have the warnings regarding the use of > >>>>>>>>>>>>>> __virt_to_phys on wrong addresses (but that's normal since this function > >>>>>>>>>>>>>> is used in virt_addr_valid) but not the segfaults you describe. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Hi Alex, > >>>>>>>>>>>>> > >>>>>>>>>>>>> Let me try to rebuild buildroot image. Maybe there was something wrong > >>>>>>>>>>>>> with my build, though, I did 'make clean' before doing. But at the > >>>>>>>>>>>>> same time it worked back in June... > >>>>>>>>>>>>> > >>>>>>>>>>>>> Re WARNINGs, they indicate kernel bugs. I am working on setting up a > >>>>>>>>>>>>> syzbot instance on riscv. If there a WARNING during boot then the > >>>>>>>>>>>>> kernel will be marked as broken. No further testing will happen. > >>>>>>>>>>>>> Is it a mis-use of WARN_ON? If so, could anybody please remove it or > >>>>>>>>>>>>> replace it with pr_err. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> Hi, > >>>>>>>>>>>> > >>>>>>>>>>>> I've localized one issue with riscv/KASAN: > >>>>>>>>>>>> KASAN breaks VDSO and that's I think the root cause of weird faults I > >>>>>>>>>>>> saw earlier. The following patch fixes it. > >>>>>>>>>>>> Could somebody please upstream this fix? I don't know how to add/run > >>>>>>>>>>>> tests for this. 
> >>>>>>>>>>>> Thanks > >>>>>>>>>>>> > >>>>>>>>>>>> diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile > >>>>>>>>>>>> index 0cfd6da784f84..cf3a383c1799d 100644 > >>>>>>>>>>>> --- a/arch/riscv/kernel/vdso/Makefile > >>>>>>>>>>>> +++ b/arch/riscv/kernel/vdso/Makefile > >>>>>>>>>>>> @@ -35,6 +35,7 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os > >>>>>>>>>>>> # Disable gcov profiling for VDSO code > >>>>>>>>>>>> GCOV_PROFILE := n > >>>>>>>>>>>> KCOV_INSTRUMENT := n > >>>>>>>>>>>> +KASAN_SANITIZE := n > >>>>>>>>>>>> > >>>>>>>>>>>> # Force dependency > >>>>>>>>>>>> $(obj)/vdso.o: $(obj)/vdso.so > >>>>>>>>>> > >>>>>>>>>> What's weird is that I don't have any issue without this patch with the > >>>>>>>>>> following config whereas it indeed seems required for KASAN. But when > >>>>>>>>>> looking at the segfaults you got earlier, the segfault address is 0xbb0 > >>>>>>>>>> and the cause is an instruction page fault: this address is the PLT base > >>>>>>>>>> address in vdso.so and an instruction page fault would mean that someone > >>>>>>>>>> tried to jump at this address, which is weird. At first sight, that does > >>>>>>>>>> not seem related to your patch above, but clearly I may be wrong. > >>>>>>>>>> > >>>>>>>>>> Tobias, did you observe the same segfaults as Dmitry ? > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> I noticed that not all buildroot images use VDSO, it seems to be > >>>>>>>>> dependent on libc settings (at least I think I changed it in the > >>>>>>>>> past). > >>>>>>>> > >>>>>>>> Ok, I used uClibc but then when using glibc, I have the same segfaults, > >>>>>>>> only when KASAN is enabled. And your patch fixes the problem. I will try > >>>>>>>> to take a look later to better understand the problem. > >>>>>>>> > >>>>>>>>> I also booted an image completely successfully including dhcpd/sshd > >>>>>>>>> start, but then my executable crashed in clock_gettime. 
The executable > >>>>>>>>> was build on linux/amd64 host with "riscv64-linux-gnu-gcc -static" > >>>>>>>>> (10.2.1). > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>>> Second issue I am seeing seems to be related to text segment size. > >>>>>>>>>>> I check out v5.11 and use this config: > >>>>>>>>>>> https://gist.github.com/dvyukov/6af25474d455437577a84213b0cc9178 > >>>>>>>>>> > >>>>>>>>>> This config gave my laptop a hard time ! Finally I was able to boot > >>>>>>>>>> correctly to userspace, but I realized I used my sv48 branch...Either I > >>>>>>>>>> fixed your issue along the way or I can't reproduce it, I'll give it a > >>>>>>>>>> try tomorrow. > >>>>>>>>> > >>>>>>>>> Where is your branch? I could also test in my setup on your branch. > >>>>>>>>> > >>>>>>>> > >>>>>>>> You can find my branch int/alex/riscv_kernel_end_of_address_space_v2 > >>>>>>>> here: https://github.com/AlexGhiti/riscv-linux.git > >>>>>>> > >>>>>>> No, it does not work for me. > >>>>>>> > >>>>>>> Source is on b61ab6c98de021398cd7734ea5fc3655e51e70f2 (HEAD, > >>>>>>> int/alex/riscv_kernel_end_of_address_space_v2) > >>>>>>> Config is https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt > >>>>>>> > >>>>>>> riscv64-linux-gnu-gcc -v > >>>>>>> gcc version 10.2.1 20210110 (Debian 10.2.1-6+build1) > >>>>>>> > >>>>>>> qemu-system-riscv64 --version > >>>>>>> QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3) > >>>>>>> > >>>>>>> qemu-system-riscv64 \ > >>>>>>> -machine virt -smp 2 -m 2G \ > >>>>>>> -device virtio-blk-device,drive=hd0 \ > >>>>>>> -drive file=image-riscv64,if=none,format=raw,id=hd0 \ > >>>>>>> -kernel arch/riscv/boot/Image \ > >>>>>>> -nographic \ > >>>>>>> -device virtio-rng-device,rng=rng0 -object > >>>>>>> rng-random,filename=/dev/urandom,id=rng0 \ > >>>>>>> -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device > >>>>>>> virtio-net-device,netdev=net0 \ > >>>>>>> -append "root=/dev/vda earlyprintk=serial 
console=ttyS0 oops=panic > >>>>>>> panic_on_warn=1 panic=86400 earlycon" > >>>>>> > >>>>>> It still works for me but I had to disable CONFIG_DEBUG_INFO_BTF (I > >>>>>> don't think that changes anything at runtime). But your above command > >>>>>> line does not work for me as it appears you do not load any firmware, if > >>>>>> I add -bios images/fw_jump.elf, it works. But then I don't know where > >>>>>> your opensbi output below comes from... > >>>>>> > >>>>>> And regarding your issue with calling clock_gettime 'directly' compared > >>>>>> to using the syscall, I have the same consistent output from both calls. > >>>>>> > >>>>>> I have an older gcc (9.3.0) and the same qemu. I think what is missing > >>>>>> here is your buildroot config, so that we have the exact same > >>>>>> environment: could you post your buildroot config as well ? > >>>>> > >>>>> I don't think the image is relevant because I don't even get to kernel > >>>>> code. If the kernel will complain about no init later, that's fine. > >>>>> Re bios, this version of qemu already has OpenSBI bios builtin, you > >>>>> can pass -bios default, but that's, well, the default :) > >>>>> Here are more reproducible repro instructions that capture gcc and > >>>>> qemu. I think gcc version may be potentially relevant as I suspect > >>>>> code size. 
> >>>>>
> >>>>>
> >>>>> curl https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt
> >>>>> > $KERNEL_SRC/.config
> >>>>> docker pull gcr.io/syzkaller/syzbot
> >>>>> docker run -it -v $KERNEL_SRC:/kernel gcr.io/syzkaller/syzbot
> >>>>> cd /kernel
> >>>>> make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- olddefconfig
> >>>>> make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu-
> >>>>> qemu-system-riscv64 -machine virt -smp 2 -m 4G -kernel
> >>>>> arch/riscv/boot/Image -nographic -append "earlycon earlyprintk=serial
> >>>>> console=ttyS0"
> >>>>> [this does not, only OpenSBI output]
> >>>>>
> >>>>
> >>>> Indeed the issue was code size, please find the fix below. I will send a
> >>>> proper patch once I made sure the fix is the right one, but I'm pretty
> >>>> confident, there's no reason to limit the mapping size to 128MB whereas
> >>>> we have a whole pgdir.
> >>>
> >>> Great you get to the bottom of this!
> >>> Riscv kernels are going to be YUGE!
> >>
> >> IIRC I tried that a while ago and it didn't work. It's possible I was just
> >> running into some other bug, but I'm just build testing allyesconfig as opposed
> >> to boot testing it.
> >>
> >> If you've got a setup that does boot I'm happy to take a patch, though. It'll
> >> at least be one step forward.
> >
> >
> >
> > OK, it's getting better.
>
> Nice :)
>
> > The next issue is called "512 bytes should be enough for everyone!" :)
> > https://elixir.bootlin.com/linux/v5.12-rc2/source/include/uapi/asm-generic/setup.h#L5
> > Most other arches redefine it to something bigger:
> > https://elixir.bootlin.com/linux/v5.12-rc2/source/arch/s390/include/uapi/asm/setup.h#L10
> > even arm32 redefines it.
> > I am not sure the default is even reasonable anymore.
>
> Some archs override this value to 256, but git blame shows this is
> (very) old. I agree that 512 as default seems low.
>
> > Failure mode is
> > also not nice (silent truncation).
>
> Agreed, maybe we could still have the default value and checks the
> terminating null character is somewhere and bugs if not, I'll take a look.
>
> > We are trying to pass this:
> >
> > earlyprintk=serial oops=panic nmi_watchdog=panic panic=86400
> > net.ifnames=0 sysctl.kernel.hung_task_all_cpu_backtrace=1
> > ima_policy=tcb kvm-intel.nested=1 nf-conntrack-ftp.ports=20000
> > nf-conntrack-tftp.ports=20000 nf-conntrack-sip.ports=20000
> > nf-conntrack-irc.ports=20000 nf-conntrack-sane.ports=20000
> > vivid.n_devs=16 vivid.multiplanar=1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2
> > netrom.nr_ndevs=16 rose.rose_ndevs=16 spec_store_bypass_disable=prctl
> > numa=fake=2 nopcid dummy_hcd.num=8 binder.debug_mask=0
> > rcupdate.rcu_expedited=1 watchdog_thresh=165
> > workqueue.watchdog_thresh=420 panic_on_warn=1
> >
> > The last part gets truncated and we are getting false workqueue watchdog stalls.
> >
> > Could you please increase it?
>
> I will propose a patchset that increases the default value and cleans
> archs up accordingly too.

I've worked around the command line length for now by reducing command
line size.
The syzbot instance is alive and kicking now:
https://syzkaller.appspot.com/upstream?manager=ci-qemu2-riscv64
with the first issue found:
https://syzkaller.appspot.com/bug?extid=e74b94fe601ab9552d69
https://lore.kernel.org/lkml/000000000000b74f1b05bd316729@google.com/T/#u
in my local testing it was happening very frequently, so until it's
fixed, the instance probably won't find lots of other issues.

FTR, the instance config is stored here:
https://github.com/google/syzkaller/blob/master/dashboard/config/linux/upstream-riscv64-kasan.config

The instance uses qemu emulation and heavy debug configs, so it's
quite slow and it makes sense to target it at riscv-specific parts of
the kernel (rather than stress generic subsystems that are already
stressed on x86).
So the question is: what riscv-specific parts are there that we reach?
Can you think of any qemu flags (cpu features, device emulation,
pstore, etc)? Any kernel parts that we may be missing?

Thanks
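
The silent truncation discussed above can be checked on the host before the command line is handed to qemu. What follows is only a sketch under stated assumptions, not kernel code: the 512-byte limit is the asm-generic COMMAND_LINE_SIZE default linked in the thread, the kernel is assumed to keep at most limit - 1 bytes (one byte being reserved for the terminating NUL), and the names split_at_limit and check_cmdline are hypothetical helpers introduced for illustration. The parameter list is the one quoted in the thread.

```python
# Sketch only: helper names are hypothetical. The 512-byte limit is the
# asm-generic COMMAND_LINE_SIZE default linked in the thread.
COMMAND_LINE_SIZE = 512

# The parameters quoted in the thread, joined with single spaces.
SYZBOT_CMDLINE = " ".join([
    "earlyprintk=serial", "oops=panic", "nmi_watchdog=panic", "panic=86400",
    "net.ifnames=0", "sysctl.kernel.hung_task_all_cpu_backtrace=1",
    "ima_policy=tcb", "kvm-intel.nested=1", "nf-conntrack-ftp.ports=20000",
    "nf-conntrack-tftp.ports=20000", "nf-conntrack-sip.ports=20000",
    "nf-conntrack-irc.ports=20000", "nf-conntrack-sane.ports=20000",
    "vivid.n_devs=16", "vivid.multiplanar=1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2",
    "netrom.nr_ndevs=16", "rose.rose_ndevs=16",
    "spec_store_bypass_disable=prctl", "numa=fake=2", "nopcid",
    "dummy_hcd.num=8", "binder.debug_mask=0", "rcupdate.rcu_expedited=1",
    "watchdog_thresh=165", "workqueue.watchdog_thresh=420",
    "panic_on_warn=1",
])


def split_at_limit(cmdline, limit=COMMAND_LINE_SIZE):
    """Split an ASCII command line into (kept, dropped).

    Assumption: at most limit - 1 bytes survive, because one byte is
    needed for the terminating NUL; everything after that is lost.
    """
    keep = limit - 1
    return cmdline[:keep], cmdline[keep:]


def check_cmdline(cmdline, limit=COMMAND_LINE_SIZE):
    """Return cmdline unchanged, or raise if any part would be dropped."""
    kept, dropped = split_at_limit(cmdline, limit)
    if dropped:
        raise ValueError(
            f"command line is {len(cmdline)} bytes, "
            f"{len(dropped)} bytes would be dropped: {dropped!r}"
        )
    return kept


if __name__ == "__main__":
    kept, dropped = split_at_limit(SYZBOT_CMDLINE)
    print(f"length: {len(SYZBOT_CMDLINE)} bytes, limit: {COMMAND_LINE_SIZE}")
    print(f"dropped: {dropped!r}")
```

Under these assumptions the dropped tail includes workqueue.watchdog_thresh=420, which is consistent with the false workqueue watchdog stalls reported above. Failing loudly in check_cmdline is a host-side analogue of the "bug if the terminating null is missing" idea, not the proposed kernel change itself.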
* Re: riscv+KASAN does not boot @ 2021-03-10 17:25 ` Dmitry Vyukov 0 siblings, 0 replies; 55+ messages in thread From: Dmitry Vyukov @ 2021-03-10 17:25 UTC (permalink / raw) To: Alex Ghiti Cc: Palmer Dabbelt, Albert Ou, Bjorn Topel, LKML, nylon7, syzkaller, Andreas Schwab, Paul Walmsley, Tobias Klauser, linux-riscv On Tue, Mar 9, 2021 at 8:49 PM Alex Ghiti <alex@ghiti.fr> wrote: > > Le 3/9/21 à 12:11 PM, Dmitry Vyukov a écrit : > > On Fri, Feb 19, 2021 at 11:26 PM 'Palmer Dabbelt' via syzkaller > > <syzkaller@googlegroups.com> wrote: > >> > >> On Fri, 19 Feb 2021 10:53:43 PST (-0800), dvyukov@google.com wrote: > >>> On Fri, Feb 19, 2021 at 6:01 PM Alex Ghiti <alex@ghiti.fr> wrote: > >>>> > >>>> Hi Dmitry, > >>>> > >>>> Le 2/18/21 à 6:36 AM, Dmitry Vyukov a écrit : > >>>>> On Thu, Feb 18, 2021 at 8:54 AM Alex Ghiti <alex@ghiti.fr> wrote: > >>>>>> > >>>>>> Hi Dmitry, > >>>>>> > >>>>>>> On Wed, Feb 17, 2021 at 5:36 PM Alex Ghiti <alex@ghiti.fr> wrote: > >>>>>>>> > >>>>>>>> Le 2/16/21 à 11:42 PM, Dmitry Vyukov a écrit : > >>>>>>>>> On Tue, Feb 16, 2021 at 9:42 PM Alex Ghiti <alex@ghiti.fr> wrote: > >>>>>>>>>> > >>>>>>>>>> Hi Dmitry, > >>>>>>>>>> > >>>>>>>>>> Le 2/16/21 à 6:25 AM, Dmitry Vyukov a écrit : > >>>>>>>>>>> On Tue, Feb 16, 2021 at 12:17 PM Dmitry Vyukov <dvyukov@google.com> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>> On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov <dvyukov@google.com> wrote: > >>>>>>>>>>>>>> I was fixing KASAN support for my sv48 patchset so I took a look at your > >>>>>>>>>>>>>> issue: I built a kernel on top of the branch riscv/fixes using > >>>>>>>>>>>>>> https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config > >>>>>>>>>>>>>> and Buildroot 2020.11. I have the warnings regarding the use of > >>>>>>>>>>>>>> __virt_to_phys on wrong addresses (but that's normal since this function > >>>>>>>>>>>>>> is used in virt_addr_valid) but not the segfaults you describe. 
> >>>>>>>>>>>>> > >>>>>>>>>>>>> Hi Alex, > >>>>>>>>>>>>> > >>>>>>>>>>>>> Let me try to rebuild buildroot image. Maybe there was something wrong > >>>>>>>>>>>>> with my build, though, I did 'make clean' before doing. But at the > >>>>>>>>>>>>> same time it worked back in June... > >>>>>>>>>>>>> > >>>>>>>>>>>>> Re WARNINGs, they indicate kernel bugs. I am working on setting up a > >>>>>>>>>>>>> syzbot instance on riscv. If there a WARNING during boot then the > >>>>>>>>>>>>> kernel will be marked as broken. No further testing will happen. > >>>>>>>>>>>>> Is it a mis-use of WARN_ON? If so, could anybody please remove it or > >>>>>>>>>>>>> replace it with pr_err. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> Hi, > >>>>>>>>>>>> > >>>>>>>>>>>> I've localized one issue with riscv/KASAN: > >>>>>>>>>>>> KASAN breaks VDSO and that's I think the root cause of weird faults I > >>>>>>>>>>>> saw earlier. The following patch fixes it. > >>>>>>>>>>>> Could somebody please upstream this fix? I don't know how to add/run > >>>>>>>>>>>> tests for this. > >>>>>>>>>>>> Thanks > >>>>>>>>>>>> > >>>>>>>>>>>> diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile > >>>>>>>>>>>> index 0cfd6da784f84..cf3a383c1799d 100644 > >>>>>>>>>>>> --- a/arch/riscv/kernel/vdso/Makefile > >>>>>>>>>>>> +++ b/arch/riscv/kernel/vdso/Makefile > >>>>>>>>>>>> @@ -35,6 +35,7 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os > >>>>>>>>>>>> # Disable gcov profiling for VDSO code > >>>>>>>>>>>> GCOV_PROFILE := n > >>>>>>>>>>>> KCOV_INSTRUMENT := n > >>>>>>>>>>>> +KASAN_SANITIZE := n > >>>>>>>>>>>> > >>>>>>>>>>>> # Force dependency > >>>>>>>>>>>> $(obj)/vdso.o: $(obj)/vdso.so > >>>>>>>>>> > >>>>>>>>>> What's weird is that I don't have any issue without this patch with the > >>>>>>>>>> following config whereas it indeed seems required for KASAN. 
But when > >>>>>>>>>> looking at the segfaults you got earlier, the segfault address is 0xbb0 > >>>>>>>>>> and the cause is an instruction page fault: this address is the PLT base > >>>>>>>>>> address in vdso.so and an instruction page fault would mean that someone > >>>>>>>>>> tried to jump at this address, which is weird. At first sight, that does > >>>>>>>>>> not seem related to your patch above, but clearly I may be wrong. > >>>>>>>>>> > >>>>>>>>>> Tobias, did you observe the same segfaults as Dmitry ? > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> I noticed that not all buildroot images use VDSO, it seems to be > >>>>>>>>> dependent on libc settings (at least I think I changed it in the > >>>>>>>>> past). > >>>>>>>> > >>>>>>>> Ok, I used uClibc but then when using glibc, I have the same segfaults, > >>>>>>>> only when KASAN is enabled. And your patch fixes the problem. I will try > >>>>>>>> to take a look later to better understand the problem. > >>>>>>>> > >>>>>>>>> I also booted an image completely successfully including dhcpd/sshd > >>>>>>>>> start, but then my executable crashed in clock_gettime. The executable > >>>>>>>>> was build on linux/amd64 host with "riscv64-linux-gnu-gcc -static" > >>>>>>>>> (10.2.1). > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>>> Second issue I am seeing seems to be related to text segment size. > >>>>>>>>>>> I check out v5.11 and use this config: > >>>>>>>>>>> https://gist.github.com/dvyukov/6af25474d455437577a84213b0cc9178 > >>>>>>>>>> > >>>>>>>>>> This config gave my laptop a hard time ! Finally I was able to boot > >>>>>>>>>> correctly to userspace, but I realized I used my sv48 branch...Either I > >>>>>>>>>> fixed your issue along the way or I can't reproduce it, I'll give it a > >>>>>>>>>> try tomorrow. > >>>>>>>>> > >>>>>>>>> Where is your branch? I could also test in my setup on your branch. 
> >>>>>>>>> > >>>>>>>> > >>>>>>>> You can find my branch int/alex/riscv_kernel_end_of_address_space_v2 > >>>>>>>> here: https://github.com/AlexGhiti/riscv-linux.git > >>>>>>> > >>>>>>> No, it does not work for me. > >>>>>>> > >>>>>>> Source is on b61ab6c98de021398cd7734ea5fc3655e51e70f2 (HEAD, > >>>>>>> int/alex/riscv_kernel_end_of_address_space_v2) > >>>>>>> Config is https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt > >>>>>>> > >>>>>>> riscv64-linux-gnu-gcc -v > >>>>>>> gcc version 10.2.1 20210110 (Debian 10.2.1-6+build1) > >>>>>>> > >>>>>>> qemu-system-riscv64 --version > >>>>>>> QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3) > >>>>>>> > >>>>>>> qemu-system-riscv64 \ > >>>>>>> -machine virt -smp 2 -m 2G \ > >>>>>>> -device virtio-blk-device,drive=hd0 \ > >>>>>>> -drive file=image-riscv64,if=none,format=raw,id=hd0 \ > >>>>>>> -kernel arch/riscv/boot/Image \ > >>>>>>> -nographic \ > >>>>>>> -device virtio-rng-device,rng=rng0 -object > >>>>>>> rng-random,filename=/dev/urandom,id=rng0 \ > >>>>>>> -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device > >>>>>>> virtio-net-device,netdev=net0 \ > >>>>>>> -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic > >>>>>>> panic_on_warn=1 panic=86400 earlycon" > >>>>>> > >>>>>> It still works for me but I had to disable CONFIG_DEBUG_INFO_BTF (I > >>>>>> don't think that changes anything at runtime). But your above command > >>>>>> line does not work for me as it appears you do not load any firmware, if > >>>>>> I add -bios images/fw_jump.elf, it works. But then I don't know where > >>>>>> your opensbi output below comes from... > >>>>>> > >>>>>> And regarding your issue with calling clock_gettime 'directly' compared > >>>>>> to using the syscall, I have the same consistent output from both calls. > >>>>>> > >>>>>> I have an older gcc (9.3.0) and the same qemu. 
> >>>>>> I think what is missing here is your buildroot config, so that we have
> >>>>>> the exact same environment: could you post your buildroot config as
> >>>>>> well ?
> >>>>>
> >>>>> I don't think the image is relevant because I don't even get to kernel
> >>>>> code. If the kernel will complain about no init later, that's fine.
> >>>>> Re bios, this version of qemu already has OpenSBI bios builtin, you
> >>>>> can pass -bios default, but that's, well, the default :)
> >>>>> Here are more reproducible repro instructions that capture gcc and
> >>>>> qemu. I think gcc version may be potentially relevant as I suspect
> >>>>> code size.
> >>>>>
> >>>>>
> >>>>> curl https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt > $KERNEL_SRC/.config
> >>>>> docker pull gcr.io/syzkaller/syzbot
> >>>>> docker run -it -v $KERNEL_SRC:/kernel gcr.io/syzkaller/syzbot
> >>>>> cd /kernel
> >>>>> make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- olddefconfig
> >>>>> make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu-
> >>>>> qemu-system-riscv64 -machine virt -smp 2 -m 4G -kernel
> >>>>> arch/riscv/boot/Image -nographic -append "earlycon earlyprintk=serial
> >>>>> console=ttyS0"
> >>>>> [this does not, only OpenSBI output]
> >>>>>
> >>>>
> >>>> Indeed the issue was code size, please find the fix below. I will send a
> >>>> proper patch once I made sure the fix is the right one, but I'm pretty
> >>>> confident, there's no reason to limit the mapping size to 128MB whereas
> >>>> we have a whole pgdir.
> >>>
> >>> Great you get to the bottom of this!
> >>> Riscv kernels are going to be YUGE!
> >>
> >> IIRC I tried that a while ago and it didn't work. It's possible I was just
> >> running into some other bug, but I'm just build testing allyesconfig as opposed
> >> to boot testing it.
> >>
> >> If you've got a setup that does boot I'm happy to take a patch, though. It'll
> >> at least be one step forward.
> >
> >
> >
> > OK, it's getting better.
>
> Nice :)
>
> > The next issue is called "512 bytes should be enough for everyone!" :)
> > https://elixir.bootlin.com/linux/v5.12-rc2/source/include/uapi/asm-generic/setup.h#L5
> > Most other arches redefine it to something bigger:
> > https://elixir.bootlin.com/linux/v5.12-rc2/source/arch/s390/include/uapi/asm/setup.h#L10
> > even arm32 redefines it.
> > I am not sure the default is even reasonable anymore.
>
> Some archs override this value to 256, but git blame shows this is
> (very) old. I agree that 512 as default seems low.
>
> > Failure mode is
> > also not nice (silent truncation).
>
> Agreed, maybe we could still have the default value and checks the
> terminating null character is somewhere and bugs if not, I'll take a look.
>
> > We are trying to pass this:
> >
> > earlyprintk=serial oops=panic nmi_watchdog=panic panic=86400
> > net.ifnames=0 sysctl.kernel.hung_task_all_cpu_backtrace=1
> > ima_policy=tcb kvm-intel.nested=1 nf-conntrack-ftp.ports=20000
> > nf-conntrack-tftp.ports=20000 nf-conntrack-sip.ports=20000
> > nf-conntrack-irc.ports=20000 nf-conntrack-sane.ports=20000
> > vivid.n_devs=16 vivid.multiplanar=1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2
> > netrom.nr_ndevs=16 rose.rose_ndevs=16 spec_store_bypass_disable=prctl
> > numa=fake=2 nopcid dummy_hcd.num=8 binder.debug_mask=0
> > rcupdate.rcu_expedited=1 watchdog_thresh=165
> > workqueue.watchdog_thresh=420 panic_on_warn=1
> >
> > The last part gets truncated and we are getting false workqueue watchdog stalls.
> >
> > Could you please increase it?
>
> I will propose a patchset that increases the default value and cleans
> archs up accordingly too.

I've worked around the command line length for now by reducing command
line size.
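
Because the truncation is silent, one way to catch it before booting is to measure the string that will be passed to -append. The following is a minimal POSIX shell sketch, not something taken from the kernel, qemu or syzkaller: the helper names cmdline_len, cmdline_fits and cmdline_lost are made up for illustration, the 512-byte default is the asm-generic COMMAND_LINE_SIZE discussed above, and the sketch assumes that one byte of that buffer is needed for the terminating NUL.

```shell
#!/bin/sh
# Sketch only: all helper names here are hypothetical.
# Assumption: a buffer of N bytes holds at most N-1 command line bytes,
# because the last byte is needed for the terminating NUL.

# Print the length in bytes of the command line given as $1.
cmdline_len() {
    printf '%s' "$1" | wc -c | tr -d ' '
}

# Succeed if $1 plus a terminating NUL fits in a buffer of $2 bytes
# (default 512, the asm-generic COMMAND_LINE_SIZE).
cmdline_fits() {
    [ "$(cmdline_len "$1")" -lt "${2:-512}" ]
}

# Print the tail of $1 that would not fit in a buffer of $2 bytes.
# Prints nothing when the whole command line fits.
cmdline_lost() {
    printf '%s' "$1" | tail -c +"${2:-512}"
}
```

With the -append string stored in a variable such as CMDLINE, running `cmdline_fits "$CMDLINE" || cmdline_lost "$CMDLINE"` would print the parameters that are at risk of being dropped, which makes it easier to decide which ones to remove or move earlier.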
The syzbot instance is alive and kicking now:
https://syzkaller.appspot.com/upstream?manager=ci-qemu2-riscv64
with the first issue found:
https://syzkaller.appspot.com/bug?extid=e74b94fe601ab9552d69
https://lore.kernel.org/lkml/000000000000b74f1b05bd316729@google.com/T/#u
in my local testing it was happening very frequently, so until it's
fixed, the instance probably won't find lots of other issues.

FTR, the instance config is stored here:
https://github.com/google/syzkaller/blob/master/dashboard/config/linux/upstream-riscv64-kasan.config

The instance uses qemu emulation and heavy debug configs, so it's
quite slow and it makes sense to target it at riscv-specific parts of
the kernel (rather than stress generic subsystems that are already
stressed on x86). So the question is: what riscv-specific parts are
there that we reach? Can you think of any qemu flags (cpu features,
device emulation, pstore, etc)? Any kernel parts that we may be
missing?

Thanks

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
* Re: riscv+KASAN does not boot
  2021-02-16 11:17 ` Dmitry Vyukov
@ 2021-02-16 17:35 ` Tobias Klauser
  -1 siblings, 0 replies; 55+ messages in thread
From: Tobias Klauser @ 2021-02-16 17:35 UTC
To: Dmitry Vyukov
Cc: Alex Ghiti, Albert Ou, Bjorn Topel, Palmer Dabbelt, LKML, nylon7,
    syzkaller, Andreas Schwab, Paul Walmsley, linux-riscv

On 2021-02-16 at 12:17:30 +0100, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> > > I was fixing KASAN support for my sv48 patchset so I took a look at your
> > > issue: I built a kernel on top of the branch riscv/fixes using
> > > https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config
> > > and Buildroot 2020.11. I have the warnings regarding the use of
> > > __virt_to_phys on wrong addresses (but that's normal since this function
> > > is used in virt_addr_valid) but not the segfaults you describe.
> >
> > Hi Alex,
> >
> > Let me try to rebuild buildroot image. Maybe there was something wrong
> > with my build, though, I did 'make clean' before doing. But at the
> > same time it worked back in June...
> >
> > Re WARNINGs, they indicate kernel bugs. I am working on setting up a
> > syzbot instance on riscv. If there a WARNING during boot then the
> > kernel will be marked as broken. No further testing will happen.
> > Is it a mis-use of WARN_ON? If so, could anybody please remove it or
> > replace it with pr_err.
>
>
> Hi,
>
> I've localized one issue with riscv/KASAN:
> KASAN breaks VDSO and that's I think the root cause of weird faults I
> saw earlier. The following patch fixes it.
> Could somebody please upstream this fix? I don't know how to add/run
> tests for this. Thanks.

I've tested the fix locally using vDSO selftests and sent the fix
upstream [1]

[1] https://lore.kernel.org/linux-riscv/20210216173305.2500-1-tklauser@distanz.ch/T/#u
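
Before running vDSO tests inside a guest it can be useful to confirm that the kernel maps a vDSO into processes at all. The following is a minimal sketch in POSIX shell, assuming a Linux /proc filesystem; the helper names has_vdso and vdso_range are made up for illustration and are not part of kselftest. Note that it only shows the mapping exists, it does not tell whether the libc in the image actually calls into it.

```shell
#!/bin/sh
# Sketch only: has_vdso and vdso_range are hypothetical helpers.

# Succeed if the process with PID $1 has a [vdso] region in its memory
# map. Without an argument, the map of the grep process itself is read.
has_vdso() {
    grep -q '\[vdso\]' "/proc/${1:-self}/maps"
}

# Print the start-end address range of the [vdso] mapping of PID $1
# (default: the awk process itself).
vdso_range() {
    awk '$NF == "[vdso]" { print $1 }' "/proc/${1:-self}/maps"
}
```

The selftests themselves live under tools/testing/selftests/vDSO in the kernel tree and are typically built and run with something like `make -C tools/testing/selftests TARGETS=vDSO run_tests`; the exact invocation used for the test above is not stated in the thread.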
end of thread, other threads:[~2021-03-10 17:26 UTC | newest]

Thread overview: 55+ messages
2020-12-25 14:55 riscv+KASAN does not boot Dmitry Vyukov
2020-12-25 14:55 ` Dmitry Vyukov
2020-12-25 16:58 ` Andreas Schwab
2020-12-25 16:58 ` Andreas Schwab
2020-12-25 17:13 ` Dmitry Vyukov
2020-12-25 17:13 ` Dmitry Vyukov
2021-01-14 4:57 ` Palmer Dabbelt
2021-01-14 4:57 ` Palmer Dabbelt
2021-01-14 9:23 ` Dmitry Vyukov
2021-01-14 9:23 ` Dmitry Vyukov
2021-01-14 10:24 ` Dmitry Vyukov
2021-01-14 10:24 ` Dmitry Vyukov
2021-01-14 11:24 ` Dmitry Vyukov
2021-01-14 11:24 ` Dmitry Vyukov
2021-01-18 14:53 ` Tobias Klauser
2021-01-18 14:53 ` Tobias Klauser
2021-01-18 15:05 ` Dmitry Vyukov
2021-01-18 15:05 ` Dmitry Vyukov
2021-01-18 15:43 ` Dmitry Vyukov
2021-01-18 15:43 ` Dmitry Vyukov
2021-01-29 7:45 ` Alex Ghiti
2021-01-29 7:45 ` Alex Ghiti
2021-01-29 8:11 ` Dmitry Vyukov
2021-02-16 11:17 ` Dmitry Vyukov
2021-02-16 11:17 ` Dmitry Vyukov
2021-02-16 11:25 ` Dmitry Vyukov
2021-02-16 11:25 ` Dmitry Vyukov
2021-02-16 13:45 ` Dmitry Vyukov
2021-02-16 13:45 ` Dmitry Vyukov
2021-02-16 20:42 ` Alex Ghiti
2021-02-16 20:42 ` Alex Ghiti
2021-02-17 4:42 ` Dmitry Vyukov
2021-02-17 4:42 ` Dmitry Vyukov
2021-02-17 16:36 ` Alex Ghiti
2021-02-17 16:36 ` Alex Ghiti
2021-02-17 17:34 ` Dmitry Vyukov
2021-02-17 17:34 ` Dmitry Vyukov
2021-02-18 7:54 ` Alex Ghiti
2021-02-18 7:54 ` Alex Ghiti
2021-02-18 11:36 ` Dmitry Vyukov
2021-02-18 11:36 ` Dmitry Vyukov
2021-02-19 17:01 ` Alex Ghiti
2021-02-19 17:01 ` Alex Ghiti
2021-02-19 18:53 ` Dmitry Vyukov
2021-02-19 18:53 ` Dmitry Vyukov
2021-02-19 22:26 ` Palmer Dabbelt
2021-02-19 22:26 ` Palmer Dabbelt
2021-03-09 17:11 ` Dmitry Vyukov
2021-03-09 17:11 ` Dmitry Vyukov
2021-03-09 19:49 ` Alex Ghiti
2021-03-09 19:49 ` Alex Ghiti
2021-03-10 17:25 ` Dmitry Vyukov
2021-03-10 17:25 ` Dmitry Vyukov
2021-02-16 17:35 ` Tobias Klauser
2021-02-16 17:35 ` Tobias Klauser