linux-kernel.vger.kernel.org archive mirror
* riscv+KASAN does not boot
@ 2020-12-25 14:55 Dmitry Vyukov
  2020-12-25 16:58 ` Andreas Schwab
  0 siblings, 1 reply; 27+ messages in thread
From: Dmitry Vyukov @ 2020-12-25 14:55 UTC (permalink / raw)
  To: Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, LKML, nylon7
  Cc: Björn Töpel, Tobias Klauser, syzkaller, Palmer Dabbelt

Hello,

I am considering setting up a syzbot instance for the riscv arch (using
qemu emulation) and am testing the kernel config/image/etc. I can boot a
defconfig+kvmconfig riscv kernel, but so far I can't get a
CONFIG_KASAN+CONFIG_KCOV kernel to boot.

But first of all I would like to ask if the riscv port is stable
enough at this point and if there is interest in continuous fuzzing
and receiving bugs? If there is no interest, then the rest is not
worth spending time on.
Second, what git tree/branch should be used for testing (to find bugs
sooner and get fixes faster)?
Currently it seems that riscv/fixes is the most up-to-date branch with
most fixes, is it the right one?

Re non-booting kernel problem. If I do:
defconfig+kvm_guest.config+ scripts/config -e KASAN -e KASAN_INLINE
I only see OpenSBI banner and then nothing happens (qemu consumes 100% CPU).
I've tried on v5.10, current upstream head (71c5f03154ac) and
riscv/fixes (20620d72c31e). The result is the same.
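
For reference, the configuration recipe above can be spelled out as an explicit command sequence. This is only a sketch under stated assumptions: the cross-compiler prefix is taken from the `riscv64-linux-gnu-gcc` string that appears in the boot log later in this thread, the final `olddefconfig` step is an addition that is not part of the original recipe, and the helper name `kasan_config_steps` is hypothetical. The code only builds and prints the commands; it does not run them.

```python
import shlex

# Assumptions: commands are run from the top of a kernel source tree, and the
# toolchain prefix matches the "riscv64-linux-gnu-gcc" seen in the boot log.
MAKE = ["make", "ARCH=riscv", "CROSS_COMPILE=riscv64-linux-gnu-"]


def kasan_config_steps():
    """Return the commands for defconfig + kvm_guest.config + KASAN, in order."""
    return [
        MAKE + ["defconfig"],
        MAKE + ["kvm_guest.config"],
        ["scripts/config", "-e", "KASAN", "-e", "KASAN_INLINE"],
        # Assumption (not in the original recipe): let Kconfig resolve the
        # dependencies of the newly enabled options.
        MAKE + ["olddefconfig"],
    ]


if __name__ == "__main__":
    for cmd in kasan_config_steps():
        print(shlex.join(cmd))
```

Keeping the steps as a list makes it easy to log exactly which configuration a given test image was built with.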

I see this recent patch from Nylon:
https://lore.kernel.org/linux-riscv/1606727599-8598-1-git-send-email-nylon7@andestech.com/
which suggests that KASAN is working for Nylon.

I am using qemu 5.1.0 as:

qemu-system-riscv64 \
-machine virt -bios default -smp 1 -m 2G \
-device virtio-blk-device,drive=hd0 \
-drive file=buildroot-riscv64.ext4,if=none,format=raw,id=hd0 \
-kernel arch/riscv/boot/Image \
-nographic \
-device virtio-rng-device,rng=rng0 \
-object rng-random,filename=/dev/urandom,id=rng0 \
-netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 \
-device virtio-net-device,netdev=net0 \
-append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic panic_on_warn=1 panic=86400"


I've also tried this config (slightly larger than defconfig, but does
NOT include KASAN nor KCOV):
https://gist.githubusercontent.com/dvyukov/b2b62beccf80493781ab03b41430e616/raw/62e673cff08a8a41656d2871b8a37f74b00f509f/gistfile1.txt
and this is the ultimate large config that I would like to use:
https://gist.githubusercontent.com/dvyukov/2b4e621d5252dbc5a2f28802b8d71d95/raw/3ef2b8d8eda60d3acfc4bf7916ffb9e77671ed76/gistfile1.txt

Both of them hang after the OpenSBI banner in the same way.

Is it a known issue? Am I doing something wrong?

TIA


* Re: riscv+KASAN does not boot
  2020-12-25 14:55 riscv+KASAN does not boot Dmitry Vyukov
@ 2020-12-25 16:58 ` Andreas Schwab
  2020-12-25 17:13   ` Dmitry Vyukov
  0 siblings, 1 reply; 27+ messages in thread
From: Andreas Schwab @ 2020-12-25 16:58 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, LKML,
	nylon7, Palmer Dabbelt, Björn Töpel, Tobias Klauser,
	syzkaller

On Dec 25 2020, Dmitry Vyukov wrote:

> qemu-system-riscv64 \
> -machine virt -bios default -smp 1 -m 2G \
> -device virtio-blk-device,drive=hd0 \
> -drive file=buildroot-riscv64.ext4,if=none,format=raw,id=hd0 \
> -kernel arch/riscv/boot/Image \
> -nographic \
> -device virtio-rng-device,rng=rng0 -object
> rng-random,filename=/dev/urandom,id=rng0 \
> -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device
> virtio-net-device,netdev=net0 \
> -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic
> panic_on_warn=1 panic=86400"

Do you get more output with earlycon=sbi?
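
A minimal sketch of the suggested experiment: the QEMU invocation from the original report with `earlycon=sbi` appended to the kernel command line. All options are copied from the report; the function name `qemu_argv` and its two parameters are hypothetical, and the code only assembles and prints the command.

```python
import shlex

# Kernel command line from the original report, plus the suggested earlycon=sbi.
CMDLINE = (
    "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic "
    "panic_on_warn=1 panic=86400 earlycon=sbi"
)


def qemu_argv(kernel="arch/riscv/boot/Image", disk="buildroot-riscv64.ext4"):
    """Build the argv list; each option is a separate element, so no shell
    line continuations or quoting are involved."""
    return [
        "qemu-system-riscv64",
        "-machine", "virt", "-bios", "default", "-smp", "1", "-m", "2G",
        "-device", "virtio-blk-device,drive=hd0",
        "-drive", f"file={disk},if=none,format=raw,id=hd0",
        "-kernel", kernel,
        "-nographic",
        "-device", "virtio-rng-device,rng=rng0",
        "-object", "rng-random,filename=/dev/urandom,id=rng0",
        "-netdev", "user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22",
        "-device", "virtio-net-device,netdev=net0",
        "-append", CMDLINE,
    ]


if __name__ == "__main__":
    print(shlex.join(qemu_argv()))
```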

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


* Re: riscv+KASAN does not boot
  2020-12-25 16:58 ` Andreas Schwab
@ 2020-12-25 17:13   ` Dmitry Vyukov
  2021-01-14  4:57     ` Palmer Dabbelt
  0 siblings, 1 reply; 27+ messages in thread
From: Dmitry Vyukov @ 2020-12-25 17:13 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, LKML,
	nylon7, Palmer Dabbelt, Björn Töpel, Tobias Klauser,
	syzkaller

On Fri, Dec 25, 2020 at 5:58 PM Andreas Schwab <schwab@linux-m68k.org> wrote:
>
> On Dez 25 2020, Dmitry Vyukov wrote:
>
> > qemu-system-riscv64 \
> > -machine virt -bios default -smp 1 -m 2G \
> > -device virtio-blk-device,drive=hd0 \
> > -drive file=buildroot-riscv64.ext4,if=none,format=raw,id=hd0 \
> > -kernel arch/riscv/boot/Image \
> > -nographic \
> > -device virtio-rng-device,rng=rng0 -object
> > rng-random,filename=/dev/urandom,id=rng0 \
> > -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device
> > virtio-net-device,netdev=net0 \
> > -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic
> > panic_on_warn=1 panic=86400"
>
> Do you get more output with earlycon=sbi?

Hi Andreas,

For defconfig+kvm_guest.config+ scripts/config -e KASAN -e
KASAN_INLINE it actually gave me more output:


OpenSBI v0.7
   ____                    _____ ____ _____
  / __ \                  / ____|  _ \_   _|
 | |  | |_ __   ___ _ __ | (___ | |_) || |
 | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
 | |__| | |_) |  __/ | | |____) | |_) || |_
  \____/| .__/ \___|_| |_|_____/|____/_____|
        | |
        |_|

Platform Name          : QEMU Virt Machine
Platform HART Features : RV64ACDFIMSU
Current Hart           : 0
Firmware Base          : 0x80000000
Firmware Size          : 132 KB
Runtime SBI Version    : 0.2

MIDELEG : 0x0000000000000222
MEDELEG : 0x000000000000b109
PMP0    : 0x0000000080000000-0x000000008003ffff (A)
PMP1    : 0x0000000000000000-0xffffffffffffffff (A,R,W,X)
[    0.000000] Linux version 5.10.0-01370-g71c5f03154ac (dvyukov@dvyukov-desk.muc.corp.google.com) (riscv64-linux-gnu-gcc (Debian 10.2.0-9) 10.2.0, GNU ld (GNU Binutils for Debian) 2.35.1) #17 SMP Fri Dec 25 18:10:12 CET 2020
[    0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000
[    0.000000] earlycon: sbi0 at I/O port 0x0 (options '')
[    0.000000] printk: bootconsole [sbi0] enabled
[    0.000000] efi: UEFI not found.
[    0.000000] Zone ranges:
[    0.000000]   DMA32    [mem 0x0000000080200000-0x00000000ffffffff]
[    0.000000]   Normal   empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000080200000-0x00000000ffffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000080200000-0x00000000ffffffff]
[    0.000000] SBI specification v0.2 detected
[    0.000000] SBI implementation ID=0x1 Version=0x7
[    0.000000] SBI v0.2 TIME extension detected
[    0.000000] SBI v0.2 IPI extension detected
[    0.000000] SBI v0.2 RFENCE extension detected
[    0.000000] software IO TLB: mapped [mem 0x00000000fa3f9000-0x00000000fe3f9000] (64MB)
[    0.000000] Unable to handle kernel paging request at virtual address dfffffc810040000
[    0.000000] Oops [#1]
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0-01370-g71c5f03154ac #17
[    0.000000] epc: ffffffe00042e3e4 ra : ffffffe000c0462c sp : ffffffe001603ea0
[    0.000000]  gp : ffffffe0016e3c60 tp : ffffffe00160cd40 t0 : dfffffc810040000
[    0.000000]  t1 : ffffffe000e0a838 t2 : 0000000000000000 s0 : ffffffe001603f50
[    0.000000]  s1 : ffffffe0016e50a8 a0 : dfffffc810040000 a1 : 0000000000000000
[    0.000000]  a2 : 000000000ffc0000 a3 : dfffffc820000000 a4 : 0000000000000000
[    0.000000]  a5 : 000000003e8c6001 a6 : ffffffe000e0a820 a7 : 0000000000000900
[    0.000000]  s2 : dfffffc820000000 s3 : dfffffc800000000 s4 : 0000000000000001
[    0.000000]  s5 : ffffffe0016e5108 s6 : fffffffffffff000 s7 : dfffffc810040000
[    0.000000]  s8 : 0000000000000080 s9 : ffffffffffffffff s10: ffffffe07a119000
[    0.000000]  s11: 000000000000ffc0 t3 : ffffffe0016eb908 t4 : 0000000000000001
[    0.000000]  t5 : ffffffc4001c150a t6 : ffffffe001603be8
[    0.000000] status: 0000000000000100 badaddr: dfffffc810040000 cause: 000000000000000f
[    0.000000] random: get_random_bytes called from oops_exit+0x30/0x58 with crng_init=0
[    0.000000] ---[ end trace 0000000000000000 ]---
[    0.000000] Kernel panic - not syncing: Fatal exception
[    0.000000] ---[ end Kernel panic - not syncing: Fatal exception ]---
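
The faulting address in the oops can be related back to the memory it would shadow. The sketch below assumes the generic KASAN mapping, in which one shadow byte covers 8 bytes of memory (`shadow = (addr >> 3) + offset`), and it assumes that the value in register s3 (`dfffffc800000000`) is the shadow offset; neither assumption is confirmed anywhere in this thread, and the function name is hypothetical.

```python
MASK64 = (1 << 64) - 1

# Assumption: generic KASAN layout, 8 bytes of memory per shadow byte.
KASAN_SHADOW_SCALE_SHIFT = 3
# Assumption: the shadow offset equals the value of register s3 in the oops.
ASSUMED_SHADOW_OFFSET = 0xDFFFFFC800000000


def shadow_to_addr(shadow):
    """Invert shadow = (addr >> 3) + offset using 64-bit wraparound."""
    delta = (shadow - ASSUMED_SHADOW_OFFSET) & MASK64
    return (delta << KASAN_SHADOW_SCALE_SHIFT) & MASK64


# Register values copied from the oops.
BADADDR = 0xDFFFFFC810040000  # also a0, t0 and s7
A2 = 0x000000000FFC0000
A3 = 0xDFFFFFC820000000       # also s2

if __name__ == "__main__":
    print(hex(shadow_to_addr(BADADDR)))  # 0x80200000
    print(hex(shadow_to_addr(A3)))       # 0x100000000
    print(A3 - BADADDR == A2)            # True
```

Under these assumptions the faulting address corresponds to 0x80200000, the start of the memory range reported in the log, a3 corresponds to one byte past its end (0xffffffff), and a2 equals the distance between the two. That is consistent with an operation over the shadow of the whole memory node faulting on its first byte, but it does not by itself identify the cause.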


But I first tried with the kernel image I had in the dir, I think it
was this config (no KASAN):
https://gist.githubusercontent.com/dvyukov/b2b62beccf80493781ab03b41430e616/raw/62e673cff08a8a41656d2871b8a37f74b00f509f/gistfile1.txt

and earlycon=sbi did not change anything (no output after OpenSBI).
So potentially there are 2 different problems.


* Re: riscv+KASAN does not boot
  2020-12-25 17:13   ` Dmitry Vyukov
@ 2021-01-14  4:57     ` Palmer Dabbelt
  2021-01-14  9:23       ` Dmitry Vyukov
  0 siblings, 1 reply; 27+ messages in thread
From: Palmer Dabbelt @ 2021-01-14  4:57 UTC (permalink / raw)
  To: dvyukov
  Cc: schwab, Paul Walmsley, aou, linux-riscv, linux-kernel, nylon7,
	Bjorn Topel, tklauser, syzkaller

On Fri, 25 Dec 2020 09:13:23 PST (-0800), dvyukov@google.com wrote:
> On Fri, Dec 25, 2020 at 5:58 PM Andreas Schwab <schwab@linux-m68k.org> wrote:
>>
>> On Dez 25 2020, Dmitry Vyukov wrote:
>>
>> > qemu-system-riscv64 \
>> > -machine virt -bios default -smp 1 -m 2G \
>> > -device virtio-blk-device,drive=hd0 \
>> > -drive file=buildroot-riscv64.ext4,if=none,format=raw,id=hd0 \
>> > -kernel arch/riscv/boot/Image \
>> > -nographic \
>> > -device virtio-rng-device,rng=rng0 -object
>> > rng-random,filename=/dev/urandom,id=rng0 \
>> > -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device
>> > virtio-net-device,netdev=net0 \
>> > -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic
>> > panic_on_warn=1 panic=86400"
>>
>> Do you get more output with earlycon=sbi?
>
> Hi Andreas,
>
> For defconfig+kvm_guest.config+ scripts/config -e KASAN -e
> KASAN_INLINE it actually gave me more output:
>
>
> OpenSBI v0.7
>    ____                    _____ ____ _____
>   / __ \                  / ____|  _ \_   _|
>  | |  | |_ __   ___ _ __ | (___ | |_) || |
>  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
>  | |__| | |_) |  __/ | | |____) | |_) || |_
>   \____/| .__/ \___|_| |_|_____/|____/_____|
>         | |
>         |_|
>
> Platform Name          : QEMU Virt Machine
> Platform HART Features : RV64ACDFIMSU
> Current Hart           : 0
> Firmware Base          : 0x80000000
> Firmware Size          : 132 KB
> Runtime SBI Version    : 0.2
>
> MIDELEG : 0x0000000000000222
> MEDELEG : 0x000000000000b109
> PMP0    : 0x0000000080000000-0x000000008003ffff (A)
> PMP1    : 0x0000000000000000-0xffffffffffffffff (A,R,W,X)
> [    0.000000] Linux version 5.10.0-01370-g71c5f03154ac
> (dvyukov@dvyukov-desk.muc.corp.google.com) (riscv64-linux-gnu-gcc
> (Debian 10.2.0-9) 10.2.0, GNU ld (GNU Binutils for Debian) 2.35.1) #17
> SMP Fri Dec 25 18:10:12 CET 2020
> [    0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000
> [    0.000000] earlycon: sbi0 at I/O port 0x0 (options '')
> [    0.000000] printk: bootconsole [sbi0] enabled
> [    0.000000] efi: UEFI not found.
> [    0.000000] Zone ranges:
> [    0.000000]   DMA32    [mem 0x0000000080200000-0x00000000ffffffff]
> [    0.000000]   Normal   empty
> [    0.000000] Movable zone start for each node
> [    0.000000] Early memory node ranges
> [    0.000000]   node   0: [mem 0x0000000080200000-0x00000000ffffffff]
> [    0.000000] Initmem setup node 0 [mem 0x0000000080200000-0x00000000ffffffff]
> [    0.000000] SBI specification v0.2 detected
> [    0.000000] SBI implementation ID=0x1 Version=0x7
> [    0.000000] SBI v0.2 TIME extension detected
> [    0.000000] SBI v0.2 IPI extension detected
> [    0.000000] SBI v0.2 RFENCE extension detected
> [    0.000000] software IO TLB: mapped [mem
> 0x00000000fa3f9000-0x00000000fe3f9000] (64MB)
> [    0.000000] Unable to handle kernel paging request at virtual
> address dfffffc810040000
> [    0.000000] Oops [#1]
> [    0.000000] Modules linked in:
> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted
> 5.10.0-01370-g71c5f03154ac #17
> [    0.000000] epc: ffffffe00042e3e4 ra : ffffffe000c0462c sp : ffffffe001603ea0
> [    0.000000]  gp : ffffffe0016e3c60 tp : ffffffe00160cd40 t0 :
> dfffffc810040000
> [    0.000000]  t1 : ffffffe000e0a838 t2 : 0000000000000000 s0 :
> ffffffe001603f50
> [    0.000000]  s1 : ffffffe0016e50a8 a0 : dfffffc810040000 a1 :
> 0000000000000000
> [    0.000000]  a2 : 000000000ffc0000 a3 : dfffffc820000000 a4 :
> 0000000000000000
> [    0.000000]  a5 : 000000003e8c6001 a6 : ffffffe000e0a820 a7 :
> 0000000000000900
> [    0.000000]  s2 : dfffffc820000000 s3 : dfffffc800000000 s4 :
> 0000000000000001
> [    0.000000]  s5 : ffffffe0016e5108 s6 : fffffffffffff000 s7 :
> dfffffc810040000
> [    0.000000]  s8 : 0000000000000080 s9 : ffffffffffffffff s10:
> ffffffe07a119000
> [    0.000000]  s11: 000000000000ffc0 t3 : ffffffe0016eb908 t4 :
> 0000000000000001
> [    0.000000]  t5 : ffffffc4001c150a t6 : ffffffe001603be8
> [    0.000000] status: 0000000000000100 badaddr: dfffffc810040000
> cause: 000000000000000f
> [    0.000000] random: get_random_bytes called from
> oops_exit+0x30/0x58 with crng_init=0
> [    0.000000] ---[ end trace 0000000000000000 ]---
> [    0.000000] Kernel panic - not syncing: Fatal exception
> [    0.000000] ---[ end Kernel panic - not syncing: Fatal exception ]---
>
>
> But I first tried with the kernel image I had in the dir, I think it
> was this config (no KASAN):
> https://gist.githubusercontent.com/dvyukov/b2b62beccf80493781ab03b41430e616/raw/62e673cff08a8a41656d2871b8a37f74b00f509f/gistfile1.txt
>
> and earlycon=sbi did not change anything (no output after OpenSBI).
> So potentially there are 2 different problems.

Thanks for reporting this.  Looks like I'd forgotten to add a kasan config to
my tests.  There's one in there now, and it's passing as of the fix that Nylon
posted.


* Re: riscv+KASAN does not boot
  2021-01-14  4:57     ` Palmer Dabbelt
@ 2021-01-14  9:23       ` Dmitry Vyukov
  2021-01-14 10:24         ` Dmitry Vyukov
  0 siblings, 1 reply; 27+ messages in thread
From: Dmitry Vyukov @ 2021-01-14  9:23 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: Andreas Schwab, Paul Walmsley, Albert Ou, linux-riscv, LKML,
	nylon7, Bjorn Topel, Tobias Klauser, syzkaller

On Thu, Jan 14, 2021 at 5:57 AM Palmer Dabbelt <palmerdabbelt@google.com> wrote:
>
> On Fri, 25 Dec 2020 09:13:23 PST (-0800), dvyukov@google.com wrote:
> > On Fri, Dec 25, 2020 at 5:58 PM Andreas Schwab <schwab@linux-m68k.org> wrote:
> >>
> >> On Dez 25 2020, Dmitry Vyukov wrote:
> >>
> >> > qemu-system-riscv64 \
> >> > -machine virt -bios default -smp 1 -m 2G \
> >> > -device virtio-blk-device,drive=hd0 \
> >> > -drive file=buildroot-riscv64.ext4,if=none,format=raw,id=hd0 \
> >> > -kernel arch/riscv/boot/Image \
> >> > -nographic \
> >> > -device virtio-rng-device,rng=rng0 -object
> >> > rng-random,filename=/dev/urandom,id=rng0 \
> >> > -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device
> >> > virtio-net-device,netdev=net0 \
> >> > -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic
> >> > panic_on_warn=1 panic=86400"
> >>
> >> Do you get more output with earlycon=sbi?
> >
> > Hi Andreas,
> >
> > For defconfig+kvm_guest.config+ scripts/config -e KASAN -e
> > KASAN_INLINE it actually gave me more output:
> >
> >
> > OpenSBI v0.7
> >    ____                    _____ ____ _____
> >   / __ \                  / ____|  _ \_   _|
> >  | |  | |_ __   ___ _ __ | (___ | |_) || |
> >  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
> >  | |__| | |_) |  __/ | | |____) | |_) || |_
> >   \____/| .__/ \___|_| |_|_____/|____/_____|
> >         | |
> >         |_|
> >
> > Platform Name          : QEMU Virt Machine
> > Platform HART Features : RV64ACDFIMSU
> > Current Hart           : 0
> > Firmware Base          : 0x80000000
> > Firmware Size          : 132 KB
> > Runtime SBI Version    : 0.2
> >
> > MIDELEG : 0x0000000000000222
> > MEDELEG : 0x000000000000b109
> > PMP0    : 0x0000000080000000-0x000000008003ffff (A)
> > PMP1    : 0x0000000000000000-0xffffffffffffffff (A,R,W,X)
> > [    0.000000] Linux version 5.10.0-01370-g71c5f03154ac
> > (dvyukov@dvyukov-desk.muc.corp.google.com) (riscv64-linux-gnu-gcc
> > (Debian 10.2.0-9) 10.2.0, GNU ld (GNU Binutils for Debian) 2.35.1) #17
> > SMP Fri Dec 25 18:10:12 CET 2020
> > [    0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000
> > [    0.000000] earlycon: sbi0 at I/O port 0x0 (options '')
> > [    0.000000] printk: bootconsole [sbi0] enabled
> > [    0.000000] efi: UEFI not found.
> > [    0.000000] Zone ranges:
> > [    0.000000]   DMA32    [mem 0x0000000080200000-0x00000000ffffffff]
> > [    0.000000]   Normal   empty
> > [    0.000000] Movable zone start for each node
> > [    0.000000] Early memory node ranges
> > [    0.000000]   node   0: [mem 0x0000000080200000-0x00000000ffffffff]
> > [    0.000000] Initmem setup node 0 [mem 0x0000000080200000-0x00000000ffffffff]
> > [    0.000000] SBI specification v0.2 detected
> > [    0.000000] SBI implementation ID=0x1 Version=0x7
> > [    0.000000] SBI v0.2 TIME extension detected
> > [    0.000000] SBI v0.2 IPI extension detected
> > [    0.000000] SBI v0.2 RFENCE extension detected
> > [    0.000000] software IO TLB: mapped [mem
> > 0x00000000fa3f9000-0x00000000fe3f9000] (64MB)
> > [    0.000000] Unable to handle kernel paging request at virtual
> > address dfffffc810040000
> > [    0.000000] Oops [#1]
> > [    0.000000] Modules linked in:
> > [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted
> > 5.10.0-01370-g71c5f03154ac #17
> > [    0.000000] epc: ffffffe00042e3e4 ra : ffffffe000c0462c sp : ffffffe001603ea0
> > [    0.000000]  gp : ffffffe0016e3c60 tp : ffffffe00160cd40 t0 :
> > dfffffc810040000
> > [    0.000000]  t1 : ffffffe000e0a838 t2 : 0000000000000000 s0 :
> > ffffffe001603f50
> > [    0.000000]  s1 : ffffffe0016e50a8 a0 : dfffffc810040000 a1 :
> > 0000000000000000
> > [    0.000000]  a2 : 000000000ffc0000 a3 : dfffffc820000000 a4 :
> > 0000000000000000
> > [    0.000000]  a5 : 000000003e8c6001 a6 : ffffffe000e0a820 a7 :
> > 0000000000000900
> > [    0.000000]  s2 : dfffffc820000000 s3 : dfffffc800000000 s4 :
> > 0000000000000001
> > [    0.000000]  s5 : ffffffe0016e5108 s6 : fffffffffffff000 s7 :
> > dfffffc810040000
> > [    0.000000]  s8 : 0000000000000080 s9 : ffffffffffffffff s10:
> > ffffffe07a119000
> > [    0.000000]  s11: 000000000000ffc0 t3 : ffffffe0016eb908 t4 :
> > 0000000000000001
> > [    0.000000]  t5 : ffffffc4001c150a t6 : ffffffe001603be8
> > [    0.000000] status: 0000000000000100 badaddr: dfffffc810040000
> > cause: 000000000000000f
> > [    0.000000] random: get_random_bytes called from
> > oops_exit+0x30/0x58 with crng_init=0
> > [    0.000000] ---[ end trace 0000000000000000 ]---
> > [    0.000000] Kernel panic - not syncing: Fatal exception
> > [    0.000000] ---[ end Kernel panic - not syncing: Fatal exception ]---
> >
> >
> > But I first tried with the kernel image I had in the dir, I think it
> > was this config (no KASAN):
> > https://gist.githubusercontent.com/dvyukov/b2b62beccf80493781ab03b41430e616/raw/62e673cff08a8a41656d2871b8a37f74b00f509f/gistfile1.txt
> >
> > and earlycon=sbi did not change anything (no output after OpenSBI).
> > So potentially there are 2 different problems.
>
> Thanks for reporting this.  Looks like I'd forgotten to add a kasan config to
> my tests.  There's one in there now, and it's passing as of the fix that Nylon
> posted.

I can boot the KASAN kernel now on riscv/fixes.

Next problem: I only get as far as:

[   90.498967][    T1] Run /sbin/init as init process
[   91.164353][ T4022] init[4022]: unhandled signal 11 code 0x1 at 0x0000000000000bb0 in busybox[10000+d7000]
[   91.179640][ T4022] CPU: 1 PID: 4022 Comm: init Not tainted 5.11.0-rc2-00012-g0983834a8393 #19
[   91.180853][ T4022] epc: 0000000000000bb0 ra : 0000003fccab09d0 sp : 0000003fffa8c7b0
[   91.181861][ T4022]  gp : 00000000000e8d70 tp : 0000003fccaaf820 t0 : 000000000000001e
[   91.182810][ T4022]  t1 : 0000003fccab0bfc t2 : 000000000000000a s0 : 0000003fffa8c850
[   91.183749][ T4022]  s1 : 0000003fccab1070 a0 : 0000003fccab1070 a1 : 0000003fffa8c8c8
[   91.184689][ T4022]  a2 : 0000000000000001 a3 : 0000000000000020 a4 : 0000000000000000
[   91.185620][ T4022]  a5 : 0000000000000000 a6 : 0000003fcc9c4260 a7 : fffffffffffffffe
[   91.186566][ T4022]  s2 : 0000000000000000 s3 : 0000003fffa8c8c8 s4 : 0000003fccab1000
[   91.187500][ T4022]  s5 : 0000003fccab1078 s6 : 0000003fffa8c8d0 s7 : 0000000000000010
[   91.189672][ T4022]  s8 : 0000000000000016 s9 : 0000000000000000 s10: 0000003fffa8c8c8
[   91.190637][ T4022]  s11: 0000000000000000 t3 : 0000000000000bb0 t4 : 0000000000000000
[   91.191568][ T4022]  t5 : 0000003fffa8c360 t6 : 0000000000000000
[   91.192389][ T4022] status: 8000000000004020 badaddr: 0000000000000bb0 cause: 000000000000000c
[   91.201573][    T1] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[   91.202906][    T1] CPU: 0 PID: 1 Comm: init Not tainted 5.11.0-rc2-00012-g0983834a8393 #19
[   91.204139][    T1] Call Trace:
[   91.204849][    T1] [<ffffffe0000095c0>] walk_stackframe+0x0/0x1d0
[   91.206124][    T1] [<ffffffe00458b2d8>] show_stack+0x3a/0x46
[   91.207240][    T1] [<ffffffe0045a5b72>] dump_stack+0x11c/0x180
[   91.208732][    T1] [<ffffffe00458b6a0>] panic+0x20a/0x5cc
[   91.209890][    T1] [<ffffffe00002eea4>] do_exit+0x1846/0x1874
[   91.211052][    T1] [<ffffffe00002efdc>] do_group_exit+0xa0/0x192
[   91.212224][    T1] [<ffffffe000047d30>] get_signal+0x2d6/0x13dc
[   91.213390][    T1] [<ffffffe000007eb0>] do_notify_resume+0xa8/0x912
[   91.214567][    T1] [<ffffffe00000559c>] ret_from_exception+0x0/0x14
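
Both register dumps in this thread include a `cause:` value: 0xf for the early KASAN oops and 0xc for the init crash. A small sketch of decoding them follows; the exception codes are the ones defined for `scause` in the RISC-V privileged specification, while the table and function names are hypothetical.

```python
# Synchronous exception codes for scause (interrupt bit clear), as defined in
# the RISC-V privileged specification.
SCAUSE_EXCEPTIONS = {
    0: "instruction address misaligned",
    1: "instruction access fault",
    2: "illegal instruction",
    3: "breakpoint",
    4: "load address misaligned",
    5: "load access fault",
    6: "store/AMO address misaligned",
    7: "store/AMO access fault",
    8: "environment call from U-mode",
    9: "environment call from S-mode",
    12: "instruction page fault",
    13: "load page fault",
    15: "store/AMO page fault",
}


def decode_scause(cause):
    """Return a readable name for a 64-bit scause value."""
    if cause >> 63:  # the top bit marks an interrupt rather than an exception
        return f"interrupt {cause & ((1 << 63) - 1)}"
    return SCAUSE_EXCEPTIONS.get(cause, f"reserved/unknown ({cause})")


if __name__ == "__main__":
    print(decode_scause(0xF))  # store/AMO page fault
    print(decode_scause(0xC))  # instruction page fault
```

So the early oops is a faulting store to `badaddr`, and the init crash is an instruction fetch from `badaddr`, which matches `epc` being equal to `badaddr` in that dump.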

The image is buildroot (2020.11.x branch), built with this script:
https://gist.githubusercontent.com/dvyukov/1a9a01ca2189e35175a021820c95b04d/raw/5c01d755e83f4eab0d56aa7dc84af3b2d5e80423/gistfile1.txt

readelf for init shows the following (could it be that the
[10000+d7000] range is not .text at all?):

$ riscv64-linux-gnu-readelf --sections image/bin/busybox
There are 27 section headers, starting at offset 0xd7f20:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .interp           PROGBITS         0000000000010238  00000238
       0000000000000021  0000000000000000   A       0     0     1
  [ 2] .note.ABI-tag     NOTE             000000000001025c  0000025c
       0000000000000020  0000000000000000   A       0     0     4
  [ 3] .hash             HASH             0000000000010280  00000280
       00000000000009cc  0000000000000004   A       5     0     8
  [ 4] .gnu.hash         GNU_HASH         0000000000010c50  00000c50
       0000000000000ac8  0000000000000000   A       5     0     8
  [ 5] .dynsym           DYNSYM           0000000000011718  00001718
       00000000000021f0  0000000000000018   A       6     1     8
  [ 6] .dynstr           STRTAB           0000000000013908  00003908
       0000000000000c66  0000000000000000   A       0     0     1
  [ 7] .gnu.version      VERSYM           000000000001456e  0000456e
       00000000000002d4  0000000000000002   A       5     0     2
  [ 8] .gnu.version_r    VERNEED          0000000000014848  00004848
       0000000000000050  0000000000000000   A       6     2     8
  [ 9] .rela.dyn         RELA             0000000000014898  00004898
       00000000000000c0  0000000000000018   A       5     0     8
  [10] .rela.plt         RELA             0000000000014958  00004958
       00000000000020a0  0000000000000018  AI       5    22     8
  [11] .plt              PROGBITS         0000000000016a00  00006a00
       00000000000015e0  0000000000000010  AX       0     0     16
  [12] .text             PROGBITS         0000000000017fe0  00007fe0
       00000000000a3668  0000000000000000  AX       0     0     4
  [13] .rodata           PROGBITS         00000000000bb648  000ab648
       000000000002b076  0000000000000000   A       0     0     8
  [14] .sdata2           PROGBITS         00000000000e66c0  000d66c0
       0000000000000163  0000000000000000   A       0     0     8
  [15] .eh_frame_hdr     PROGBITS         00000000000e6824  000d6824
       0000000000000014  0000000000000000   A       0     0     4
  [16] .eh_frame         PROGBITS         00000000000e6838  000d6838
       000000000000002c  0000000000000000   A       0     0     8
  [17] .preinit_array    PREINIT_ARRAY    00000000000e7df8  000d6df8
       0000000000000008  0000000000000008  WA       0     0     1
  [18] .init_array       INIT_ARRAY       00000000000e7e00  000d6e00
       0000000000000008  0000000000000008  WA       0     0     8
  [19] .fini_array       FINI_ARRAY       00000000000e7e08  000d6e08
       0000000000000008  0000000000000008  WA       0     0     8
  [20] .dynamic          DYNAMIC          00000000000e7e10  000d6e10
       00000000000001f0  0000000000000010  WA       6     0     8
  [21] .data             PROGBITS         00000000000e8000  000d7000
       0000000000000240  0000000000000000  WA       0     0     8
  [22] .got              PROGBITS         00000000000e8240  000d7240
       0000000000000af8  0000000000000008  WA       0     0     8
  [23] .sdata            PROGBITS         00000000000e8d38  000d7d38
       0000000000000101  0000000000000000  WA       0     0     8
  [24] .sbss             NOBITS           00000000000e8e40  000d7e39
       000000000000017f  0000000000000000  WA       0     0     8
  [25] .bss              NOBITS           00000000000e8fc0  000d7e39
       00000000000005b0  0000000000000000  WA       0     0     8
  [26] .shstrtab         STRTAB           0000000000000000  000d7e39
       00000000000000e6  0000000000000000           0     0     1
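
The question about `[10000+d7000]` can be checked with plain arithmetic. The sketch below assumes the bracketed pair is the base address and size of a busybox mapping, and it compares the faulting address with that range and with the executable (AX) sections from the readelf listing. The helper names are hypothetical.

```python
import re

SEGV_LINE = (
    "init[4022]: unhandled signal 11 code 0x1 at "
    "0x0000000000000bb0 in busybox[10000+d7000]"
)

# (name, start address, size) of the AX sections in the readelf output.
EXEC_SECTIONS = [
    (".plt", 0x16A00, 0x15E0),
    (".text", 0x17FE0, 0xA3668),
]

SEGV_RE = re.compile(
    r"signal (\d+) code (0x[0-9a-f]+) at (0x[0-9a-f]+) "
    r"in (\S+?)\[([0-9a-f]+)\+([0-9a-f]+)\]"
)


def parse_segv(line):
    """Extract signal, fault address and the reported mapping from the line."""
    m = SEGV_RE.search(line)
    if m is None:
        raise ValueError("not an 'unhandled signal' line")
    return {
        "signal": int(m.group(1)),
        "addr": int(m.group(3), 16),
        "binary": m.group(4),
        "base": int(m.group(5), 16),  # assumption: start of the mapping
        "size": int(m.group(6), 16),  # assumption: length of the mapping
    }


def in_mapping(info):
    return info["base"] <= info["addr"] < info["base"] + info["size"]


def section_of(addr):
    """Return the name of the executable section containing addr, or None."""
    for name, start, size in EXEC_SECTIONS:
        if start <= addr < start + size:
            return name
    return None


if __name__ == "__main__":
    info = parse_segv(SEGV_LINE)
    print(hex(info["addr"]), in_mapping(info), section_of(info["addr"]))
```

Under that reading, the range 0x10000 to 0xe7000 does contain `.text` (0x17fe0 to 0xbb648), while the faulting address 0xbb0 lies below the range and in no executable section.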


Before I spend more time on this, am I doing anything obviously wrong?
Is it a known issue? Are there any fresh working recipes?


* Re: riscv+KASAN does not boot
  2021-01-14  9:23       ` Dmitry Vyukov
@ 2021-01-14 10:24         ` Dmitry Vyukov
  2021-01-14 11:24           ` Dmitry Vyukov
  2021-01-18 14:53           ` Tobias Klauser
  0 siblings, 2 replies; 27+ messages in thread
From: Dmitry Vyukov @ 2021-01-14 10:24 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: Andreas Schwab, Paul Walmsley, Albert Ou, linux-riscv, LKML,
	nylon7, Bjorn Topel, Tobias Klauser, syzkaller

On Thu, Jan 14, 2021 at 10:23 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Thu, Jan 14, 2021 at 5:57 AM Palmer Dabbelt <palmerdabbelt@google.com> wrote:
> >
> > On Fri, 25 Dec 2020 09:13:23 PST (-0800), dvyukov@google.com wrote:
> > > On Fri, Dec 25, 2020 at 5:58 PM Andreas Schwab <schwab@linux-m68k.org> wrote:
> > >>
> > >> On Dez 25 2020, Dmitry Vyukov wrote:
> > >>
> > >> > qemu-system-riscv64 \
> > >> > -machine virt -bios default -smp 1 -m 2G \
> > >> > -device virtio-blk-device,drive=hd0 \
> > >> > -drive file=buildroot-riscv64.ext4,if=none,format=raw,id=hd0 \
> > >> > -kernel arch/riscv/boot/Image \
> > >> > -nographic \
> > >> > -device virtio-rng-device,rng=rng0 -object
> > >> > rng-random,filename=/dev/urandom,id=rng0 \
> > >> > -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device
> > >> > virtio-net-device,netdev=net0 \
> > >> > -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic
> > >> > panic_on_warn=1 panic=86400"
> > >>
> > >> Do you get more output with earlycon=sbi?
> > >
> > > Hi Andreas,
> > >
> > > For defconfig+kvm_guest.config+ scripts/config -e KASAN -e
> > > KASAN_INLINE it actually gave me more output:
> > >
> > >
> > > OpenSBI v0.7
> > >    ____                    _____ ____ _____
> > >   / __ \                  / ____|  _ \_   _|
> > >  | |  | |_ __   ___ _ __ | (___ | |_) || |
> > >  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
> > >  | |__| | |_) |  __/ | | |____) | |_) || |_
> > >   \____/| .__/ \___|_| |_|_____/|____/_____|
> > >         | |
> > >         |_|
> > >
> > > Platform Name          : QEMU Virt Machine
> > > Platform HART Features : RV64ACDFIMSU
> > > Current Hart           : 0
> > > Firmware Base          : 0x80000000
> > > Firmware Size          : 132 KB
> > > Runtime SBI Version    : 0.2
> > >
> > > MIDELEG : 0x0000000000000222
> > > MEDELEG : 0x000000000000b109
> > > PMP0    : 0x0000000080000000-0x000000008003ffff (A)
> > > PMP1    : 0x0000000000000000-0xffffffffffffffff (A,R,W,X)
> > > [    0.000000] Linux version 5.10.0-01370-g71c5f03154ac
> > > (dvyukov@dvyukov-desk.muc.corp.google.com) (riscv64-linux-gnu-gcc
> > > (Debian 10.2.0-9) 10.2.0, GNU ld (GNU Binutils for Debian) 2.35.1) #17
> > > SMP Fri Dec 25 18:10:12 CET 2020
> > > [    0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000
> > > [    0.000000] earlycon: sbi0 at I/O port 0x0 (options '')
> > > [    0.000000] printk: bootconsole [sbi0] enabled
> > > [    0.000000] efi: UEFI not found.
> > > [    0.000000] Zone ranges:
> > > [    0.000000]   DMA32    [mem 0x0000000080200000-0x00000000ffffffff]
> > > [    0.000000]   Normal   empty
> > > [    0.000000] Movable zone start for each node
> > > [    0.000000] Early memory node ranges
> > > [    0.000000]   node   0: [mem 0x0000000080200000-0x00000000ffffffff]
> > > [    0.000000] Initmem setup node 0 [mem 0x0000000080200000-0x00000000ffffffff]
> > > [    0.000000] SBI specification v0.2 detected
> > > [    0.000000] SBI implementation ID=0x1 Version=0x7
> > > [    0.000000] SBI v0.2 TIME extension detected
> > > [    0.000000] SBI v0.2 IPI extension detected
> > > [    0.000000] SBI v0.2 RFENCE extension detected
> > > [    0.000000] software IO TLB: mapped [mem
> > > 0x00000000fa3f9000-0x00000000fe3f9000] (64MB)
> > > [    0.000000] Unable to handle kernel paging request at virtual
> > > address dfffffc810040000
> > > [    0.000000] Oops [#1]
> > > [    0.000000] Modules linked in:
> > > [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted
> > > 5.10.0-01370-g71c5f03154ac #17
> > > [    0.000000] epc: ffffffe00042e3e4 ra : ffffffe000c0462c sp : ffffffe001603ea0
> > > [    0.000000]  gp : ffffffe0016e3c60 tp : ffffffe00160cd40 t0 :
> > > dfffffc810040000
> > > [    0.000000]  t1 : ffffffe000e0a838 t2 : 0000000000000000 s0 :
> > > ffffffe001603f50
> > > [    0.000000]  s1 : ffffffe0016e50a8 a0 : dfffffc810040000 a1 :
> > > 0000000000000000
> > > [    0.000000]  a2 : 000000000ffc0000 a3 : dfffffc820000000 a4 :
> > > 0000000000000000
> > > [    0.000000]  a5 : 000000003e8c6001 a6 : ffffffe000e0a820 a7 :
> > > 0000000000000900
> > > [    0.000000]  s2 : dfffffc820000000 s3 : dfffffc800000000 s4 :
> > > 0000000000000001
> > > [    0.000000]  s5 : ffffffe0016e5108 s6 : fffffffffffff000 s7 :
> > > dfffffc810040000
> > > [    0.000000]  s8 : 0000000000000080 s9 : ffffffffffffffff s10:
> > > ffffffe07a119000
> > > [    0.000000]  s11: 000000000000ffc0 t3 : ffffffe0016eb908 t4 :
> > > 0000000000000001
> > > [    0.000000]  t5 : ffffffc4001c150a t6 : ffffffe001603be8
> > > [    0.000000] status: 0000000000000100 badaddr: dfffffc810040000
> > > cause: 000000000000000f
> > > [    0.000000] random: get_random_bytes called from
> > > oops_exit+0x30/0x58 with crng_init=0
> > > [    0.000000] ---[ end trace 0000000000000000 ]---
> > > [    0.000000] Kernel panic - not syncing: Fatal exception
> > > [    0.000000] ---[ end Kernel panic - not syncing: Fatal exception ]---
> > >
> > >
> > > But I first tried with the kernel image I had in the dir, I think it
> > > was this config (no KASAN):
> > > https://gist.githubusercontent.com/dvyukov/b2b62beccf80493781ab03b41430e616/raw/62e673cff08a8a41656d2871b8a37f74b00f509f/gistfile1.txt
> > >
> > > and earlycon=sbi did not change anything (no output after OpenSBI).
> > > So potentially there are 2 different problems.
> >
> > Thanks for reporting this.  Looks like I'd forgotten to add a kasan config to
> > my tests.  There's one in there now, and it's passing as of the fix that Nylon
> > posted.
>
> I can boot the KASAN kernel now on riscv/fixes.
>
> Next problem: I only got as far as:
>
> [   90.498967][    T1] Run /sbin/init as init process
> [   91.164353][ T4022] init[4022]: unhandled signal 11 code 0x1 at 0x0000000000000bb0 in busybox[10000+d7000]
> [   91.179640][ T4022] CPU: 1 PID: 4022 Comm: init Not tainted 5.11.0-rc2-00012-g0983834a8393 #19
> [   91.180853][ T4022] epc: 0000000000000bb0 ra : 0000003fccab09d0 sp : 0000003fffa8c7b0
> [   91.181861][ T4022]  gp : 00000000000e8d70 tp : 0000003fccaaf820 t0 : 000000000000001e
> [   91.182810][ T4022]  t1 : 0000003fccab0bfc t2 : 000000000000000a s0 : 0000003fffa8c850
> [   91.183749][ T4022]  s1 : 0000003fccab1070 a0 : 0000003fccab1070 a1 : 0000003fffa8c8c8
> [   91.184689][ T4022]  a2 : 0000000000000001 a3 : 0000000000000020 a4 : 0000000000000000
> [   91.185620][ T4022]  a5 : 0000000000000000 a6 : 0000003fcc9c4260 a7 : fffffffffffffffe
> [   91.186566][ T4022]  s2 : 0000000000000000 s3 : 0000003fffa8c8c8 s4 : 0000003fccab1000
> [   91.187500][ T4022]  s5 : 0000003fccab1078 s6 : 0000003fffa8c8d0 s7 : 0000000000000010
> [   91.189672][ T4022]  s8 : 0000000000000016 s9 : 0000000000000000 s10: 0000003fffa8c8c8
> [   91.190637][ T4022]  s11: 0000000000000000 t3 : 0000000000000bb0 t4 : 0000000000000000
> [   91.191568][ T4022]  t5 : 0000003fffa8c360 t6 : 0000000000000000
> [   91.192389][ T4022] status: 8000000000004020 badaddr: 0000000000000bb0 cause: 000000000000000c
> [   91.201573][    T1] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> [   91.202906][    T1] CPU: 0 PID: 1 Comm: init Not tainted 5.11.0-rc2-00012-g0983834a8393 #19
> [   91.204139][    T1] Call Trace:
> [   91.204849][    T1] [<ffffffe0000095c0>] walk_stackframe+0x0/0x1d0
> [   91.206124][    T1] [<ffffffe00458b2d8>] show_stack+0x3a/0x46
> [   91.207240][    T1] [<ffffffe0045a5b72>] dump_stack+0x11c/0x180
> [   91.208732][    T1] [<ffffffe00458b6a0>] panic+0x20a/0x5cc
> [   91.209890][    T1] [<ffffffe00002eea4>] do_exit+0x1846/0x1874
> [   91.211052][    T1] [<ffffffe00002efdc>] do_group_exit+0xa0/0x192
> [   91.212224][    T1] [<ffffffe000047d30>] get_signal+0x2d6/0x13dc
> [   91.213390][    T1] [<ffffffe000007eb0>] do_notify_resume+0xa8/0x912
> [   91.214567][    T1] [<ffffffe00000559c>] ret_from_exception+0x0/0x14
>
> The image is buildroot on 2020.11.x built with this script:
> https://gist.githubusercontent.com/dvyukov/1a9a01ca2189e35175a021820c95b04d/raw/5c01d755e83f4eab0d56aa7dc84af3b2d5e80423/gistfile1.txt
>
> Readelf for init shows the following (is it that [10000+d7000] address
> is not .text at all?):
>
> $ riscv64-linux-gnu-readelf --sections image/bin/busybox
> There are 27 section headers, starting at offset 0xd7f20:
>
> Section Headers:
>   [Nr] Name              Type             Address           Offset
>        Size              EntSize          Flags  Link  Info  Align
>   [ 0]                   NULL             0000000000000000  00000000
>        0000000000000000  0000000000000000           0     0     0
>   [ 1] .interp           PROGBITS         0000000000010238  00000238
>        0000000000000021  0000000000000000   A       0     0     1
>   [ 2] .note.ABI-tag     NOTE             000000000001025c  0000025c
>        0000000000000020  0000000000000000   A       0     0     4
>   [ 3] .hash             HASH             0000000000010280  00000280
>        00000000000009cc  0000000000000004   A       5     0     8
>   [ 4] .gnu.hash         GNU_HASH         0000000000010c50  00000c50
>        0000000000000ac8  0000000000000000   A       5     0     8
>   [ 5] .dynsym           DYNSYM           0000000000011718  00001718
>        00000000000021f0  0000000000000018   A       6     1     8
>   [ 6] .dynstr           STRTAB           0000000000013908  00003908
>        0000000000000c66  0000000000000000   A       0     0     1
>   [ 7] .gnu.version      VERSYM           000000000001456e  0000456e
>        00000000000002d4  0000000000000002   A       5     0     2
>   [ 8] .gnu.version_r    VERNEED          0000000000014848  00004848
>        0000000000000050  0000000000000000   A       6     2     8
>   [ 9] .rela.dyn         RELA             0000000000014898  00004898
>        00000000000000c0  0000000000000018   A       5     0     8
>   [10] .rela.plt         RELA             0000000000014958  00004958
>        00000000000020a0  0000000000000018  AI       5    22     8
>   [11] .plt              PROGBITS         0000000000016a00  00006a00
>        00000000000015e0  0000000000000010  AX       0     0     16
>   [12] .text             PROGBITS         0000000000017fe0  00007fe0
>        00000000000a3668  0000000000000000  AX       0     0     4
>   [13] .rodata           PROGBITS         00000000000bb648  000ab648
>        000000000002b076  0000000000000000   A       0     0     8
>   [14] .sdata2           PROGBITS         00000000000e66c0  000d66c0
>        0000000000000163  0000000000000000   A       0     0     8
>   [15] .eh_frame_hdr     PROGBITS         00000000000e6824  000d6824
>        0000000000000014  0000000000000000   A       0     0     4
>   [16] .eh_frame         PROGBITS         00000000000e6838  000d6838
>        000000000000002c  0000000000000000   A       0     0     8
>   [17] .preinit_array    PREINIT_ARRAY    00000000000e7df8  000d6df8
>        0000000000000008  0000000000000008  WA       0     0     1
>   [18] .init_array       INIT_ARRAY       00000000000e7e00  000d6e00
>        0000000000000008  0000000000000008  WA       0     0     8
>   [19] .fini_array       FINI_ARRAY       00000000000e7e08  000d6e08
>        0000000000000008  0000000000000008  WA       0     0     8
>   [20] .dynamic          DYNAMIC          00000000000e7e10  000d6e10
>        00000000000001f0  0000000000000010  WA       6     0     8
>   [21] .data             PROGBITS         00000000000e8000  000d7000
>        0000000000000240  0000000000000000  WA       0     0     8
>   [22] .got              PROGBITS         00000000000e8240  000d7240
>        0000000000000af8  0000000000000008  WA       0     0     8
>   [23] .sdata            PROGBITS         00000000000e8d38  000d7d38
>        0000000000000101  0000000000000000  WA       0     0     8
>   [24] .sbss             NOBITS           00000000000e8e40  000d7e39
>        000000000000017f  0000000000000000  WA       0     0     8
>   [25] .bss              NOBITS           00000000000e8fc0  000d7e39
>        00000000000005b0  0000000000000000  WA       0     0     8
>   [26] .shstrtab         STRTAB           0000000000000000  000d7e39
>        00000000000000e6  0000000000000000           0     0     1
>
>
> Before I spend more time on this, am I doing anything obviously wrong?
> Is it a known issue? Are there any fresh working recipes?

Hmm. I tried to use Buildroot 2020.05, which Tobias used here:
https://github.com/google/syzkaller/blob/master/docs/linux/setup_linux-host_qemu-vm_riscv64-kernel.md#image
But there is no qemu_riscv64_virt_defconfig make target there, though I
remember testing these instructions at the time.

To be precise, I used 2020.11. I see there is now a 2020.11.1, but I don't
see any mention of riscv in the log.
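
On the readelf question above, a minimal sketch (Python, not part of the
original setup) that checks where the faulting pc falls relative to the
mapping and the executable sections. All numbers are copied from the oops
and the readelf output quoted above. The helper names are hypothetical, and
it assumes that "[10000+d7000]" means start 0x10000 and size 0xd7000. As far
as I understand, the kernel reports the nearest mapping ending above the
address, so the address itself need not lie inside that mapping.

```python
# Values copied from the oops and the readelf output quoted in this thread.
EPC = 0x0000000000000BB0                  # faulting pc (epc == badaddr)
MAP_START, MAP_SIZE = 0x10000, 0xD7000    # from "busybox[10000+d7000]"

# (name, address, size) of the executable (AX) sections in the readelf dump.
EXEC_SECTIONS = [
    (".plt", 0x16A00, 0x15E0),
    (".text", 0x17FE0, 0xA3668),
]


def in_range(addr, start, size):
    """True if addr lies in the half-open range [start, start + size)."""
    return start <= addr < start + size


def locate(addr):
    """Describe where addr falls relative to the busybox mapping."""
    if not in_range(addr, MAP_START, MAP_SIZE):
        return "outside the busybox mapping"
    for name, start, size in EXEC_SECTIONS:
        if in_range(addr, start, size):
            return name
    return "inside the mapping, but not in an executable section"


print(hex(EPC), "->", locate(EPC))
```

Under these assumptions the pc 0xbb0 is below the start of the mapping, so
it is not in .text or in any other busybox section.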


* Re: riscv+KASAN does not boot
  2021-01-14 10:24         ` Dmitry Vyukov
@ 2021-01-14 11:24           ` Dmitry Vyukov
  2021-01-18 14:53           ` Tobias Klauser
  1 sibling, 0 replies; 27+ messages in thread
From: Dmitry Vyukov @ 2021-01-14 11:24 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: Andreas Schwab, Paul Walmsley, Albert Ou, linux-riscv, LKML,
	nylon7, Bjorn Topel, Tobias Klauser, syzkaller

On Thu, Jan 14, 2021 at 11:24 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Thu, Jan 14, 2021 at 10:23 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> >
> > On Thu, Jan 14, 2021 at 5:57 AM Palmer Dabbelt <palmerdabbelt@google.com> wrote:
> > >
> > > On Fri, 25 Dec 2020 09:13:23 PST (-0800), dvyukov@google.com wrote:
> > > > On Fri, Dec 25, 2020 at 5:58 PM Andreas Schwab <schwab@linux-m68k.org> wrote:
> > > >>
> > > >> On Dez 25 2020, Dmitry Vyukov wrote:
> > > >>
> > > >> > qemu-system-riscv64 \
> > > >> > -machine virt -bios default -smp 1 -m 2G \
> > > >> > -device virtio-blk-device,drive=hd0 \
> > > >> > -drive file=buildroot-riscv64.ext4,if=none,format=raw,id=hd0 \
> > > >> > -kernel arch/riscv/boot/Image \
> > > >> > -nographic \
> > > >> > -device virtio-rng-device,rng=rng0 -object
> > > >> > rng-random,filename=/dev/urandom,id=rng0 \
> > > >> > -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device
> > > >> > virtio-net-device,netdev=net0 \
> > > >> > -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic
> > > >> > panic_on_warn=1 panic=86400"
> > > >>
> > > >> Do you get more output with earlycon=sbi?
> > > >
> > > > Hi Andreas,
> > > >
> > > > For defconfig+kvm_guest.config+ scripts/config -e KASAN -e
> > > > KASAN_INLINE it actually gave me more output:
> > > >
> > > >
> > > > OpenSBI v0.7
> > > >    ____                    _____ ____ _____
> > > >   / __ \                  / ____|  _ \_   _|
> > > >  | |  | |_ __   ___ _ __ | (___ | |_) || |
> > > >  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
> > > >  | |__| | |_) |  __/ | | |____) | |_) || |_
> > > >   \____/| .__/ \___|_| |_|_____/|____/_____|
> > > >         | |
> > > >         |_|
> > > >
> > > > Platform Name          : QEMU Virt Machine
> > > > Platform HART Features : RV64ACDFIMSU
> > > > Current Hart           : 0
> > > > Firmware Base          : 0x80000000
> > > > Firmware Size          : 132 KB
> > > > Runtime SBI Version    : 0.2
> > > >
> > > > MIDELEG : 0x0000000000000222
> > > > MEDELEG : 0x000000000000b109
> > > > PMP0    : 0x0000000080000000-0x000000008003ffff (A)
> > > > PMP1    : 0x0000000000000000-0xffffffffffffffff (A,R,W,X)
> > > > [    0.000000] Linux version 5.10.0-01370-g71c5f03154ac
> > > > (dvyukov@dvyukov-desk.muc.corp.google.com) (riscv64-linux-gnu-gcc
> > > > (Debian 10.2.0-9) 10.2.0, GNU ld (GNU Binutils for Debian) 2.35.1) #17
> > > > SMP Fri Dec 25 18:10:12 CET 2020
> > > > [    0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000
> > > > [    0.000000] earlycon: sbi0 at I/O port 0x0 (options '')
> > > > [    0.000000] printk: bootconsole [sbi0] enabled
> > > > [    0.000000] efi: UEFI not found.
> > > > [    0.000000] Zone ranges:
> > > > [    0.000000]   DMA32    [mem 0x0000000080200000-0x00000000ffffffff]
> > > > [    0.000000]   Normal   empty
> > > > [    0.000000] Movable zone start for each node
> > > > [    0.000000] Early memory node ranges
> > > > [    0.000000]   node   0: [mem 0x0000000080200000-0x00000000ffffffff]
> > > > [    0.000000] Initmem setup node 0 [mem 0x0000000080200000-0x00000000ffffffff]
> > > > [    0.000000] SBI specification v0.2 detected
> > > > [    0.000000] SBI implementation ID=0x1 Version=0x7
> > > > [    0.000000] SBI v0.2 TIME extension detected
> > > > [    0.000000] SBI v0.2 IPI extension detected
> > > > [    0.000000] SBI v0.2 RFENCE extension detected
> > > > [    0.000000] software IO TLB: mapped [mem
> > > > 0x00000000fa3f9000-0x00000000fe3f9000] (64MB)
> > > > [    0.000000] Unable to handle kernel paging request at virtual
> > > > address dfffffc810040000
> > > > [    0.000000] Oops [#1]
> > > > [    0.000000] Modules linked in:
> > > > [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted
> > > > 5.10.0-01370-g71c5f03154ac #17
> > > > [    0.000000] epc: ffffffe00042e3e4 ra : ffffffe000c0462c sp : ffffffe001603ea0
> > > > [    0.000000]  gp : ffffffe0016e3c60 tp : ffffffe00160cd40 t0 :
> > > > dfffffc810040000
> > > > [    0.000000]  t1 : ffffffe000e0a838 t2 : 0000000000000000 s0 :
> > > > ffffffe001603f50
> > > > [    0.000000]  s1 : ffffffe0016e50a8 a0 : dfffffc810040000 a1 :
> > > > 0000000000000000
> > > > [    0.000000]  a2 : 000000000ffc0000 a3 : dfffffc820000000 a4 :
> > > > 0000000000000000
> > > > [    0.000000]  a5 : 000000003e8c6001 a6 : ffffffe000e0a820 a7 :
> > > > 0000000000000900
> > > > [    0.000000]  s2 : dfffffc820000000 s3 : dfffffc800000000 s4 :
> > > > 0000000000000001
> > > > [    0.000000]  s5 : ffffffe0016e5108 s6 : fffffffffffff000 s7 :
> > > > dfffffc810040000
> > > > [    0.000000]  s8 : 0000000000000080 s9 : ffffffffffffffff s10:
> > > > ffffffe07a119000
> > > > [    0.000000]  s11: 000000000000ffc0 t3 : ffffffe0016eb908 t4 :
> > > > 0000000000000001
> > > > [    0.000000]  t5 : ffffffc4001c150a t6 : ffffffe001603be8
> > > > [    0.000000] status: 0000000000000100 badaddr: dfffffc810040000
> > > > cause: 000000000000000f
> > > > [    0.000000] random: get_random_bytes called from
> > > > oops_exit+0x30/0x58 with crng_init=0
> > > > [    0.000000] ---[ end trace 0000000000000000 ]---
> > > > [    0.000000] Kernel panic - not syncing: Fatal exception
> > > > [    0.000000] ---[ end Kernel panic - not syncing: Fatal exception ]---
> > > >
> > > >
> > > > But I first tried with the kernel image I had in the dir, I think it
> > > > was this config (no KASAN):
> > > > https://gist.githubusercontent.com/dvyukov/b2b62beccf80493781ab03b41430e616/raw/62e673cff08a8a41656d2871b8a37f74b00f509f/gistfile1.txt
> > > >
> > > > and earlycon=sbi did not change anything (no output after OpenSBI).
> > > > So potentially there are 2 different problems.
> > >
> > > Thanks for reporting this.  Looks like I'd forgotten to add a kasan config to
> > > my tests.  There's one in there now, and it's passing as of the fix that Nylon
> > > posted.
> >
> > I can boot the KASAN kernel now on riscv/fixes.
> >
> > Next problem: I only got as far as:
> >
> > [   90.498967][    T1] Run /sbin/init as init process
> > [   91.164353][ T4022] init[4022]: unhandled signal 11 code 0x1 at
> > 0x0000000000000bb0 in busybox[10000+d7000]
> > [   91.179640][ T4022] CPU: 1 PID: 4022 Comm: init Not tainted
> > 5.11.0-rc2-00012-g0983834a8393 #19
> > [   91.180853][ T4022] epc: 0000000000000bb0 ra : 0000003fccab09d0 sp
> > : 0000003fffa8c7b0
> > [   91.181861][ T4022]  gp : 00000000000e8d70 tp : 0000003fccaaf820 t0
> > : 000000000000001e
> > [   91.182810][ T4022]  t1 : 0000003fccab0bfc t2 : 000000000000000a s0
> > : 0000003fffa8c850
> > [   91.183749][ T4022]  s1 : 0000003fccab1070 a0 : 0000003fccab1070 a1
> > : 0000003fffa8c8c8
> > [   91.184689][ T4022]  a2 : 0000000000000001 a3 : 0000000000000020 a4
> > : 0000000000000000
> > [   91.185620][ T4022]  a5 : 0000000000000000 a6 : 0000003fcc9c4260 a7
> > : fffffffffffffffe
> > [   91.186566][ T4022]  s2 : 0000000000000000 s3 : 0000003fffa8c8c8 s4
> > : 0000003fccab1000
> > [   91.187500][ T4022]  s5 : 0000003fccab1078 s6 : 0000003fffa8c8d0 s7
> > : 0000000000000010
> > [   91.189672][ T4022]  s8 : 0000000000000016 s9 : 0000000000000000
> > s10: 0000003fffa8c8c8
> > [   91.190637][ T4022]  s11: 0000000000000000 t3 : 0000000000000bb0 t4
> > : 0000000000000000
> > [   91.191568][ T4022]  t5 : 0000003fffa8c360 t6 : 0000000000000000
> > [   91.192389][ T4022] status: 8000000000004020 badaddr:
> > 0000000000000bb0 cause: 000000000000000c
> > [   91.201573][    T1] Kernel panic - not syncing: Attempted to kill
> > init! exitcode=0x0000000b
> > [   91.202906][    T1] CPU: 0 PID: 1 Comm: init Not tainted
> > 5.11.0-rc2-00012-g0983834a8393 #19
> > [   91.204139][    T1] Call Trace:
> > [   91.204849][    T1] [<ffffffe0000095c0>] walk_stackframe+0x0/0x1d0
> > [   91.206124][    T1] [<ffffffe00458b2d8>] show_stack+0x3a/0x46
> > [   91.207240][    T1] [<ffffffe0045a5b72>] dump_stack+0x11c/0x180
> > [   91.208732][    T1] [<ffffffe00458b6a0>] panic+0x20a/0x5cc
> > [   91.209890][    T1] [<ffffffe00002eea4>] do_exit+0x1846/0x1874
> > [   91.211052][    T1] [<ffffffe00002efdc>] do_group_exit+0xa0/0x192
> > [   91.212224][    T1] [<ffffffe000047d30>] get_signal+0x2d6/0x13dc
> > [   91.213390][    T1] [<ffffffe000007eb0>] do_notify_resume+0xa8/0x912
> > [   91.214567][    T1] [<ffffffe00000559c>] ret_from_exception+0x0/0x14
> >
> > The image is buildroot on 2020.11.x built with this script:
> > https://gist.githubusercontent.com/dvyukov/1a9a01ca2189e35175a021820c95b04d/raw/5c01d755e83f4eab0d56aa7dc84af3b2d5e80423/gistfile1.txt
> >
> > Readelf for init shows the following (is it that [10000+d7000] address
> > is not .text at all?):
> >
> > $ riscv64-linux-gnu-readelf --sections image/bin/busybox
> > There are 27 section headers, starting at offset 0xd7f20:
> >
> > Section Headers:
> >   [Nr] Name              Type             Address           Offset
> >        Size              EntSize          Flags  Link  Info  Align
> >   [ 0]                   NULL             0000000000000000  00000000
> >        0000000000000000  0000000000000000           0     0     0
> >   [ 1] .interp           PROGBITS         0000000000010238  00000238
> >        0000000000000021  0000000000000000   A       0     0     1
> >   [ 2] .note.ABI-tag     NOTE             000000000001025c  0000025c
> >        0000000000000020  0000000000000000   A       0     0     4
> >   [ 3] .hash             HASH             0000000000010280  00000280
> >        00000000000009cc  0000000000000004   A       5     0     8
> >   [ 4] .gnu.hash         GNU_HASH         0000000000010c50  00000c50
> >        0000000000000ac8  0000000000000000   A       5     0     8
> >   [ 5] .dynsym           DYNSYM           0000000000011718  00001718
> >        00000000000021f0  0000000000000018   A       6     1     8
> >   [ 6] .dynstr           STRTAB           0000000000013908  00003908
> >        0000000000000c66  0000000000000000   A       0     0     1
> >   [ 7] .gnu.version      VERSYM           000000000001456e  0000456e
> >        00000000000002d4  0000000000000002   A       5     0     2
> >   [ 8] .gnu.version_r    VERNEED          0000000000014848  00004848
> >        0000000000000050  0000000000000000   A       6     2     8
> >   [ 9] .rela.dyn         RELA             0000000000014898  00004898
> >        00000000000000c0  0000000000000018   A       5     0     8
> >   [10] .rela.plt         RELA             0000000000014958  00004958
> >        00000000000020a0  0000000000000018  AI       5    22     8
> >   [11] .plt              PROGBITS         0000000000016a00  00006a00
> >        00000000000015e0  0000000000000010  AX       0     0     16
> >   [12] .text             PROGBITS         0000000000017fe0  00007fe0
> >        00000000000a3668  0000000000000000  AX       0     0     4
> >   [13] .rodata           PROGBITS         00000000000bb648  000ab648
> >        000000000002b076  0000000000000000   A       0     0     8
> >   [14] .sdata2           PROGBITS         00000000000e66c0  000d66c0
> >        0000000000000163  0000000000000000   A       0     0     8
> >   [15] .eh_frame_hdr     PROGBITS         00000000000e6824  000d6824
> >        0000000000000014  0000000000000000   A       0     0     4
> >   [16] .eh_frame         PROGBITS         00000000000e6838  000d6838
> >        000000000000002c  0000000000000000   A       0     0     8
> >   [17] .preinit_array    PREINIT_ARRAY    00000000000e7df8  000d6df8
> >        0000000000000008  0000000000000008  WA       0     0     1
> >   [18] .init_array       INIT_ARRAY       00000000000e7e00  000d6e00
> >        0000000000000008  0000000000000008  WA       0     0     8
> >   [19] .fini_array       FINI_ARRAY       00000000000e7e08  000d6e08
> >        0000000000000008  0000000000000008  WA       0     0     8
> >   [20] .dynamic          DYNAMIC          00000000000e7e10  000d6e10
> >        00000000000001f0  0000000000000010  WA       6     0     8
> >   [21] .data             PROGBITS         00000000000e8000  000d7000
> >        0000000000000240  0000000000000000  WA       0     0     8
> >   [22] .got              PROGBITS         00000000000e8240  000d7240
> >        0000000000000af8  0000000000000008  WA       0     0     8
> >   [23] .sdata            PROGBITS         00000000000e8d38  000d7d38
> >        0000000000000101  0000000000000000  WA       0     0     8
> >   [24] .sbss             NOBITS           00000000000e8e40  000d7e39
> >        000000000000017f  0000000000000000  WA       0     0     8
> >   [25] .bss              NOBITS           00000000000e8fc0  000d7e39
> >        00000000000005b0  0000000000000000  WA       0     0     8
> >   [26] .shstrtab         STRTAB           0000000000000000  000d7e39
> >        00000000000000e6  0000000000000000           0     0     1
> >
> >
> > Before I spend more time on this, am I doing anything obviously wrong?
> > Is it a known issue? Are there any fresh working recipes?
>
> Humm.. I tried to use 2020.05 which Tobias used here:
> https://github.com/google/syzkaller/blob/master/docs/linux/setup_linux-host_qemu-vm_riscv64-kernel.md#image
> But there is no make qemu_riscv64_virt_defconfig target... though I
> remember I tested these instructions at the time...
>
> To be precise I used 2020.11, I see there is now 2020.11.1 but I don't
> see any mentions of riscv in the log.

For completeness, the kernel config I used is:
https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config
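
For reading the "cause:" field in the two register dumps quoted in this
thread, here is a small sketch (Python) that maps the value to the
exception names in the RISC-V privileged specification. It assumes the
printed value is the raw scause register. The function name decode_cause is
hypothetical and only for illustration.

```python
# Synchronous exception codes from the RISC-V privileged specification
# (scause with the interrupt bit clear).
EXCEPTION_CODES = {
    0: "instruction address misaligned",
    1: "instruction access fault",
    2: "illegal instruction",
    3: "breakpoint",
    4: "load address misaligned",
    5: "load access fault",
    6: "store/AMO address misaligned",
    7: "store/AMO access fault",
    8: "environment call from U-mode",
    9: "environment call from S-mode",
    12: "instruction page fault",
    13: "load page fault",
    15: "store/AMO page fault",
}


def decode_cause(cause):
    """Return a readable name for a 64-bit scause value."""
    is_interrupt = bool(cause >> 63)      # top bit set means interrupt
    code = cause & ((1 << 63) - 1)
    if is_interrupt:
        return f"interrupt {code}"
    return EXCEPTION_CODES.get(code, f"reserved or unknown ({code})")


# Values copied from the two oopses in this thread.
for label, cause in [("KASAN boot oops", 0xF), ("init segfault", 0xC)]:
    print(f"{label}: cause={cause:#x} -> {decode_cause(cause)}")
```

With these inputs, the early boot oops decodes as a store page fault on the
address dfffffc810040000, and the init crash as an instruction page fault at
pc 0xbb0.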


* Re: riscv+KASAN does not boot
  2021-01-14 10:24         ` Dmitry Vyukov
  2021-01-14 11:24           ` Dmitry Vyukov
@ 2021-01-18 14:53           ` Tobias Klauser
  2021-01-18 15:05             ` Dmitry Vyukov
  1 sibling, 1 reply; 27+ messages in thread
From: Tobias Klauser @ 2021-01-18 14:53 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Palmer Dabbelt, Andreas Schwab, Paul Walmsley, Albert Ou,
	linux-riscv, LKML, nylon7, Bjorn Topel, syzkaller

On 2021-01-14 at 11:24:07 +0100, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Thu, Jan 14, 2021 at 10:23 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> >
> > On Thu, Jan 14, 2021 at 5:57 AM Palmer Dabbelt <palmerdabbelt@google.com> wrote:
> > >
> > > On Fri, 25 Dec 2020 09:13:23 PST (-0800), dvyukov@google.com wrote:
> > > > On Fri, Dec 25, 2020 at 5:58 PM Andreas Schwab <schwab@linux-m68k.org> wrote:
> > > >>
> > > >> On Dez 25 2020, Dmitry Vyukov wrote:
> > > >>
> > > >> > qemu-system-riscv64 \
> > > >> > -machine virt -bios default -smp 1 -m 2G \
> > > >> > -device virtio-blk-device,drive=hd0 \
> > > >> > -drive file=buildroot-riscv64.ext4,if=none,format=raw,id=hd0 \
> > > >> > -kernel arch/riscv/boot/Image \
> > > >> > -nographic \
> > > >> > -device virtio-rng-device,rng=rng0 -object
> > > >> > rng-random,filename=/dev/urandom,id=rng0 \
> > > >> > -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device
> > > >> > virtio-net-device,netdev=net0 \
> > > >> > -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic
> > > >> > panic_on_warn=1 panic=86400"
> > > >>
> > > >> Do you get more output with earlycon=sbi?
> > > >
> > > > Hi Andreas,
> > > >
> > > > For defconfig+kvm_guest.config+ scripts/config -e KASAN -e
> > > > KASAN_INLINE it actually gave me more output:
> > > >
> > > >
> > > > OpenSBI v0.7
> > > >    ____                    _____ ____ _____
> > > >   / __ \                  / ____|  _ \_   _|
> > > >  | |  | |_ __   ___ _ __ | (___ | |_) || |
> > > >  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
> > > >  | |__| | |_) |  __/ | | |____) | |_) || |_
> > > >   \____/| .__/ \___|_| |_|_____/|____/_____|
> > > >         | |
> > > >         |_|
> > > >
> > > > Platform Name          : QEMU Virt Machine
> > > > Platform HART Features : RV64ACDFIMSU
> > > > Current Hart           : 0
> > > > Firmware Base          : 0x80000000
> > > > Firmware Size          : 132 KB
> > > > Runtime SBI Version    : 0.2
> > > >
> > > > MIDELEG : 0x0000000000000222
> > > > MEDELEG : 0x000000000000b109
> > > > PMP0    : 0x0000000080000000-0x000000008003ffff (A)
> > > > PMP1    : 0x0000000000000000-0xffffffffffffffff (A,R,W,X)
> > > > [    0.000000] Linux version 5.10.0-01370-g71c5f03154ac
> > > > (dvyukov@dvyukov-desk.muc.corp.google.com) (riscv64-linux-gnu-gcc
> > > > (Debian 10.2.0-9) 10.2.0, GNU ld (GNU Binutils for Debian) 2.35.1) #17
> > > > SMP Fri Dec 25 18:10:12 CET 2020
> > > > [    0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000
> > > > [    0.000000] earlycon: sbi0 at I/O port 0x0 (options '')
> > > > [    0.000000] printk: bootconsole [sbi0] enabled
> > > > [    0.000000] efi: UEFI not found.
> > > > [    0.000000] Zone ranges:
> > > > [    0.000000]   DMA32    [mem 0x0000000080200000-0x00000000ffffffff]
> > > > [    0.000000]   Normal   empty
> > > > [    0.000000] Movable zone start for each node
> > > > [    0.000000] Early memory node ranges
> > > > [    0.000000]   node   0: [mem 0x0000000080200000-0x00000000ffffffff]
> > > > [    0.000000] Initmem setup node 0 [mem 0x0000000080200000-0x00000000ffffffff]
> > > > [    0.000000] SBI specification v0.2 detected
> > > > [    0.000000] SBI implementation ID=0x1 Version=0x7
> > > > [    0.000000] SBI v0.2 TIME extension detected
> > > > [    0.000000] SBI v0.2 IPI extension detected
> > > > [    0.000000] SBI v0.2 RFENCE extension detected
> > > > [    0.000000] software IO TLB: mapped [mem
> > > > 0x00000000fa3f9000-0x00000000fe3f9000] (64MB)
> > > > [    0.000000] Unable to handle kernel paging request at virtual
> > > > address dfffffc810040000
> > > > [    0.000000] Oops [#1]
> > > > [    0.000000] Modules linked in:
> > > > [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted
> > > > 5.10.0-01370-g71c5f03154ac #17
> > > > [    0.000000] epc: ffffffe00042e3e4 ra : ffffffe000c0462c sp : ffffffe001603ea0
> > > > [    0.000000]  gp : ffffffe0016e3c60 tp : ffffffe00160cd40 t0 :
> > > > dfffffc810040000
> > > > [    0.000000]  t1 : ffffffe000e0a838 t2 : 0000000000000000 s0 :
> > > > ffffffe001603f50
> > > > [    0.000000]  s1 : ffffffe0016e50a8 a0 : dfffffc810040000 a1 :
> > > > 0000000000000000
> > > > [    0.000000]  a2 : 000000000ffc0000 a3 : dfffffc820000000 a4 :
> > > > 0000000000000000
> > > > [    0.000000]  a5 : 000000003e8c6001 a6 : ffffffe000e0a820 a7 :
> > > > 0000000000000900
> > > > [    0.000000]  s2 : dfffffc820000000 s3 : dfffffc800000000 s4 :
> > > > 0000000000000001
> > > > [    0.000000]  s5 : ffffffe0016e5108 s6 : fffffffffffff000 s7 :
> > > > dfffffc810040000
> > > > [    0.000000]  s8 : 0000000000000080 s9 : ffffffffffffffff s10:
> > > > ffffffe07a119000
> > > > [    0.000000]  s11: 000000000000ffc0 t3 : ffffffe0016eb908 t4 :
> > > > 0000000000000001
> > > > [    0.000000]  t5 : ffffffc4001c150a t6 : ffffffe001603be8
> > > > [    0.000000] status: 0000000000000100 badaddr: dfffffc810040000
> > > > cause: 000000000000000f
> > > > [    0.000000] random: get_random_bytes called from
> > > > oops_exit+0x30/0x58 with crng_init=0
> > > > [    0.000000] ---[ end trace 0000000000000000 ]---
> > > > [    0.000000] Kernel panic - not syncing: Fatal exception
> > > > [    0.000000] ---[ end Kernel panic - not syncing: Fatal exception ]---
> > > >
> > > >
> > > > But I first tried with the kernel image I had in the dir, I think it
> > > > was this config (no KASAN):
> > > > https://gist.githubusercontent.com/dvyukov/b2b62beccf80493781ab03b41430e616/raw/62e673cff08a8a41656d2871b8a37f74b00f509f/gistfile1.txt
> > > >
> > > > and earlycon=sbi did not change anything (no output after OpenSBI).
> > > > So potentially there are 2 different problems.
> > >
> > > Thanks for reporting this.  Looks like I'd forgotten to add a kasan config to
> > > my tests.  There's one in there now, and it's passing as of the fix that Nylon
> > > posted.
> >
> > I can boot the KASAN kernel now on riscv/fixes.
> >
> > Next problem: I only got as far as:
> >
> > [   90.498967][    T1] Run /sbin/init as init process
> > [   91.164353][ T4022] init[4022]: unhandled signal 11 code 0x1 at
> > 0x0000000000000bb0 in busybox[10000+d7000]
> > [   91.179640][ T4022] CPU: 1 PID: 4022 Comm: init Not tainted
> > 5.11.0-rc2-00012-g0983834a8393 #19
> > [   91.180853][ T4022] epc: 0000000000000bb0 ra : 0000003fccab09d0 sp
> > : 0000003fffa8c7b0
> > [   91.181861][ T4022]  gp : 00000000000e8d70 tp : 0000003fccaaf820 t0
> > : 000000000000001e
> > [   91.182810][ T4022]  t1 : 0000003fccab0bfc t2 : 000000000000000a s0
> > : 0000003fffa8c850
> > [   91.183749][ T4022]  s1 : 0000003fccab1070 a0 : 0000003fccab1070 a1
> > : 0000003fffa8c8c8
> > [   91.184689][ T4022]  a2 : 0000000000000001 a3 : 0000000000000020 a4
> > : 0000000000000000
> > [   91.185620][ T4022]  a5 : 0000000000000000 a6 : 0000003fcc9c4260 a7
> > : fffffffffffffffe
> > [   91.186566][ T4022]  s2 : 0000000000000000 s3 : 0000003fffa8c8c8 s4
> > : 0000003fccab1000
> > [   91.187500][ T4022]  s5 : 0000003fccab1078 s6 : 0000003fffa8c8d0 s7
> > : 0000000000000010
> > [   91.189672][ T4022]  s8 : 0000000000000016 s9 : 0000000000000000
> > s10: 0000003fffa8c8c8
> > [   91.190637][ T4022]  s11: 0000000000000000 t3 : 0000000000000bb0 t4
> > : 0000000000000000
> > [   91.191568][ T4022]  t5 : 0000003fffa8c360 t6 : 0000000000000000
> > [   91.192389][ T4022] status: 8000000000004020 badaddr:
> > 0000000000000bb0 cause: 000000000000000c
> > [   91.201573][    T1] Kernel panic - not syncing: Attempted to kill
> > init! exitcode=0x0000000b
> > [   91.202906][    T1] CPU: 0 PID: 1 Comm: init Not tainted
> > 5.11.0-rc2-00012-g0983834a8393 #19
> > [   91.204139][    T1] Call Trace:
> > [   91.204849][    T1] [<ffffffe0000095c0>] walk_stackframe+0x0/0x1d0
> > [   91.206124][    T1] [<ffffffe00458b2d8>] show_stack+0x3a/0x46
> > [   91.207240][    T1] [<ffffffe0045a5b72>] dump_stack+0x11c/0x180
> > [   91.208732][    T1] [<ffffffe00458b6a0>] panic+0x20a/0x5cc
> > [   91.209890][    T1] [<ffffffe00002eea4>] do_exit+0x1846/0x1874
> > [   91.211052][    T1] [<ffffffe00002efdc>] do_group_exit+0xa0/0x192
> > [   91.212224][    T1] [<ffffffe000047d30>] get_signal+0x2d6/0x13dc
> > [   91.213390][    T1] [<ffffffe000007eb0>] do_notify_resume+0xa8/0x912
> > [   91.214567][    T1] [<ffffffe00000559c>] ret_from_exception+0x0/0x14
> >
> > The image is buildroot on 2020.11.x built with this script:
> > https://gist.githubusercontent.com/dvyukov/1a9a01ca2189e35175a021820c95b04d/raw/5c01d755e83f4eab0d56aa7dc84af3b2d5e80423/gistfile1.txt
> >
> > Readelf for init shows the following (is it that [10000+d7000] address
> > is not .text at all?):
> >
> > $ riscv64-linux-gnu-readelf --sections image/bin/busybox
> > There are 27 section headers, starting at offset 0xd7f20:
> >
> > Section Headers:
> >   [Nr] Name              Type             Address           Offset
> >        Size              EntSize          Flags  Link  Info  Align
> >   [ 0]                   NULL             0000000000000000  00000000
> >        0000000000000000  0000000000000000           0     0     0
> >   [ 1] .interp           PROGBITS         0000000000010238  00000238
> >        0000000000000021  0000000000000000   A       0     0     1
> >   [ 2] .note.ABI-tag     NOTE             000000000001025c  0000025c
> >        0000000000000020  0000000000000000   A       0     0     4
> >   [ 3] .hash             HASH             0000000000010280  00000280
> >        00000000000009cc  0000000000000004   A       5     0     8
> >   [ 4] .gnu.hash         GNU_HASH         0000000000010c50  00000c50
> >        0000000000000ac8  0000000000000000   A       5     0     8
> >   [ 5] .dynsym           DYNSYM           0000000000011718  00001718
> >        00000000000021f0  0000000000000018   A       6     1     8
> >   [ 6] .dynstr           STRTAB           0000000000013908  00003908
> >        0000000000000c66  0000000000000000   A       0     0     1
> >   [ 7] .gnu.version      VERSYM           000000000001456e  0000456e
> >        00000000000002d4  0000000000000002   A       5     0     2
> >   [ 8] .gnu.version_r    VERNEED          0000000000014848  00004848
> >        0000000000000050  0000000000000000   A       6     2     8
> >   [ 9] .rela.dyn         RELA             0000000000014898  00004898
> >        00000000000000c0  0000000000000018   A       5     0     8
> >   [10] .rela.plt         RELA             0000000000014958  00004958
> >        00000000000020a0  0000000000000018  AI       5    22     8
> >   [11] .plt              PROGBITS         0000000000016a00  00006a00
> >        00000000000015e0  0000000000000010  AX       0     0     16
> >   [12] .text             PROGBITS         0000000000017fe0  00007fe0
> >        00000000000a3668  0000000000000000  AX       0     0     4
> >   [13] .rodata           PROGBITS         00000000000bb648  000ab648
> >        000000000002b076  0000000000000000   A       0     0     8
> >   [14] .sdata2           PROGBITS         00000000000e66c0  000d66c0
> >        0000000000000163  0000000000000000   A       0     0     8
> >   [15] .eh_frame_hdr     PROGBITS         00000000000e6824  000d6824
> >        0000000000000014  0000000000000000   A       0     0     4
> >   [16] .eh_frame         PROGBITS         00000000000e6838  000d6838
> >        000000000000002c  0000000000000000   A       0     0     8
> >   [17] .preinit_array    PREINIT_ARRAY    00000000000e7df8  000d6df8
> >        0000000000000008  0000000000000008  WA       0     0     1
> >   [18] .init_array       INIT_ARRAY       00000000000e7e00  000d6e00
> >        0000000000000008  0000000000000008  WA       0     0     8
> >   [19] .fini_array       FINI_ARRAY       00000000000e7e08  000d6e08
> >        0000000000000008  0000000000000008  WA       0     0     8
> >   [20] .dynamic          DYNAMIC          00000000000e7e10  000d6e10
> >        00000000000001f0  0000000000000010  WA       6     0     8
> >   [21] .data             PROGBITS         00000000000e8000  000d7000
> >        0000000000000240  0000000000000000  WA       0     0     8
> >   [22] .got              PROGBITS         00000000000e8240  000d7240
> >        0000000000000af8  0000000000000008  WA       0     0     8
> >   [23] .sdata            PROGBITS         00000000000e8d38  000d7d38
> >        0000000000000101  0000000000000000  WA       0     0     8
> >   [24] .sbss             NOBITS           00000000000e8e40  000d7e39
> >        000000000000017f  0000000000000000  WA       0     0     8
> >   [25] .bss              NOBITS           00000000000e8fc0  000d7e39
> >        00000000000005b0  0000000000000000  WA       0     0     8
> >   [26] .shstrtab         STRTAB           0000000000000000  000d7e39
> >        00000000000000e6  0000000000000000           0     0     1
> >
> >
> > Before I spend more time on this, am I doing anything obviously wrong?
> > Is it a known issue? Are there any fresh working recipes?
> 
> Humm.. I tried to use 2020.05 which Tobias used here:
> https://github.com/google/syzkaller/blob/master/docs/linux/setup_linux-host_qemu-vm_riscv64-kernel.md#image
> But there is no make qemu_riscv64_virt_defconfig target... though I
> remember I tested these instructions at the time...

Weird. `make qemu_riscv64_virt_defconfig` works here on buildroot
2020.05, 2020.11.1 and on latest master.

Do you see these in your configs/ directory?

$ ls -l configs/qemu_riscv*
-rw-rw-r-- 1 tklauser tklauser 673 Jan 18 15:51 configs/qemu_riscv32_virt_defconfig
-rw-rw-r-- 1 tklauser tklauser 682 Jan 18 15:51 configs/qemu_riscv64_virt_defconfig


* Re: riscv+KASAN does not boot
  2021-01-18 14:53           ` Tobias Klauser
@ 2021-01-18 15:05             ` Dmitry Vyukov
  2021-01-18 15:43               ` Dmitry Vyukov
  0 siblings, 1 reply; 27+ messages in thread
From: Dmitry Vyukov @ 2021-01-18 15:05 UTC (permalink / raw)
  To: Tobias Klauser
  Cc: Palmer Dabbelt, Andreas Schwab, Paul Walmsley, Albert Ou,
	linux-riscv, LKML, nylon7, Bjorn Topel, syzkaller

On Mon, Jan 18, 2021 at 3:53 PM Tobias Klauser <tklauser@distanz.ch> wrote:
> > > On Thu, Jan 14, 2021 at 5:57 AM Palmer Dabbelt <palmerdabbelt@google.com> wrote:
> > > >
> > > > On Fri, 25 Dec 2020 09:13:23 PST (-0800), dvyukov@google.com wrote:
> > > > > On Fri, Dec 25, 2020 at 5:58 PM Andreas Schwab <schwab@linux-m68k.org> wrote:
> > > > >>
> > > > >> On Dez 25 2020, Dmitry Vyukov wrote:
> > > > >>
> > > > >> > qemu-system-riscv64 \
> > > > >> > -machine virt -bios default -smp 1 -m 2G \
> > > > >> > -device virtio-blk-device,drive=hd0 \
> > > > >> > -drive file=buildroot-riscv64.ext4,if=none,format=raw,id=hd0 \
> > > > >> > -kernel arch/riscv/boot/Image \
> > > > >> > -nographic \
> > > > >> > -device virtio-rng-device,rng=rng0 -object
> > > > >> > rng-random,filename=/dev/urandom,id=rng0 \
> > > > >> > -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device
> > > > >> > virtio-net-device,netdev=net0 \
> > > > >> > -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic
> > > > >> > panic_on_warn=1 panic=86400"
> > > > >>
> > > > >> Do you get more output with earlycon=sbi?
> > > > >
> > > > > Hi Andreas,
> > > > >
> > > > > For defconfig+kvm_guest.config+ scripts/config -e KASAN -e
> > > > > KASAN_INLINE it actually gave me more output:
> > > > >
> > > > >
> > > > > OpenSBI v0.7
> > > > >    ____                    _____ ____ _____
> > > > >   / __ \                  / ____|  _ \_   _|
> > > > >  | |  | |_ __   ___ _ __ | (___ | |_) || |
> > > > >  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
> > > > >  | |__| | |_) |  __/ | | |____) | |_) || |_
> > > > >   \____/| .__/ \___|_| |_|_____/|____/_____|
> > > > >         | |
> > > > >         |_|
> > > > >
> > > > > Platform Name          : QEMU Virt Machine
> > > > > Platform HART Features : RV64ACDFIMSU
> > > > > Current Hart           : 0
> > > > > Firmware Base          : 0x80000000
> > > > > Firmware Size          : 132 KB
> > > > > Runtime SBI Version    : 0.2
> > > > >
> > > > > MIDELEG : 0x0000000000000222
> > > > > MEDELEG : 0x000000000000b109
> > > > > PMP0    : 0x0000000080000000-0x000000008003ffff (A)
> > > > > PMP1    : 0x0000000000000000-0xffffffffffffffff (A,R,W,X)
> > > > > [    0.000000] Linux version 5.10.0-01370-g71c5f03154ac
> > > > > (dvyukov@dvyukov-desk.muc.corp.google.com) (riscv64-linux-gnu-gcc
> > > > > (Debian 10.2.0-9) 10.2.0, GNU ld (GNU Binutils for Debian) 2.35.1) #17
> > > > > SMP Fri Dec 25 18:10:12 CET 2020
> > > > > [    0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000
> > > > > [    0.000000] earlycon: sbi0 at I/O port 0x0 (options '')
> > > > > [    0.000000] printk: bootconsole [sbi0] enabled
> > > > > [    0.000000] efi: UEFI not found.
> > > > > [    0.000000] Zone ranges:
> > > > > [    0.000000]   DMA32    [mem 0x0000000080200000-0x00000000ffffffff]
> > > > > [    0.000000]   Normal   empty
> > > > > [    0.000000] Movable zone start for each node
> > > > > [    0.000000] Early memory node ranges
> > > > > [    0.000000]   node   0: [mem 0x0000000080200000-0x00000000ffffffff]
> > > > > [    0.000000] Initmem setup node 0 [mem 0x0000000080200000-0x00000000ffffffff]
> > > > > [    0.000000] SBI specification v0.2 detected
> > > > > [    0.000000] SBI implementation ID=0x1 Version=0x7
> > > > > [    0.000000] SBI v0.2 TIME extension detected
> > > > > [    0.000000] SBI v0.2 IPI extension detected
> > > > > [    0.000000] SBI v0.2 RFENCE extension detected
> > > > > [    0.000000] software IO TLB: mapped [mem
> > > > > 0x00000000fa3f9000-0x00000000fe3f9000] (64MB)
> > > > > [    0.000000] Unable to handle kernel paging request at virtual
> > > > > address dfffffc810040000
> > > > > [    0.000000] Oops [#1]
> > > > > [    0.000000] Modules linked in:
> > > > > [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted
> > > > > 5.10.0-01370-g71c5f03154ac #17
> > > > > [    0.000000] epc: ffffffe00042e3e4 ra : ffffffe000c0462c sp : ffffffe001603ea0
> > > > > [    0.000000]  gp : ffffffe0016e3c60 tp : ffffffe00160cd40 t0 :
> > > > > dfffffc810040000
> > > > > [    0.000000]  t1 : ffffffe000e0a838 t2 : 0000000000000000 s0 :
> > > > > ffffffe001603f50
> > > > > [    0.000000]  s1 : ffffffe0016e50a8 a0 : dfffffc810040000 a1 :
> > > > > 0000000000000000
> > > > > [    0.000000]  a2 : 000000000ffc0000 a3 : dfffffc820000000 a4 :
> > > > > 0000000000000000
> > > > > [    0.000000]  a5 : 000000003e8c6001 a6 : ffffffe000e0a820 a7 :
> > > > > 0000000000000900
> > > > > [    0.000000]  s2 : dfffffc820000000 s3 : dfffffc800000000 s4 :
> > > > > 0000000000000001
> > > > > [    0.000000]  s5 : ffffffe0016e5108 s6 : fffffffffffff000 s7 :
> > > > > dfffffc810040000
> > > > > [    0.000000]  s8 : 0000000000000080 s9 : ffffffffffffffff s10:
> > > > > ffffffe07a119000
> > > > > [    0.000000]  s11: 000000000000ffc0 t3 : ffffffe0016eb908 t4 :
> > > > > 0000000000000001
> > > > > [    0.000000]  t5 : ffffffc4001c150a t6 : ffffffe001603be8
> > > > > [    0.000000] status: 0000000000000100 badaddr: dfffffc810040000
> > > > > cause: 000000000000000f
> > > > > [    0.000000] random: get_random_bytes called from
> > > > > oops_exit+0x30/0x58 with crng_init=0
> > > > > [    0.000000] ---[ end trace 0000000000000000 ]---
> > > > > [    0.000000] Kernel panic - not syncing: Fatal exception
> > > > > [    0.000000] ---[ end Kernel panic - not syncing: Fatal exception ]---
> > > > >
> > > > >
> > > > > But I first tried with the kernel image I had in the dir, I think it
> > > > > was this config (no KASAN):
> > > > > https://gist.githubusercontent.com/dvyukov/b2b62beccf80493781ab03b41430e616/raw/62e673cff08a8a41656d2871b8a37f74b00f509f/gistfile1.txt
> > > > >
> > > > > and earlycon=sbi did not change anything (no output after OpenSBI).
> > > > > So potentially there are 2 different problems.
> > > >
> > > > Thanks for reporting this.  Looks like I'd forgotten to add a kasan config to
> > > > my tests.  There's one in there now, and it's passing as of the fix that Nylon
> > > > posted.
> > >
> > > I can boot the KASAN kernel now on riscv/fixes.
> > >
> > > Next problem: I've got only to:
> > >
> > > [   90.498967][    T1] Run /sbin/init as init process
> > > [   91.164353][ T4022] init[4022]: unhandled signal 11 code 0x1 at
> > > 0x0000000000000bb0 in busybox[10000+d7000]
> > > [   91.179640][ T4022] CPU: 1 PID: 4022 Comm: init Not tainted
> > > 5.11.0-rc2-00012-g0983834a8393 #19
> > > [   91.180853][ T4022] epc: 0000000000000bb0 ra : 0000003fccab09d0 sp
> > > : 0000003fffa8c7b0
> > > [   91.181861][ T4022]  gp : 00000000000e8d70 tp : 0000003fccaaf820 t0
> > > : 000000000000001e
> > > [   91.182810][ T4022]  t1 : 0000003fccab0bfc t2 : 000000000000000a s0
> > > : 0000003fffa8c850
> > > [   91.183749][ T4022]  s1 : 0000003fccab1070 a0 : 0000003fccab1070 a1
> > > : 0000003fffa8c8c8
> > > [   91.184689][ T4022]  a2 : 0000000000000001 a3 : 0000000000000020 a4
> > > : 0000000000000000
> > > [   91.185620][ T4022]  a5 : 0000000000000000 a6 : 0000003fcc9c4260 a7
> > > : fffffffffffffffe
> > > [   91.186566][ T4022]  s2 : 0000000000000000 s3 : 0000003fffa8c8c8 s4
> > > : 0000003fccab1000
> > > [   91.187500][ T4022]  s5 : 0000003fccab1078 s6 : 0000003fffa8c8d0 s7
> > > : 0000000000000010
> > > [   91.189672][ T4022]  s8 : 0000000000000016 s9 : 0000000000000000
> > > s10: 0000003fffa8c8c8
> > > [   91.190637][ T4022]  s11: 0000000000000000 t3 : 0000000000000bb0 t4
> > > : 0000000000000000
> > > [   91.191568][ T4022]  t5 : 0000003fffa8c360 t6 : 0000000000000000
> > > [   91.192389][ T4022] status: 8000000000004020 badaddr:
> > > 0000000000000bb0 cause: 000000000000000c
> > > [   91.201573][    T1] Kernel panic - not syncing: Attempted to kill
> > > init! exitcode=0x0000000b
> > > [   91.202906][    T1] CPU: 0 PID: 1 Comm: init Not tainted
> > > 5.11.0-rc2-00012-g0983834a8393 #19
> > > [   91.204139][    T1] Call Trace:
> > > [   91.204849][    T1] [<ffffffe0000095c0>] walk_stackframe+0x0/0x1d0
> > > [   91.206124][    T1] [<ffffffe00458b2d8>] show_stack+0x3a/0x46
> > > [   91.207240][    T1] [<ffffffe0045a5b72>] dump_stack+0x11c/0x180
> > > [   91.208732][    T1] [<ffffffe00458b6a0>] panic+0x20a/0x5cc
> > > [   91.209890][    T1] [<ffffffe00002eea4>] do_exit+0x1846/0x1874
> > > [   91.211052][    T1] [<ffffffe00002efdc>] do_group_exit+0xa0/0x192
> > > [   91.212224][    T1] [<ffffffe000047d30>] get_signal+0x2d6/0x13dc
> > > [   91.213390][    T1] [<ffffffe000007eb0>] do_notify_resume+0xa8/0x912
> > > [   91.214567][    T1] [<ffffffe00000559c>] ret_from_exception+0x0/0x14
> > >
> > > The image is buildroot on 2020.11.x built with this script:
> > > https://gist.githubusercontent.com/dvyukov/1a9a01ca2189e35175a021820c95b04d/raw/5c01d755e83f4eab0d56aa7dc84af3b2d5e80423/gistfile1.txt
> > >
> > > Readelf for init shows the following (is it that [10000+d7000] address
> > > is not .text at all?):
> > >
> > > $ riscv64-linux-gnu-readelf --sections image/bin/busybox
> > > There are 27 section headers, starting at offset 0xd7f20:
> > >
> > > Section Headers:
> > >   [Nr] Name              Type             Address           Offset
> > >        Size              EntSize          Flags  Link  Info  Align
> > >   [ 0]                   NULL             0000000000000000  00000000
> > >        0000000000000000  0000000000000000           0     0     0
> > >   [ 1] .interp           PROGBITS         0000000000010238  00000238
> > >        0000000000000021  0000000000000000   A       0     0     1
> > >   [ 2] .note.ABI-tag     NOTE             000000000001025c  0000025c
> > >        0000000000000020  0000000000000000   A       0     0     4
> > >   [ 3] .hash             HASH             0000000000010280  00000280
> > >        00000000000009cc  0000000000000004   A       5     0     8
> > >   [ 4] .gnu.hash         GNU_HASH         0000000000010c50  00000c50
> > >        0000000000000ac8  0000000000000000   A       5     0     8
> > >   [ 5] .dynsym           DYNSYM           0000000000011718  00001718
> > >        00000000000021f0  0000000000000018   A       6     1     8
> > >   [ 6] .dynstr           STRTAB           0000000000013908  00003908
> > >        0000000000000c66  0000000000000000   A       0     0     1
> > >   [ 7] .gnu.version      VERSYM           000000000001456e  0000456e
> > >        00000000000002d4  0000000000000002   A       5     0     2
> > >   [ 8] .gnu.version_r    VERNEED          0000000000014848  00004848
> > >        0000000000000050  0000000000000000   A       6     2     8
> > >   [ 9] .rela.dyn         RELA             0000000000014898  00004898
> > >        00000000000000c0  0000000000000018   A       5     0     8
> > >   [10] .rela.plt         RELA             0000000000014958  00004958
> > >        00000000000020a0  0000000000000018  AI       5    22     8
> > >   [11] .plt              PROGBITS         0000000000016a00  00006a00
> > >        00000000000015e0  0000000000000010  AX       0     0     16
> > >   [12] .text             PROGBITS         0000000000017fe0  00007fe0
> > >        00000000000a3668  0000000000000000  AX       0     0     4
> > >   [13] .rodata           PROGBITS         00000000000bb648  000ab648
> > >        000000000002b076  0000000000000000   A       0     0     8
> > >   [14] .sdata2           PROGBITS         00000000000e66c0  000d66c0
> > >        0000000000000163  0000000000000000   A       0     0     8
> > >   [15] .eh_frame_hdr     PROGBITS         00000000000e6824  000d6824
> > >        0000000000000014  0000000000000000   A       0     0     4
> > >   [16] .eh_frame         PROGBITS         00000000000e6838  000d6838
> > >        000000000000002c  0000000000000000   A       0     0     8
> > >   [17] .preinit_array    PREINIT_ARRAY    00000000000e7df8  000d6df8
> > >        0000000000000008  0000000000000008  WA       0     0     1
> > >   [18] .init_array       INIT_ARRAY       00000000000e7e00  000d6e00
> > >        0000000000000008  0000000000000008  WA       0     0     8
> > >   [19] .fini_array       FINI_ARRAY       00000000000e7e08  000d6e08
> > >        0000000000000008  0000000000000008  WA       0     0     8
> > >   [20] .dynamic          DYNAMIC          00000000000e7e10  000d6e10
> > >        00000000000001f0  0000000000000010  WA       6     0     8
> > >   [21] .data             PROGBITS         00000000000e8000  000d7000
> > >        0000000000000240  0000000000000000  WA       0     0     8
> > >   [22] .got              PROGBITS         00000000000e8240  000d7240
> > >        0000000000000af8  0000000000000008  WA       0     0     8
> > >   [23] .sdata            PROGBITS         00000000000e8d38  000d7d38
> > >        0000000000000101  0000000000000000  WA       0     0     8
> > >   [24] .sbss             NOBITS           00000000000e8e40  000d7e39
> > >        000000000000017f  0000000000000000  WA       0     0     8
> > >   [25] .bss              NOBITS           00000000000e8fc0  000d7e39
> > >        00000000000005b0  0000000000000000  WA       0     0     8
> > >   [26] .shstrtab         STRTAB           0000000000000000  000d7e39
> > >        00000000000000e6  0000000000000000           0     0     1
> > >
> > >
> > > Before I spend more time on this, am I doing anything obviously wrong?
> > > Is it a known issue? Are there any fresh working recipes?
> >
> > Humm.. I tried to use 2020.05 which Tobias used here:
> > https://github.com/google/syzkaller/blob/master/docs/linux/setup_linux-host_qemu-vm_riscv64-kernel.md#image
> > But there is no make qemu_riscv64_virt_defconfig target... though I
> > remember I tested these instructions at the time...
>
> Weird. `make qemu_riscv64_virt_defconfig` works here on buildroot
> 2020.05, 2020.11.1 and on latest master.
>
> Do you see these in your configs/ directory?
>
> $ ls -l configs/qemu_riscv*
> -rw-rw-r-- 1 tklauser tklauser 673 Jan 18 15:51 configs/qemu_riscv32_virt_defconfig
> -rw-rw-r-- 1 tklauser tklauser 682 Jan 18 15:51 configs/qemu_riscv64_virt_defconfig

Oh, it turned out I had previously checked out 2011.05 somehow...
Yes, 2020.05 has qemu_riscv64_virt_defconfig and I am building it now.
2020.11 has the config, but init crashes (see above).
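
On the earlier question of whether the [10000+d7000] range is .text at all: below is a small Python sketch of the arithmetic, using only numbers copied from the init crash log and the readelf output quoted above. The helper names (in_range, describe_fault) are made up for illustration, and the cause table is just the subset of RISC-V exception codes that shows up in this thread. It only compares addresses; it does not explain why the PC ended up at 0xbb0.

```python
# Values copied from the init crash log and readelf output quoted above.
FAULT_ADDR = 0x0000000000000BB0          # epc and badaddr of the init crash
MAP_BASE, MAP_SIZE = 0x10000, 0xD7000    # "busybox[10000+d7000]" from the log
TEXT_ADDR, TEXT_SIZE = 0x17FE0, 0xA3668  # .text address and size from readelf
SCAUSE = 0xC                             # "cause: 000000000000000c"

# Subset of RISC-V exception codes (privileged spec) seen in this thread.
SCAUSE_NAMES = {
    0xC: "instruction page fault",
    0xD: "load page fault",
    0xF: "store/AMO page fault",
}


def in_range(addr, start, size):
    """Return True if addr lies in the half-open range [start, start + size)."""
    return start <= addr < start + size


def describe_fault(addr):
    """Classify addr relative to .text and the range printed by the kernel."""
    if in_range(addr, TEXT_ADDR, TEXT_SIZE):
        return "inside .text"
    if in_range(addr, MAP_BASE, MAP_SIZE):
        return "inside the mapping but outside .text"
    return "outside the busybox mapping"


if __name__ == "__main__":
    print(hex(FAULT_ADDR), "is", describe_fault(FAULT_ADDR))
    print("cause", hex(SCAUSE), "=", SCAUSE_NAMES.get(SCAUSE, "unknown"))
```

By this arithmetic, .text (0x17fe0 to 0xbb648) sits inside the printed range, while the faulting address 0xbb0 is below the 0x10000 base, so it is not inside that range at all. Cause 0xc is an instruction page fault, which is consistent with epc and badaddr being the same value.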


* Re: riscv+KASAN does not boot
  2021-01-18 15:05             ` Dmitry Vyukov
@ 2021-01-18 15:43               ` Dmitry Vyukov
  2021-01-29  7:45                 ` Alex Ghiti
  0 siblings, 1 reply; 27+ messages in thread
From: Dmitry Vyukov @ 2021-01-18 15:43 UTC (permalink / raw)
  To: Tobias Klauser
  Cc: Palmer Dabbelt, Andreas Schwab, Paul Walmsley, Albert Ou,
	linux-riscv, LKML, nylon7, Bjorn Topel, syzkaller

On Mon, Jan 18, 2021 at 4:05 PM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Mon, Jan 18, 2021 at 3:53 PM Tobias Klauser <tklauser@distanz.ch> wrote:
> > > > On Thu, Jan 14, 2021 at 5:57 AM Palmer Dabbelt <palmerdabbelt@google.com> wrote:
> > > > >
> > > > > On Fri, 25 Dec 2020 09:13:23 PST (-0800), dvyukov@google.com wrote:
> > > > > > On Fri, Dec 25, 2020 at 5:58 PM Andreas Schwab <schwab@linux-m68k.org> wrote:
> > > > > >>
> > > > > >> On Dez 25 2020, Dmitry Vyukov wrote:
> > > > > >>
> > > > > >> > qemu-system-riscv64 \
> > > > > >> > -machine virt -bios default -smp 1 -m 2G \
> > > > > >> > -device virtio-blk-device,drive=hd0 \
> > > > > >> > -drive file=buildroot-riscv64.ext4,if=none,format=raw,id=hd0 \
> > > > > >> > -kernel arch/riscv/boot/Image \
> > > > > >> > -nographic \
> > > > > >> > -device virtio-rng-device,rng=rng0 -object
> > > > > >> > rng-random,filename=/dev/urandom,id=rng0 \
> > > > > >> > -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device
> > > > > >> > virtio-net-device,netdev=net0 \
> > > > > >> > -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic
> > > > > >> > panic_on_warn=1 panic=86400"
> > > > > >>
> > > > > >> Do you get more output with earlycon=sbi?
> > > > > >
> > > > > > Hi Andreas,
> > > > > >
> > > > > > For defconfig+kvm_guest.config+ scripts/config -e KASAN -e
> > > > > > KASAN_INLINE it actually gave me more output:
> > > > > >
> > > > > >
> > > > > > OpenSBI v0.7
> > > > > >    ____                    _____ ____ _____
> > > > > >   / __ \                  / ____|  _ \_   _|
> > > > > >  | |  | |_ __   ___ _ __ | (___ | |_) || |
> > > > > >  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
> > > > > >  | |__| | |_) |  __/ | | |____) | |_) || |_
> > > > > >   \____/| .__/ \___|_| |_|_____/|____/_____|
> > > > > >         | |
> > > > > >         |_|
> > > > > >
> > > > > > Platform Name          : QEMU Virt Machine
> > > > > > Platform HART Features : RV64ACDFIMSU
> > > > > > Current Hart           : 0
> > > > > > Firmware Base          : 0x80000000
> > > > > > Firmware Size          : 132 KB
> > > > > > Runtime SBI Version    : 0.2
> > > > > >
> > > > > > MIDELEG : 0x0000000000000222
> > > > > > MEDELEG : 0x000000000000b109
> > > > > > PMP0    : 0x0000000080000000-0x000000008003ffff (A)
> > > > > > PMP1    : 0x0000000000000000-0xffffffffffffffff (A,R,W,X)
> > > > > > [    0.000000] Linux version 5.10.0-01370-g71c5f03154ac
> > > > > > (dvyukov@dvyukov-desk.muc.corp.google.com) (riscv64-linux-gnu-gcc
> > > > > > (Debian 10.2.0-9) 10.2.0, GNU ld (GNU Binutils for Debian) 2.35.1) #17
> > > > > > SMP Fri Dec 25 18:10:12 CET 2020
> > > > > > [    0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000
> > > > > > [    0.000000] earlycon: sbi0 at I/O port 0x0 (options '')
> > > > > > [    0.000000] printk: bootconsole [sbi0] enabled
> > > > > > [    0.000000] efi: UEFI not found.
> > > > > > [    0.000000] Zone ranges:
> > > > > > [    0.000000]   DMA32    [mem 0x0000000080200000-0x00000000ffffffff]
> > > > > > [    0.000000]   Normal   empty
> > > > > > [    0.000000] Movable zone start for each node
> > > > > > [    0.000000] Early memory node ranges
> > > > > > [    0.000000]   node   0: [mem 0x0000000080200000-0x00000000ffffffff]
> > > > > > [    0.000000] Initmem setup node 0 [mem 0x0000000080200000-0x00000000ffffffff]
> > > > > > [    0.000000] SBI specification v0.2 detected
> > > > > > [    0.000000] SBI implementation ID=0x1 Version=0x7
> > > > > > [    0.000000] SBI v0.2 TIME extension detected
> > > > > > [    0.000000] SBI v0.2 IPI extension detected
> > > > > > [    0.000000] SBI v0.2 RFENCE extension detected
> > > > > > [    0.000000] software IO TLB: mapped [mem
> > > > > > 0x00000000fa3f9000-0x00000000fe3f9000] (64MB)
> > > > > > [    0.000000] Unable to handle kernel paging request at virtual
> > > > > > address dfffffc810040000
> > > > > > [    0.000000] Oops [#1]
> > > > > > [    0.000000] Modules linked in:
> > > > > > [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted
> > > > > > 5.10.0-01370-g71c5f03154ac #17
> > > > > > [    0.000000] epc: ffffffe00042e3e4 ra : ffffffe000c0462c sp : ffffffe001603ea0
> > > > > > [    0.000000]  gp : ffffffe0016e3c60 tp : ffffffe00160cd40 t0 :
> > > > > > dfffffc810040000
> > > > > > [    0.000000]  t1 : ffffffe000e0a838 t2 : 0000000000000000 s0 :
> > > > > > ffffffe001603f50
> > > > > > [    0.000000]  s1 : ffffffe0016e50a8 a0 : dfffffc810040000 a1 :
> > > > > > 0000000000000000
> > > > > > [    0.000000]  a2 : 000000000ffc0000 a3 : dfffffc820000000 a4 :
> > > > > > 0000000000000000
> > > > > > [    0.000000]  a5 : 000000003e8c6001 a6 : ffffffe000e0a820 a7 :
> > > > > > 0000000000000900
> > > > > > [    0.000000]  s2 : dfffffc820000000 s3 : dfffffc800000000 s4 :
> > > > > > 0000000000000001
> > > > > > [    0.000000]  s5 : ffffffe0016e5108 s6 : fffffffffffff000 s7 :
> > > > > > dfffffc810040000
> > > > > > [    0.000000]  s8 : 0000000000000080 s9 : ffffffffffffffff s10:
> > > > > > ffffffe07a119000
> > > > > > [    0.000000]  s11: 000000000000ffc0 t3 : ffffffe0016eb908 t4 :
> > > > > > 0000000000000001
> > > > > > [    0.000000]  t5 : ffffffc4001c150a t6 : ffffffe001603be8
> > > > > > [    0.000000] status: 0000000000000100 badaddr: dfffffc810040000
> > > > > > cause: 000000000000000f
> > > > > > [    0.000000] random: get_random_bytes called from
> > > > > > oops_exit+0x30/0x58 with crng_init=0
> > > > > > [    0.000000] ---[ end trace 0000000000000000 ]---
> > > > > > [    0.000000] Kernel panic - not syncing: Fatal exception
> > > > > > [    0.000000] ---[ end Kernel panic - not syncing: Fatal exception ]---
> > > > > >
> > > > > >
> > > > > > But I first tried with the kernel image I had in the dir, I think it
> > > > > > was this config (no KASAN):
> > > > > > https://gist.githubusercontent.com/dvyukov/b2b62beccf80493781ab03b41430e616/raw/62e673cff08a8a41656d2871b8a37f74b00f509f/gistfile1.txt
> > > > > >
> > > > > > and earlycon=sbi did not change anything (no output after OpenSBI).
> > > > > > So potentially there are 2 different problems.
> > > > >
> > > > > Thanks for reporting this.  Looks like I'd forgotten to add a kasan config to
> > > > > my tests.  There's one in there now, and it's passing as of the fix that Nylon
> > > > > posted.
> > > >
> > > > I can boot the KASAN kernel now on riscv/fixes.
> > > >
> > > > Next problem: I've got only to:
> > > >
> > > > [   90.498967][    T1] Run /sbin/init as init process
> > > > [   91.164353][ T4022] init[4022]: unhandled signal 11 code 0x1 at
> > > > 0x0000000000000bb0 in busybox[10000+d7000]
> > > > [   91.179640][ T4022] CPU: 1 PID: 4022 Comm: init Not tainted
> > > > 5.11.0-rc2-00012-g0983834a8393 #19
> > > > [   91.180853][ T4022] epc: 0000000000000bb0 ra : 0000003fccab09d0 sp
> > > > : 0000003fffa8c7b0
> > > > [   91.181861][ T4022]  gp : 00000000000e8d70 tp : 0000003fccaaf820 t0
> > > > : 000000000000001e
> > > > [   91.182810][ T4022]  t1 : 0000003fccab0bfc t2 : 000000000000000a s0
> > > > : 0000003fffa8c850
> > > > [   91.183749][ T4022]  s1 : 0000003fccab1070 a0 : 0000003fccab1070 a1
> > > > : 0000003fffa8c8c8
> > > > [   91.184689][ T4022]  a2 : 0000000000000001 a3 : 0000000000000020 a4
> > > > : 0000000000000000
> > > > [   91.185620][ T4022]  a5 : 0000000000000000 a6 : 0000003fcc9c4260 a7
> > > > : fffffffffffffffe
> > > > [   91.186566][ T4022]  s2 : 0000000000000000 s3 : 0000003fffa8c8c8 s4
> > > > : 0000003fccab1000
> > > > [   91.187500][ T4022]  s5 : 0000003fccab1078 s6 : 0000003fffa8c8d0 s7
> > > > : 0000000000000010
> > > > [   91.189672][ T4022]  s8 : 0000000000000016 s9 : 0000000000000000
> > > > s10: 0000003fffa8c8c8
> > > > [   91.190637][ T4022]  s11: 0000000000000000 t3 : 0000000000000bb0 t4
> > > > : 0000000000000000
> > > > [   91.191568][ T4022]  t5 : 0000003fffa8c360 t6 : 0000000000000000
> > > > [   91.192389][ T4022] status: 8000000000004020 badaddr:
> > > > 0000000000000bb0 cause: 000000000000000c
> > > > [   91.201573][    T1] Kernel panic - not syncing: Attempted to kill
> > > > init! exitcode=0x0000000b
> > > > [   91.202906][    T1] CPU: 0 PID: 1 Comm: init Not tainted
> > > > 5.11.0-rc2-00012-g0983834a8393 #19
> > > > [   91.204139][    T1] Call Trace:
> > > > [   91.204849][    T1] [<ffffffe0000095c0>] walk_stackframe+0x0/0x1d0
> > > > [   91.206124][    T1] [<ffffffe00458b2d8>] show_stack+0x3a/0x46
> > > > [   91.207240][    T1] [<ffffffe0045a5b72>] dump_stack+0x11c/0x180
> > > > [   91.208732][    T1] [<ffffffe00458b6a0>] panic+0x20a/0x5cc
> > > > [   91.209890][    T1] [<ffffffe00002eea4>] do_exit+0x1846/0x1874
> > > > [   91.211052][    T1] [<ffffffe00002efdc>] do_group_exit+0xa0/0x192
> > > > [   91.212224][    T1] [<ffffffe000047d30>] get_signal+0x2d6/0x13dc
> > > > [   91.213390][    T1] [<ffffffe000007eb0>] do_notify_resume+0xa8/0x912
> > > > [   91.214567][    T1] [<ffffffe00000559c>] ret_from_exception+0x0/0x14
> > > >
> > > > The image is buildroot on 2020.11.x built with this script:
> > > > https://gist.githubusercontent.com/dvyukov/1a9a01ca2189e35175a021820c95b04d/raw/5c01d755e83f4eab0d56aa7dc84af3b2d5e80423/gistfile1.txt
> > > >
> > > > Readelf for init shows the following (is it that [10000+d7000] address
> > > > is not .text at all?):
> > > >
> > > > $ riscv64-linux-gnu-readelf --sections image/bin/busybox
> > > > There are 27 section headers, starting at offset 0xd7f20:
> > > >
> > > > Section Headers:
> > > >   [Nr] Name              Type             Address           Offset
> > > >        Size              EntSize          Flags  Link  Info  Align
> > > >   [ 0]                   NULL             0000000000000000  00000000
> > > >        0000000000000000  0000000000000000           0     0     0
> > > >   [ 1] .interp           PROGBITS         0000000000010238  00000238
> > > >        0000000000000021  0000000000000000   A       0     0     1
> > > >   [ 2] .note.ABI-tag     NOTE             000000000001025c  0000025c
> > > >        0000000000000020  0000000000000000   A       0     0     4
> > > >   [ 3] .hash             HASH             0000000000010280  00000280
> > > >        00000000000009cc  0000000000000004   A       5     0     8
> > > >   [ 4] .gnu.hash         GNU_HASH         0000000000010c50  00000c50
> > > >        0000000000000ac8  0000000000000000   A       5     0     8
> > > >   [ 5] .dynsym           DYNSYM           0000000000011718  00001718
> > > >        00000000000021f0  0000000000000018   A       6     1     8
> > > >   [ 6] .dynstr           STRTAB           0000000000013908  00003908
> > > >        0000000000000c66  0000000000000000   A       0     0     1
> > > >   [ 7] .gnu.version      VERSYM           000000000001456e  0000456e
> > > >        00000000000002d4  0000000000000002   A       5     0     2
> > > >   [ 8] .gnu.version_r    VERNEED          0000000000014848  00004848
> > > >        0000000000000050  0000000000000000   A       6     2     8
> > > >   [ 9] .rela.dyn         RELA             0000000000014898  00004898
> > > >        00000000000000c0  0000000000000018   A       5     0     8
> > > >   [10] .rela.plt         RELA             0000000000014958  00004958
> > > >        00000000000020a0  0000000000000018  AI       5    22     8
> > > >   [11] .plt              PROGBITS         0000000000016a00  00006a00
> > > >        00000000000015e0  0000000000000010  AX       0     0     16
> > > >   [12] .text             PROGBITS         0000000000017fe0  00007fe0
> > > >        00000000000a3668  0000000000000000  AX       0     0     4
> > > >   [13] .rodata           PROGBITS         00000000000bb648  000ab648
> > > >        000000000002b076  0000000000000000   A       0     0     8
> > > >   [14] .sdata2           PROGBITS         00000000000e66c0  000d66c0
> > > >        0000000000000163  0000000000000000   A       0     0     8
> > > >   [15] .eh_frame_hdr     PROGBITS         00000000000e6824  000d6824
> > > >        0000000000000014  0000000000000000   A       0     0     4
> > > >   [16] .eh_frame         PROGBITS         00000000000e6838  000d6838
> > > >        000000000000002c  0000000000000000   A       0     0     8
> > > >   [17] .preinit_array    PREINIT_ARRAY    00000000000e7df8  000d6df8
> > > >        0000000000000008  0000000000000008  WA       0     0     1
> > > >   [18] .init_array       INIT_ARRAY       00000000000e7e00  000d6e00
> > > >        0000000000000008  0000000000000008  WA       0     0     8
> > > >   [19] .fini_array       FINI_ARRAY       00000000000e7e08  000d6e08
> > > >        0000000000000008  0000000000000008  WA       0     0     8
> > > >   [20] .dynamic          DYNAMIC          00000000000e7e10  000d6e10
> > > >        00000000000001f0  0000000000000010  WA       6     0     8
> > > >   [21] .data             PROGBITS         00000000000e8000  000d7000
> > > >        0000000000000240  0000000000000000  WA       0     0     8
> > > >   [22] .got              PROGBITS         00000000000e8240  000d7240
> > > >        0000000000000af8  0000000000000008  WA       0     0     8
> > > >   [23] .sdata            PROGBITS         00000000000e8d38  000d7d38
> > > >        0000000000000101  0000000000000000  WA       0     0     8
> > > >   [24] .sbss             NOBITS           00000000000e8e40  000d7e39
> > > >        000000000000017f  0000000000000000  WA       0     0     8
> > > >   [25] .bss              NOBITS           00000000000e8fc0  000d7e39
> > > >        00000000000005b0  0000000000000000  WA       0     0     8
> > > >   [26] .shstrtab         STRTAB           0000000000000000  000d7e39
> > > >        00000000000000e6  0000000000000000           0     0     1
> > > >
> > > >
> > > > Before I spend more time on this, am I doing anything obviously wrong?
> > > > Is it a known issue? Are there any fresh working recipes?
> > >
> > > Humm.. I tried to use 2020.05 which Tobias used here:
> > > https://github.com/google/syzkaller/blob/master/docs/linux/setup_linux-host_qemu-vm_riscv64-kernel.md#image
> > > But there is no make qemu_riscv64_virt_defconfig target... though I
> > > remember I tested these instructions at the time...
> >
> > Weird. `make qemu_riscv64_virt_defconfig` works here on buildroot
> > 2020.05, 2020.11.1 and on latest master.
> >
> > Do you see these in your configs/ directory?
> >
> > $ ls -l configs/qemu_riscv*
> > -rw-rw-r-- 1 tklauser tklauser 673 Jan 18 15:51 configs/qemu_riscv32_virt_defconfig
> > -rw-rw-r-- 1 tklauser tklauser 682 Jan 18 15:51 configs/qemu_riscv64_virt_defconfig
>
> Oh, turned out I previously checked out 2011.05 somehow...
> Yes, 2020.05 has qemu_riscv64_virt_defconfig and I am building it now.
> 2020.11 has the config, but init crashes (see above).


2020.05 is a bit better, but it still failed in several ways.
First, a number of user-space services, including sshd, still crashed.
Second, the kernel also crashed a bit later.
And 2020.11 seems to regress even more.
This is with the same kernel as in the previous email (I did not rebuild it).


2020.05 buildroot:
[   90.381218][    T1] devtmpfs: mounted
[   90.534531][    T1] Freeing unused kernel memory: 2328K
[   90.537085][    T1] Run /sbin/init as init process
[   91.754610][ T4022] EXT4-fs (vda): re-mounted. Opts: (null). Quota
mode: none.
Starting syslogd: OK
Starting klogd: OK
Running sysctl: OK
Populating /dev using udev: [   99.413418][ T4051] udevd[4051]:
starting version 3.2.9
[  100.480500][ T4052] udevd[4052]: starting eudev-3.2.9
[  101.904876][ T4052] udevd[4052]: unhandled signal 11 code 0x1 at
0x0000000000000bb0 in udevd[10000+35000]
[  101.911401][ T4052] CPU: 1 PID: 4052 Comm: udevd Not tainted
5.11.0-rc2-00012-g0983834a8393 #19
[  101.913136][ T4052] epc: 0000000000000bb0 ra : 0000003ff5921872 sp
: 0000003fffb0c3a0
[  101.914593][ T4052]  gp : 000000000004f908 tp : 0000003ff552b720 t0
: 0000003ff5943160
[  101.915740][ T4052]  t1 : 0000003ff5921bec t2 : 000000000004f450 s0
: 0000003fffb0c440
[  101.916872][ T4052]  s1 : 0000003ff5922000 a0 : 0000003ff5922000 a1
: 0000003fffb0c460
[  101.949318][ T4052]  a2 : 0000000000000001 a3 : 0000000000000002 a4
: 0000000000000002
[  101.950529][ T4052]  a5 : 000000000000000f a6 : 0000000000000007 a7
: 0000000000000016
[  101.951653][ T4052]  s2 : 0000000000000001 s3 : 0000003fffb0c460 s4
: 0000003ff5922030
[  101.952771][ T4052]  s5 : 0000003ff5922010 s6 : 0000000000000000 s7
: 0000000000000000
[  101.953878][ T4052]  s8 : 0000003ff5922004 s9 : 0000003ff5922010
s10: 0000003ff5922008
[  101.955016][ T4052]  s11: 0000003ff5922038 t3 : 0000000000000bb0 t4
: 0000000000000002
[  101.956122][ T4052]  t5 : 0000000000000002 t6 : 0000000000003d40
[  101.957072][ T4052] status: 8000000000004020 badaddr:
0000000000000bb0 cause: 000000000000000c
[  154.349233][ T4055] udevadm[4055]: unhandled signal 11 code 0x1 at
0x0000000000000bb0 in udevadm[10000+38000]
[  154.351201][ T4055] CPU: 0 PID: 4055 Comm: udevadm Not tainted
5.11.0-rc2-00012-g0983834a8393 #19
[  154.352227][ T4055] epc: 0000000000000bb0 ra : 0000003ff2cd3872 sp
: 0000003fffe26a50
[  154.353136][ T4055]  gp : 0000000000052808 tp : 0000003ff28dd720 t0
: 0000003ff2cf5160
[  154.354047][ T4055]  t1 : 0000003ff2cd3bec t2 : 0000000000052570 s0
: 0000003fffe26af0
[  154.354957][ T4055]  s1 : 0000003ff2cd4000 a0 : 0000003ff2cd4000 a1
: 0000003fffe26b10
[  154.355860][ T4055]  a2 : 000000000003d790 a3 : 0000000000000002 a4
: 0000000000000002
[  154.356739][ T4055]  a5 : 000000000000000f a6 : ffffffffffffffff a7
: 0000000000000000
[  154.366998][ T4055]  s2 : 0000000000000001 s3 : 0000003fffe26b10 s4
: 0000003ff2cd4030
[  154.372223][ T4055]  s5 : 0000003ff2cd4010 s6 : 0000000000000068 s7
: 000000000003d000
[  154.373192][ T4055]  s8 : 0000003ff2cd4004 s9 : 0000003ff2cd4010
s10: 0000003ff2cd4008
[  154.374114][ T4055]  s11: 0000003ff2cd4038 t3 : 0000000000000bb0 t4
: 0000000000000002
[  154.375023][ T4055]  t5 : 0000000000000002 t6 : 0000000000003d40
[  154.375793][ T4055] status: 0000000000004020 badaddr:
0000000000000bb0 cause: 000000000000000c
Segmentation fault
udevadm settle failed
done
Saving random seed: OK
Starting network: [  160.769276][ T4073] 8021q: adding VLAN 0 to HW
filter on device eth0
udhcpc: started, v1.31.1
[  161.642968][ T4074] udhcpc[4074]: unhandled signal 11 code 0x1 at
0x0000000000000bb0 in busybox[10000+d6000]
[  161.645275][ T4074] CPU: 0 PID: 4074 Comm: udhcpc Not tainted
5.11.0-rc2-00012-g0983834a8393 #19
[  161.646515][ T4074] epc: 0000000000000bb0 ra : 0000003fd4d43872 sp
: 0000003fffedf5c0
[  161.661669][ T4074]  gp : 00000000000e7c90 tp : 0000003fd4d42820 t0
: 0000003fd4d65160
[  161.662875][ T4074]  t1 : 0000003fd4d43bec t2 : 00000000000e7960 s0
: 0000003fffedf660
[  161.663979][ T4074]  s1 : 0000003fd4d44000 a0 : 0000003fd4d44000 a1
: 0000003fffedf690
[  161.665110][ T4074]  a2 : 0000000000000019 a3 : 0000000000000002 a4
: 0000000000000002
[  161.666351][ T4074]  a5 : 000000000000000f a6 : fefefefefefefeff a7
: 0000000000000040
[  161.668642][ T4074]  s2 : 0000000000000001 s3 : 0000003fffedf690 s4
: 0000003fd4d44030
[  161.669785][ T4074]  s5 : 0000003fd4d44010 s6 : 00000000149d82c3 s7
: 00000000000000fe
[  161.670921][ T4074]  s8 : 0000003fd4d44004 s9 : 0000003fd4d44010
s10: 0000003fd4d44008
[  161.672150][ T4074]  s11: 0000003fd4d44038 t3 : 0000000000000bb0 t4
: 0000000000000002
[  161.673355][ T4074]  t5 : 0000000000000002 t6 : 0000000000003d40
[  161.674303][ T4074] status: 8000000000004020 badaddr:
0000000000000bb0 cause: 000000000000000c
FAIL
Starting dhcpcd...
[  162.771471][ T4077] dhcpcd[4077]: unhandled signal 11 code 0x1 at
0x0000000000000bb0 in dhcpcd[10000+39000]
[  162.773414][ T4077] CPU: 0 PID: 4077 Comm: dhcpcd Not tainted
5.11.0-rc2-00012-g0983834a8393 #19
[  162.774462][ T4077] epc: 0000000000000bb0 ra : 0000003fe6d12872 sp
: 0000003fff8527e0
[  162.775366][ T4077]  gp : 000000000004adb8 tp : 0000003fe6d11250 t0
: 0000003fe6d34160
[  162.776274][ T4077]  t1 : 0000003fe6d12bec t2 : 0000000000049a00 s0
: 0000003fff852880
[  162.777167][ T4077]  s1 : 0000003fe6d13000 a0 : 0000003fe6d13000 a1
: 0000003fff8528a0
[  162.779363][ T4077]  a2 : 0000000000000004 a3 : 0000000000000002 a4
: 0000000000000002
[  162.780279][ T4077]  a5 : 000000000000000f a6 : 7efefefefefefeff a7
: fffffffffffff000
[  162.781194][ T4077]  s2 : 0000000000000001 s3 : 0000003fff8528a0 s4
: 0000003fe6d13030
[  162.782106][ T4077]  s5 : 0000003fe6d13010 s6 : 0000000000000000 s7
: 0000000000000000
[  162.783015][ T4077]  s8 : 0000003fe6d13004 s9 : 0000003fe6d13010
s10: 0000003fe6d13008
[  162.783940][ T4077]  s11: 0000003fe6d13038 t3 : 0000000000000bb0 t4
: 0000000000000002
[  162.784853][ T4077]  t5 : 0000000000000002 t6 : 0000000000003d40
[  162.785618][ T4077] status: 8000000000006020 badaddr:
0000000000000bb0 cause: 000000000000000c
Segmentation fault
[  164.074891][ T4079] ssh-keygen[4079]: unhandled signal 11 code 0x1
at 0x0000000000000bb0 in ssh-keygen[2ac3c68000+63000]
[  164.076916][ T4079] CPU: 1 PID: 4079 Comm: ssh-keygen Not tainted
5.11.0-rc2-00012-g0983834a8393 #19
[  164.096635][ T4079] epc: 0000000000000bb0 ra : 0000003ff6899872 sp
: 0000003fffed1330
[  164.099233][ T4079]  gp : 0000002ac3ccd448 tp : 0000003ff6435cd0 t0
: 0000003ff6897000
[  164.100457][ T4079]  t1 : 0000003ff6899bec t2 : 0000003ff6891940 s0
: 0000003fffed13d0
[  164.101578][ T4079]  s1 : 0000003ff689a000 a0 : 0000003ff689a000 a1
: 0000003fffed13f8
[  164.102914][ T4079]  a2 : 0000000000000000 a3 : 0000000000000001 a4
: 0000000000000001
[  164.104058][ T4079]  a5 : 000000000000000f a6 : 0000000000000000 a7
: 00000000000000ac
[  164.105150][ T4079]  s2 : 0000000000000000 s3 : 0000003fffed13f8 s4
: 0000003ff689a020
[  164.106241][ T4079]  s5 : 0000003ff689a000 s6 : 0000003fd1861830 s7
: ffffffffffffffff
[  164.113694][ T4079]  s8 : 0000003ff689a004 s9 : 0000003ff689a010
s10: 0000003ff689a008
[  164.114869][ T4079]  s11: 0000003ff689a028 t3 : 0000000000000bb0 t4
: 0000000000000002
[  164.115972][ T4079]  t5 : 0000000000000002 t6 : 0000000000003d40
[  164.128360][ T4079] status: 8000000000004020 badaddr:
0000000000000bb0 cause: 000000000000000c
Segmentation fault
Starting sshd: [  164.872315][ T4080] sshd[4080]: unhandled signal 11
code 0x1 at 0x0000000000000bb0 in sshd[2ac7ea7000+a4000]
[  164.874297][ T4080] CPU: 1 PID: 4080 Comm: sshd Not tainted
5.11.0-rc2-00012-g0983834a8393 #19
[  164.875331][ T4080] epc: 0000000000000bb0 ra : 0000003ff2222872 sp
: 0000003fffbea300
[  164.876230][ T4080]  gp : 0000002ac7f4f9d0 tp : 0000003ff1dbecd0 t0
: 0000003ff2220000
[  164.877146][ T4080]  t1 : 0000003ff2222bec t2 : 0000003ff221a940 s0
: 0000003fffbea3a0
[  164.892174][ T4080]  s1 : 0000003ff2223000 a0 : 0000003ff2223000 a1
: 0000003fffbea3c8
[  164.893137][ T4080]  a2 : 0000000000000000 a3 : 0000000000000001 a4
: 0000000000000001
[  164.894065][ T4080]  a5 : 000000000000000f a6 : 0000000000000000 a7
: 00000000000000ac
[  164.895013][ T4080]  s2 : 0000000000000000 s3 : 0000003fffbea3c8 s4
: 0000003ff2223020
[  164.895947][ T4080]  s5 : 0000003ff2223000 s6 : 0000003fd1861830 s7
: ffffffffffffffff
[  164.896881][ T4080]  s8 : 0000003ff2223004 s9 : 0000003ff2223010
s10: 0000003ff2223008
[  164.905684][ T4080]  s11: 0000003ff2223028 t3 : 0000000000000bb0 t4
: 0000000000000002
[  164.906679][ T4080]  t5 : 0000000000000002 t6 : 0000000000003d40
[  164.908565][ T4080] status: 8000000000004020 badaddr:
0000000000000bb0 cause: 000000000000000c
Segmentation fault
OK
syzkaller
syzkaller login: [  167.973016][ T4082] ------------[ cut here ]------------
[  167.975887][ T4082] virt_to_phys used for non-linear address:
0000000059ffc026 (0xffffffd0158d105e)
[  167.979939][ T4082] WARNING: CPU: 0 PID: 4082 at
arch/riscv/mm/physaddr.c:16 __virt_to_phys+0x74/0x78
[  167.988658][ T4082] Modules linked in:
[  167.989781][ T4082] CPU: 0 PID: 4082 Comm: getty Not tainted
5.11.0-rc2-00012-g0983834a8393 #19
[  167.991063][ T4082] epc: ffffffe000011164 ra : ffffffe000011164 sp
: ffffffe01354fb10
[  167.992243][ T4082]  gp : ffffffe006234420 tp : ffffffe009c8ad80 t0
: ffffffe006cafb67
[  167.993384][ T4082]  t1 : 0000000000000001 t2 : 0000000000000000 s0
: ffffffe01354fb40
[  167.994531][ T4082]  s1 : fffffff0158d105e a0 : 000000000000004f a1
: 00000000000f0000
[  167.995690][ T4082]  a2 : 0000000000000002 a3 : ffffffe0000d1a30 a4
: 763e2d90a60ec500
[  167.996803][ T4082]  a5 : 763e2d90a60ec500 a6 : 0000000000f00000 a7
: ffffffe00009481c
[  167.999690][ T4082]  s2 : ffffffd0158d105e s3 : 0000001fffffffff s4
: 0000000000000001
[  168.000898][ T4082]  s5 : ffffffd0158d105f s6 : ffffffd0158d3260 s7
: 0000003fff81eac8
[  168.002093][ T4082]  s8 : ffffffd0158d105e s9 : 0000000000000001
s10: 0000000000000000
[  168.003226][ T4082]  s11: 0000000000000000 t3 : 763e2d90a60ec500 t4
: ffffffc4026a9efd
[  168.004361][ T4082]  t5 : ffffffc4026a9eff t6 : ffffffe01354f7f8
[  168.005328][ T4082] status: 0000000000000120 badaddr:
0000000000000000 cause: 0000000000000003
[  168.006756][ T4082] Kernel panic - not syncing: panic_on_warn set ...
[  168.008056][ T4082] CPU: 0 PID: 4082 Comm: getty Not tainted
5.11.0-rc2-00012-g0983834a8393 #19
[  168.009301][ T4082] Call Trace:
[  168.009969][ T4082] [<ffffffe0000095c0>] walk_stackframe+0x0/0x1d0
[  168.011166][ T4082] [<ffffffe00458b2d8>] show_stack+0x3a/0x46
[  168.012215][ T4082] [<ffffffe0045a5b72>] dump_stack+0x11c/0x180
[  168.013262][ T4082] [<ffffffe00458b6a0>] panic+0x20a/0x5cc
[  168.014264][ T4082] [<ffffffe000024210>] __warn+0x110/0x20a
[  168.015285][ T4082] [<ffffffe001759424>] report_bug+0x156/0x200
[  168.016324][ T4082] [<ffffffe0000093f6>] do_trap_break+0xa6/0x152
[  168.017431][ T4082] [<ffffffe00000559c>] ret_from_exception+0x0/0x14
[  168.018560][ T4082] [<ffffffe0018c97bc>] n_tty_read+0x908/0x115a
[  168.020124][ T4082] SMP: stopping secondary CPUs
[  168.022087][ T4082] Rebooting in 86400 seconds..
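
For anyone reading the dumps above, here is a minimal sketch (not part of the original report) for decoding the two fields that matter most in them. It assumes the `cause:` value is the RISC-V scause exception code as listed in the privileged specification, and it reads the bracketed pair in `in udevd[10000+35000]` as the base and size of the mapping the kernel printed; that reading is an assumption. The names `SCAUSE`, `decode_cause`, `parse_fault` and `FAULT_RE` are hypothetical helpers, and the sample line is the udevd fault from the log with the mail line-wrap joined back together.

```python
import re

# RISC-V synchronous exception codes (scause with the interrupt bit clear),
# per the RISC-V privileged specification.
SCAUSE = {
    0x0: "instruction address misaligned",
    0x1: "instruction access fault",
    0x2: "illegal instruction",
    0x3: "breakpoint",
    0x4: "load address misaligned",
    0x5: "load access fault",
    0x6: "store/AMO address misaligned",
    0x7: "store/AMO access fault",
    0x8: "environment call from U-mode",
    0x9: "environment call from S-mode",
    0xC: "instruction page fault",
    0xD: "load page fault",
    0xF: "store/AMO page fault",
}

# Matches e.g. "at 0x0000000000000bb0 in udevd[10000+35000]".
FAULT_RE = re.compile(r"at 0x([0-9a-f]+) in (\S+)\[([0-9a-f]+)\+([0-9a-f]+)\]")


def decode_cause(cause_hex):
    """Map the hex string printed after 'cause:' to a readable name."""
    return SCAUSE.get(int(cause_hex, 16), "unknown/reserved")


def parse_fault(line):
    """Extract fault address and printed mapping from an 'unhandled signal' line.

    Assumption: the bracketed pair is base+size of the printed mapping.
    Returns None if the line does not match.
    """
    m = FAULT_RE.search(line)
    if m is None:
        return None
    addr = int(m.group(1), 16)
    base = int(m.group(3), 16)
    size = int(m.group(4), 16)
    return {
        "image": m.group(2),
        "addr": addr,
        "base": base,
        "size": size,
        "inside": base <= addr < base + size,
    }


if __name__ == "__main__":
    line = ("udevd[4052]: unhandled signal 11 code 0x1 at "
            "0x0000000000000bb0 in udevd[10000+35000]")
    fault = parse_fault(line)
    print(fault["image"], hex(fault["addr"]), "inside mapping:", fault["inside"])
    print("user faults:", decode_cause("000000000000000c"))
    print("kernel WARNING:", decode_cause("0000000000000003"))
```

Read this way, every user-space crash above has epc, badaddr and t3 all equal to 0xbb0 with cause 0xc, i.e. an instruction fetch from an address below the printed mapping, while the kernel WARNING reports cause 0x3, which matches the do_trap_break/report_bug frames in its call trace.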

* Re: riscv+KASAN does not boot
  2021-01-18 15:43               ` Dmitry Vyukov
@ 2021-01-29  7:45                 ` Alex Ghiti
       [not found]                   ` <CACT4Y+adSjve7bXRPh5UybCQx6ubOUu5RbwuT620wdcxHzVYJg@mail.gmail.com>
  0 siblings, 1 reply; 27+ messages in thread
From: Alex Ghiti @ 2021-01-29  7:45 UTC (permalink / raw)
  To: Dmitry Vyukov, Tobias Klauser
  Cc: Albert Ou, Bjorn Topel, Palmer Dabbelt, LKML, nylon7, syzkaller,
	Andreas Schwab, Paul Walmsley, linux-riscv

Hi Dmitry,

On 1/18/21 10:43 AM, Dmitry Vyukov wrote:
> On Mon, Jan 18, 2021 at 4:05 PM Dmitry Vyukov <dvyukov@google.com> wrote:
>>
>> On Mon, Jan 18, 2021 at 3:53 PM Tobias Klauser <tklauser@distanz.ch> wrote:
>>>>> On Thu, Jan 14, 2021 at 5:57 AM Palmer Dabbelt <palmerdabbelt@google.com> wrote:
>>>>>>
>>>>>> On Fri, 25 Dec 2020 09:13:23 PST (-0800), dvyukov@google.com wrote:
>>>>>>> On Fri, Dec 25, 2020 at 5:58 PM Andreas Schwab <schwab@linux-m68k.org> wrote:
>>>>>>>>
>>>>>>>> On Dez 25 2020, Dmitry Vyukov wrote:
>>>>>>>>
>>>>>>>>> qemu-system-riscv64 \
>>>>>>>>> -machine virt -bios default -smp 1 -m 2G \
>>>>>>>>> -device virtio-blk-device,drive=hd0 \
>>>>>>>>> -drive file=buildroot-riscv64.ext4,if=none,format=raw,id=hd0 \
>>>>>>>>> -kernel arch/riscv/boot/Image \
>>>>>>>>> -nographic \
>>>>>>>>> -device virtio-rng-device,rng=rng0 -object
>>>>>>>>> rng-random,filename=/dev/urandom,id=rng0 \
>>>>>>>>> -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device
>>>>>>>>> virtio-net-device,netdev=net0 \
>>>>>>>>> -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic
>>>>>>>>> panic_on_warn=1 panic=86400"
>>>>>>>>
>>>>>>>> Do you get more output with earlycon=sbi?
>>>>>>>
>>>>>>> Hi Andreas,
>>>>>>>
>>>>>>> For defconfig+kvm_guest.config+ scripts/config -e KASAN -e
>>>>>>> KASAN_INLINE it actually gave me more output:
>>>>>>>
>>>>>>>
>>>>>>> OpenSBI v0.7
>>>>>>>     ____                    _____ ____ _____
>>>>>>>    / __ \                  / ____|  _ \_   _|
>>>>>>>   | |  | |_ __   ___ _ __ | (___ | |_) || |
>>>>>>>   | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
>>>>>>>   | |__| | |_) |  __/ | | |____) | |_) || |_
>>>>>>>    \____/| .__/ \___|_| |_|_____/|____/_____|
>>>>>>>          | |
>>>>>>>          |_|
>>>>>>>
>>>>>>> Platform Name          : QEMU Virt Machine
>>>>>>> Platform HART Features : RV64ACDFIMSU
>>>>>>> Current Hart           : 0
>>>>>>> Firmware Base          : 0x80000000
>>>>>>> Firmware Size          : 132 KB
>>>>>>> Runtime SBI Version    : 0.2
>>>>>>>
>>>>>>> MIDELEG : 0x0000000000000222
>>>>>>> MEDELEG : 0x000000000000b109
>>>>>>> PMP0    : 0x0000000080000000-0x000000008003ffff (A)
>>>>>>> PMP1    : 0x0000000000000000-0xffffffffffffffff (A,R,W,X)
>>>>>>> [    0.000000] Linux version 5.10.0-01370-g71c5f03154ac
>>>>>>> (dvyukov@dvyukov-desk.muc.corp.google.com) (riscv64-linux-gnu-gcc
>>>>>>> (Debian 10.2.0-9) 10.2.0, GNU ld (GNU Binutils for Debian) 2.35.1) #17
>>>>>>> SMP Fri Dec 25 18:10:12 CET 2020
>>>>>>> [    0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000
>>>>>>> [    0.000000] earlycon: sbi0 at I/O port 0x0 (options '')
>>>>>>> [    0.000000] printk: bootconsole [sbi0] enabled
>>>>>>> [    0.000000] efi: UEFI not found.
>>>>>>> [    0.000000] Zone ranges:
>>>>>>> [    0.000000]   DMA32    [mem 0x0000000080200000-0x00000000ffffffff]
>>>>>>> [    0.000000]   Normal   empty
>>>>>>> [    0.000000] Movable zone start for each node
>>>>>>> [    0.000000] Early memory node ranges
>>>>>>> [    0.000000]   node   0: [mem 0x0000000080200000-0x00000000ffffffff]
>>>>>>> [    0.000000] Initmem setup node 0 [mem 0x0000000080200000-0x00000000ffffffff]
>>>>>>> [    0.000000] SBI specification v0.2 detected
>>>>>>> [    0.000000] SBI implementation ID=0x1 Version=0x7
>>>>>>> [    0.000000] SBI v0.2 TIME extension detected
>>>>>>> [    0.000000] SBI v0.2 IPI extension detected
>>>>>>> [    0.000000] SBI v0.2 RFENCE extension detected
>>>>>>> [    0.000000] software IO TLB: mapped [mem
>>>>>>> 0x00000000fa3f9000-0x00000000fe3f9000] (64MB)
>>>>>>> [    0.000000] Unable to handle kernel paging request at virtual
>>>>>>> address dfffffc810040000
>>>>>>> [    0.000000] Oops [#1]
>>>>>>> [    0.000000] Modules linked in:
>>>>>>> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted
>>>>>>> 5.10.0-01370-g71c5f03154ac #17
>>>>>>> [    0.000000] epc: ffffffe00042e3e4 ra : ffffffe000c0462c sp : ffffffe001603ea0
>>>>>>> [    0.000000]  gp : ffffffe0016e3c60 tp : ffffffe00160cd40 t0 :
>>>>>>> dfffffc810040000
>>>>>>> [    0.000000]  t1 : ffffffe000e0a838 t2 : 0000000000000000 s0 :
>>>>>>> ffffffe001603f50
>>>>>>> [    0.000000]  s1 : ffffffe0016e50a8 a0 : dfffffc810040000 a1 :
>>>>>>> 0000000000000000
>>>>>>> [    0.000000]  a2 : 000000000ffc0000 a3 : dfffffc820000000 a4 :
>>>>>>> 0000000000000000
>>>>>>> [    0.000000]  a5 : 000000003e8c6001 a6 : ffffffe000e0a820 a7 :
>>>>>>> 0000000000000900
>>>>>>> [    0.000000]  s2 : dfffffc820000000 s3 : dfffffc800000000 s4 :
>>>>>>> 0000000000000001
>>>>>>> [    0.000000]  s5 : ffffffe0016e5108 s6 : fffffffffffff000 s7 :
>>>>>>> dfffffc810040000
>>>>>>> [    0.000000]  s8 : 0000000000000080 s9 : ffffffffffffffff s10:
>>>>>>> ffffffe07a119000
>>>>>>> [    0.000000]  s11: 000000000000ffc0 t3 : ffffffe0016eb908 t4 :
>>>>>>> 0000000000000001
>>>>>>> [    0.000000]  t5 : ffffffc4001c150a t6 : ffffffe001603be8
>>>>>>> [    0.000000] status: 0000000000000100 badaddr: dfffffc810040000
>>>>>>> cause: 000000000000000f
>>>>>>> [    0.000000] random: get_random_bytes called from
>>>>>>> oops_exit+0x30/0x58 with crng_init=0
>>>>>>> [    0.000000] ---[ end trace 0000000000000000 ]---
>>>>>>> [    0.000000] Kernel panic - not syncing: Fatal exception
>>>>>>> [    0.000000] ---[ end Kernel panic - not syncing: Fatal exception ]---
>>>>>>>
>>>>>>>
>>>>>>> But I first tried with the kernel image I had in the dir, I think it
>>>>>>> was this config (no KASAN):
>>>>>>> https://gist.githubusercontent.com/dvyukov/b2b62beccf80493781ab03b41430e616/raw/62e673cff08a8a41656d2871b8a37f74b00f509f/gistfile1.txt
>>>>>>>
>>>>>>> and earlycon=sbi did not change anything (no output after OpenSBI).
>>>>>>> So potentially there are 2 different problems.
>>>>>>
>>>>>> Thanks for reporting this.  Looks like I'd forgotten to add a kasan config to
>>>>>> my tests.  There's one in there now, and it's passing as of the fix that Nylon
>>>>>> posted.
>>>>>
>>>>> I can boot the KASAN kernel now on riscv/fixes.
>>>>>
>>>>> Next problem: I only got as far as:
>>>>>
>>>>> [   90.498967][    T1] Run /sbin/init as init process
>>>>> [   91.164353][ T4022] init[4022]: unhandled signal 11 code 0x1 at
>>>>> 0x0000000000000bb0 in busybox[10000+d7000]
>>>>> [   91.179640][ T4022] CPU: 1 PID: 4022 Comm: init Not tainted
>>>>> 5.11.0-rc2-00012-g0983834a8393 #19
>>>>> [   91.180853][ T4022] epc: 0000000000000bb0 ra : 0000003fccab09d0 sp
>>>>> : 0000003fffa8c7b0
>>>>> [   91.181861][ T4022]  gp : 00000000000e8d70 tp : 0000003fccaaf820 t0
>>>>> : 000000000000001e
>>>>> [   91.182810][ T4022]  t1 : 0000003fccab0bfc t2 : 000000000000000a s0
>>>>> : 0000003fffa8c850
>>>>> [   91.183749][ T4022]  s1 : 0000003fccab1070 a0 : 0000003fccab1070 a1
>>>>> : 0000003fffa8c8c8
>>>>> [   91.184689][ T4022]  a2 : 0000000000000001 a3 : 0000000000000020 a4
>>>>> : 0000000000000000
>>>>> [   91.185620][ T4022]  a5 : 0000000000000000 a6 : 0000003fcc9c4260 a7
>>>>> : fffffffffffffffe
>>>>> [   91.186566][ T4022]  s2 : 0000000000000000 s3 : 0000003fffa8c8c8 s4
>>>>> : 0000003fccab1000
>>>>> [   91.187500][ T4022]  s5 : 0000003fccab1078 s6 : 0000003fffa8c8d0 s7
>>>>> : 0000000000000010
>>>>> [   91.189672][ T4022]  s8 : 0000000000000016 s9 : 0000000000000000
>>>>> s10: 0000003fffa8c8c8
>>>>> [   91.190637][ T4022]  s11: 0000000000000000 t3 : 0000000000000bb0 t4
>>>>> : 0000000000000000
>>>>> [   91.191568][ T4022]  t5 : 0000003fffa8c360 t6 : 0000000000000000
>>>>> [   91.192389][ T4022] status: 8000000000004020 badaddr:
>>>>> 0000000000000bb0 cause: 000000000000000c
>>>>> [   91.201573][    T1] Kernel panic - not syncing: Attempted to kill
>>>>> init! exitcode=0x0000000b
>>>>> [   91.202906][    T1] CPU: 0 PID: 1 Comm: init Not tainted
>>>>> 5.11.0-rc2-00012-g0983834a8393 #19
>>>>> [   91.204139][    T1] Call Trace:
>>>>> [   91.204849][    T1] [<ffffffe0000095c0>] walk_stackframe+0x0/0x1d0
>>>>> [   91.206124][    T1] [<ffffffe00458b2d8>] show_stack+0x3a/0x46
>>>>> [   91.207240][    T1] [<ffffffe0045a5b72>] dump_stack+0x11c/0x180
>>>>> [   91.208732][    T1] [<ffffffe00458b6a0>] panic+0x20a/0x5cc
>>>>> [   91.209890][    T1] [<ffffffe00002eea4>] do_exit+0x1846/0x1874
>>>>> [   91.211052][    T1] [<ffffffe00002efdc>] do_group_exit+0xa0/0x192
>>>>> [   91.212224][    T1] [<ffffffe000047d30>] get_signal+0x2d6/0x13dc
>>>>> [   91.213390][    T1] [<ffffffe000007eb0>] do_notify_resume+0xa8/0x912
>>>>> [   91.214567][    T1] [<ffffffe00000559c>] ret_from_exception+0x0/0x14
>>>>>
>>>>> The image is buildroot on 2020.11.x built with this script:
>>>>> https://gist.githubusercontent.com/dvyukov/1a9a01ca2189e35175a021820c95b04d/raw/5c01d755e83f4eab0d56aa7dc84af3b2d5e80423/gistfile1.txt
>>>>>
>>>>> Readelf for init shows the following (is it that [10000+d7000] address
>>>>> is not .text at all?):
>>>>>
>>>>> $ riscv64-linux-gnu-readelf --sections image/bin/busybox
>>>>> There are 27 section headers, starting at offset 0xd7f20:
>>>>>
>>>>> Section Headers:
>>>>>    [Nr] Name              Type             Address           Offset
>>>>>         Size              EntSize          Flags  Link  Info  Align
>>>>>    [ 0]                   NULL             0000000000000000  00000000
>>>>>         0000000000000000  0000000000000000           0     0     0
>>>>>    [ 1] .interp           PROGBITS         0000000000010238  00000238
>>>>>         0000000000000021  0000000000000000   A       0     0     1
>>>>>    [ 2] .note.ABI-tag     NOTE             000000000001025c  0000025c
>>>>>         0000000000000020  0000000000000000   A       0     0     4
>>>>>    [ 3] .hash             HASH             0000000000010280  00000280
>>>>>         00000000000009cc  0000000000000004   A       5     0     8
>>>>>    [ 4] .gnu.hash         GNU_HASH         0000000000010c50  00000c50
>>>>>         0000000000000ac8  0000000000000000   A       5     0     8
>>>>>    [ 5] .dynsym           DYNSYM           0000000000011718  00001718
>>>>>         00000000000021f0  0000000000000018   A       6     1     8
>>>>>    [ 6] .dynstr           STRTAB           0000000000013908  00003908
>>>>>         0000000000000c66  0000000000000000   A       0     0     1
>>>>>    [ 7] .gnu.version      VERSYM           000000000001456e  0000456e
>>>>>         00000000000002d4  0000000000000002   A       5     0     2
>>>>>    [ 8] .gnu.version_r    VERNEED          0000000000014848  00004848
>>>>>         0000000000000050  0000000000000000   A       6     2     8
>>>>>    [ 9] .rela.dyn         RELA             0000000000014898  00004898
>>>>>         00000000000000c0  0000000000000018   A       5     0     8
>>>>>    [10] .rela.plt         RELA             0000000000014958  00004958
>>>>>         00000000000020a0  0000000000000018  AI       5    22     8
>>>>>    [11] .plt              PROGBITS         0000000000016a00  00006a00
>>>>>         00000000000015e0  0000000000000010  AX       0     0     16
>>>>>    [12] .text             PROGBITS         0000000000017fe0  00007fe0
>>>>>         00000000000a3668  0000000000000000  AX       0     0     4
>>>>>    [13] .rodata           PROGBITS         00000000000bb648  000ab648
>>>>>         000000000002b076  0000000000000000   A       0     0     8
>>>>>    [14] .sdata2           PROGBITS         00000000000e66c0  000d66c0
>>>>>         0000000000000163  0000000000000000   A       0     0     8
>>>>>    [15] .eh_frame_hdr     PROGBITS         00000000000e6824  000d6824
>>>>>         0000000000000014  0000000000000000   A       0     0     4
>>>>>    [16] .eh_frame         PROGBITS         00000000000e6838  000d6838
>>>>>         000000000000002c  0000000000000000   A       0     0     8
>>>>>    [17] .preinit_array    PREINIT_ARRAY    00000000000e7df8  000d6df8
>>>>>         0000000000000008  0000000000000008  WA       0     0     1
>>>>>    [18] .init_array       INIT_ARRAY       00000000000e7e00  000d6e00
>>>>>         0000000000000008  0000000000000008  WA       0     0     8
>>>>>    [19] .fini_array       FINI_ARRAY       00000000000e7e08  000d6e08
>>>>>         0000000000000008  0000000000000008  WA       0     0     8
>>>>>    [20] .dynamic          DYNAMIC          00000000000e7e10  000d6e10
>>>>>         00000000000001f0  0000000000000010  WA       6     0     8
>>>>>    [21] .data             PROGBITS         00000000000e8000  000d7000
>>>>>         0000000000000240  0000000000000000  WA       0     0     8
>>>>>    [22] .got              PROGBITS         00000000000e8240  000d7240
>>>>>         0000000000000af8  0000000000000008  WA       0     0     8
>>>>>    [23] .sdata            PROGBITS         00000000000e8d38  000d7d38
>>>>>         0000000000000101  0000000000000000  WA       0     0     8
>>>>>    [24] .sbss             NOBITS           00000000000e8e40  000d7e39
>>>>>         000000000000017f  0000000000000000  WA       0     0     8
>>>>>    [25] .bss              NOBITS           00000000000e8fc0  000d7e39
>>>>>         00000000000005b0  0000000000000000  WA       0     0     8
>>>>>    [26] .shstrtab         STRTAB           0000000000000000  000d7e39
>>>>>         00000000000000e6  0000000000000000           0     0     1
>>>>>
>>>>>
>>>>> Before I spend more time on this, am I doing anything obviously wrong?
>>>>> Is it a known issue? Are there any fresh working recipes?
>>>>
>>>> Humm.. I tried to use 2020.05 which Tobias used here:
>>>> https://github.com/google/syzkaller/blob/master/docs/linux/setup_linux-host_qemu-vm_riscv64-kernel.md#image
>>>> But there is no make qemu_riscv64_virt_defconfig target... though I
>>>> remember I tested these instructions at the time...
>>>
>>> Weird. `make qemu_riscv64_virt_defconfig` works here on buildroot
>>> 2020.05, 2020.11.1 and on latest master.
>>>
>>> Do you see these in your configs/ directory?
>>>
>>> $ ls -l configs/qemu_riscv*
>>> -rw-rw-r-- 1 tklauser tklauser 673 Jan 18 15:51 configs/qemu_riscv32_virt_defconfig
>>> -rw-rw-r-- 1 tklauser tklauser 682 Jan 18 15:51 configs/qemu_riscv64_virt_defconfig
>>
>> Oh, turned out I previously checked out 2011.05 somehow...
>> Yes, 2020.05 has qemu_riscv64_virt_defconfig and I am building it now.
>> 2020.11 has the config, but init crashes (see above).
> 
> 
> 2020.05 is a bit better, but it still failed in several ways.
> First, a number of user-space services, including sshd, still crashed.
> Second, the kernel also crashed a bit later.
> And 2020.11 seems to regress even more.
> This is with the same kernel as in the previous email (I did not rebuild it).
> 
> 
> 2020.05 buildroot:
> [   90.381218][    T1] devtmpfs: mounted
> [   90.534531][    T1] Freeing unused kernel memory: 2328K
> [   90.537085][    T1] Run /sbin/init as init process
> [   91.754610][ T4022] EXT4-fs (vda): re-mounted. Opts: (null). Quota
> mode: none.
> Starting syslogd: OK
> Starting klogd: OK
> Running sysctl: OK
> Populating /dev using udev: [   99.413418][ T4051] udevd[4051]:
> starting version 3.2.9
> [  100.480500][ T4052] udevd[4052]: starting eudev-3.2.9
> [  101.904876][ T4052] udevd[4052]: unhandled signal 11 code 0x1 at
> 0x0000000000000bb0 in udevd[10000+35000]
> [  101.911401][ T4052] CPU: 1 PID: 4052 Comm: udevd Not tainted
> 5.11.0-rc2-00012-g0983834a8393 #19
> [  101.913136][ T4052] epc: 0000000000000bb0 ra : 0000003ff5921872 sp
> : 0000003fffb0c3a0
> [  101.914593][ T4052]  gp : 000000000004f908 tp : 0000003ff552b720 t0
> : 0000003ff5943160
> [  101.915740][ T4052]  t1 : 0000003ff5921bec t2 : 000000000004f450 s0
> : 0000003fffb0c440
> [  101.916872][ T4052]  s1 : 0000003ff5922000 a0 : 0000003ff5922000 a1
> : 0000003fffb0c460
> [  101.949318][ T4052]  a2 : 0000000000000001 a3 : 0000000000000002 a4
> : 0000000000000002
> [  101.950529][ T4052]  a5 : 000000000000000f a6 : 0000000000000007 a7
> : 0000000000000016
> [  101.951653][ T4052]  s2 : 0000000000000001 s3 : 0000003fffb0c460 s4
> : 0000003ff5922030
> [  101.952771][ T4052]  s5 : 0000003ff5922010 s6 : 0000000000000000 s7
> : 0000000000000000
> [  101.953878][ T4052]  s8 : 0000003ff5922004 s9 : 0000003ff5922010
> s10: 0000003ff5922008
> [  101.955016][ T4052]  s11: 0000003ff5922038 t3 : 0000000000000bb0 t4
> : 0000000000000002
> [  101.956122][ T4052]  t5 : 0000000000000002 t6 : 0000000000003d40
> [  101.957072][ T4052] status: 8000000000004020 badaddr:
> 0000000000000bb0 cause: 000000000000000c
> [  154.349233][ T4055] udevadm[4055]: unhandled signal 11 code 0x1 at
> 0x0000000000000bb0 in udevadm[10000+38000]
> [  154.351201][ T4055] CPU: 0 PID: 4055 Comm: udevadm Not tainted
> 5.11.0-rc2-00012-g0983834a8393 #19
> [  154.352227][ T4055] epc: 0000000000000bb0 ra : 0000003ff2cd3872 sp
> : 0000003fffe26a50
> [  154.353136][ T4055]  gp : 0000000000052808 tp : 0000003ff28dd720 t0
> : 0000003ff2cf5160
> [  154.354047][ T4055]  t1 : 0000003ff2cd3bec t2 : 0000000000052570 s0
> : 0000003fffe26af0
> [  154.354957][ T4055]  s1 : 0000003ff2cd4000 a0 : 0000003ff2cd4000 a1
> : 0000003fffe26b10
> [  154.355860][ T4055]  a2 : 000000000003d790 a3 : 0000000000000002 a4
> : 0000000000000002
> [  154.356739][ T4055]  a5 : 000000000000000f a6 : ffffffffffffffff a7
> : 0000000000000000
> [  154.366998][ T4055]  s2 : 0000000000000001 s3 : 0000003fffe26b10 s4
> : 0000003ff2cd4030
> [  154.372223][ T4055]  s5 : 0000003ff2cd4010 s6 : 0000000000000068 s7
> : 000000000003d000
> [  154.373192][ T4055]  s8 : 0000003ff2cd4004 s9 : 0000003ff2cd4010
> s10: 0000003ff2cd4008
> [  154.374114][ T4055]  s11: 0000003ff2cd4038 t3 : 0000000000000bb0 t4
> : 0000000000000002
> [  154.375023][ T4055]  t5 : 0000000000000002 t6 : 0000000000003d40
> [  154.375793][ T4055] status: 0000000000004020 badaddr:
> 0000000000000bb0 cause: 000000000000000c
> Segmentation fault
> udevadm settle failed
> done
> Saving random seed: OK
> Starting network: [  160.769276][ T4073] 8021q: adding VLAN 0 to HW
> filter on device eth0
> udhcpc: started, v1.31.1
> [  161.642968][ T4074] udhcpc[4074]: unhandled signal 11 code 0x1 at
> 0x0000000000000bb0 in busybox[10000+d6000]
> [  161.645275][ T4074] CPU: 0 PID: 4074 Comm: udhcpc Not tainted
> 5.11.0-rc2-00012-g0983834a8393 #19
> [  161.646515][ T4074] epc: 0000000000000bb0 ra : 0000003fd4d43872 sp
> : 0000003fffedf5c0
> [  161.661669][ T4074]  gp : 00000000000e7c90 tp : 0000003fd4d42820 t0
> : 0000003fd4d65160
> [  161.662875][ T4074]  t1 : 0000003fd4d43bec t2 : 00000000000e7960 s0
> : 0000003fffedf660
> [  161.663979][ T4074]  s1 : 0000003fd4d44000 a0 : 0000003fd4d44000 a1
> : 0000003fffedf690
> [  161.665110][ T4074]  a2 : 0000000000000019 a3 : 0000000000000002 a4
> : 0000000000000002
> [  161.666351][ T4074]  a5 : 000000000000000f a6 : fefefefefefefeff a7
> : 0000000000000040
> [  161.668642][ T4074]  s2 : 0000000000000001 s3 : 0000003fffedf690 s4
> : 0000003fd4d44030
> [  161.669785][ T4074]  s5 : 0000003fd4d44010 s6 : 00000000149d82c3 s7
> : 00000000000000fe
> [  161.670921][ T4074]  s8 : 0000003fd4d44004 s9 : 0000003fd4d44010
> s10: 0000003fd4d44008
> [  161.672150][ T4074]  s11: 0000003fd4d44038 t3 : 0000000000000bb0 t4
> : 0000000000000002
> [  161.673355][ T4074]  t5 : 0000000000000002 t6 : 0000000000003d40
> [  161.674303][ T4074] status: 8000000000004020 badaddr:
> 0000000000000bb0 cause: 000000000000000c
> FAIL
> Starting dhcpcd...
> [  162.771471][ T4077] dhcpcd[4077]: unhandled signal 11 code 0x1 at
> 0x0000000000000bb0 in dhcpcd[10000+39000]
> [  162.773414][ T4077] CPU: 0 PID: 4077 Comm: dhcpcd Not tainted
> 5.11.0-rc2-00012-g0983834a8393 #19
> [  162.774462][ T4077] epc: 0000000000000bb0 ra : 0000003fe6d12872 sp
> : 0000003fff8527e0
> [  162.775366][ T4077]  gp : 000000000004adb8 tp : 0000003fe6d11250 t0
> : 0000003fe6d34160
> [  162.776274][ T4077]  t1 : 0000003fe6d12bec t2 : 0000000000049a00 s0
> : 0000003fff852880
> [  162.777167][ T4077]  s1 : 0000003fe6d13000 a0 : 0000003fe6d13000 a1
> : 0000003fff8528a0
> [  162.779363][ T4077]  a2 : 0000000000000004 a3 : 0000000000000002 a4
> : 0000000000000002
> [  162.780279][ T4077]  a5 : 000000000000000f a6 : 7efefefefefefeff a7
> : fffffffffffff000
> [  162.781194][ T4077]  s2 : 0000000000000001 s3 : 0000003fff8528a0 s4
> : 0000003fe6d13030
> [  162.782106][ T4077]  s5 : 0000003fe6d13010 s6 : 0000000000000000 s7
> : 0000000000000000
> [  162.783015][ T4077]  s8 : 0000003fe6d13004 s9 : 0000003fe6d13010
> s10: 0000003fe6d13008
> [  162.783940][ T4077]  s11: 0000003fe6d13038 t3 : 0000000000000bb0 t4
> : 0000000000000002
> [  162.784853][ T4077]  t5 : 0000000000000002 t6 : 0000000000003d40
> [  162.785618][ T4077] status: 8000000000006020 badaddr:
> 0000000000000bb0 cause: 000000000000000c
> Segmentation fault
> [  164.074891][ T4079] ssh-keygen[4079]: unhandled signal 11 code 0x1
> at 0x0000000000000bb0 in ssh-keygen[2ac3c68000+63000]
> [  164.076916][ T4079] CPU: 1 PID: 4079 Comm: ssh-keygen Not tainted
> 5.11.0-rc2-00012-g0983834a8393 #19
> [  164.096635][ T4079] epc: 0000000000000bb0 ra : 0000003ff6899872 sp
> : 0000003fffed1330
> [  164.099233][ T4079]  gp : 0000002ac3ccd448 tp : 0000003ff6435cd0 t0
> : 0000003ff6897000
> [  164.100457][ T4079]  t1 : 0000003ff6899bec t2 : 0000003ff6891940 s0
> : 0000003fffed13d0
> [  164.101578][ T4079]  s1 : 0000003ff689a000 a0 : 0000003ff689a000 a1
> : 0000003fffed13f8
> [  164.102914][ T4079]  a2 : 0000000000000000 a3 : 0000000000000001 a4
> : 0000000000000001
> [  164.104058][ T4079]  a5 : 000000000000000f a6 : 0000000000000000 a7
> : 00000000000000ac
> [  164.105150][ T4079]  s2 : 0000000000000000 s3 : 0000003fffed13f8 s4
> : 0000003ff689a020
> [  164.106241][ T4079]  s5 : 0000003ff689a000 s6 : 0000003fd1861830 s7
> : ffffffffffffffff
> [  164.113694][ T4079]  s8 : 0000003ff689a004 s9 : 0000003ff689a010
> s10: 0000003ff689a008
> [  164.114869][ T4079]  s11: 0000003ff689a028 t3 : 0000000000000bb0 t4
> : 0000000000000002
> [  164.115972][ T4079]  t5 : 0000000000000002 t6 : 0000000000003d40
> [  164.128360][ T4079] status: 8000000000004020 badaddr:
> 0000000000000bb0 cause: 000000000000000c
> Segmentation fault
> Starting sshd: [  164.872315][ T4080] sshd[4080]: unhandled signal 11
> code 0x1 at 0x0000000000000bb0 in sshd[2ac7ea7000+a4000]
> [  164.874297][ T4080] CPU: 1 PID: 4080 Comm: sshd Not tainted
> 5.11.0-rc2-00012-g0983834a8393 #19
> [  164.875331][ T4080] epc: 0000000000000bb0 ra : 0000003ff2222872 sp
> : 0000003fffbea300
> [  164.876230][ T4080]  gp : 0000002ac7f4f9d0 tp : 0000003ff1dbecd0 t0
> : 0000003ff2220000
> [  164.877146][ T4080]  t1 : 0000003ff2222bec t2 : 0000003ff221a940 s0
> : 0000003fffbea3a0
> [  164.892174][ T4080]  s1 : 0000003ff2223000 a0 : 0000003ff2223000 a1
> : 0000003fffbea3c8
> [  164.893137][ T4080]  a2 : 0000000000000000 a3 : 0000000000000001 a4
> : 0000000000000001
> [  164.894065][ T4080]  a5 : 000000000000000f a6 : 0000000000000000 a7
> : 00000000000000ac
> [  164.895013][ T4080]  s2 : 0000000000000000 s3 : 0000003fffbea3c8 s4
> : 0000003ff2223020
> [  164.895947][ T4080]  s5 : 0000003ff2223000 s6 : 0000003fd1861830 s7
> : ffffffffffffffff
> [  164.896881][ T4080]  s8 : 0000003ff2223004 s9 : 0000003ff2223010
> s10: 0000003ff2223008
> [  164.905684][ T4080]  s11: 0000003ff2223028 t3 : 0000000000000bb0 t4
> : 0000000000000002
> [  164.906679][ T4080]  t5 : 0000000000000002 t6 : 0000000000003d40
> [  164.908565][ T4080] status: 8000000000004020 badaddr:
> 0000000000000bb0 cause: 000000000000000c
> Segmentation fault
> OK
> syzkaller
> syzkaller login: [  167.973016][ T4082] ------------[ cut here ]------------
> [  167.975887][ T4082] virt_to_phys used for non-linear address:
> 0000000059ffc026 (0xffffffd0158d105e)
> [  167.979939][ T4082] WARNING: CPU: 0 PID: 4082 at
> arch/riscv/mm/physaddr.c:16 __virt_to_phys+0x74/0x78
> [  167.988658][ T4082] Modules linked in:
> [  167.989781][ T4082] CPU: 0 PID: 4082 Comm: getty Not tainted
> 5.11.0-rc2-00012-g0983834a8393 #19
> [  167.991063][ T4082] epc: ffffffe000011164 ra : ffffffe000011164 sp
> : ffffffe01354fb10
> [  167.992243][ T4082]  gp : ffffffe006234420 tp : ffffffe009c8ad80 t0
> : ffffffe006cafb67
> [  167.993384][ T4082]  t1 : 0000000000000001 t2 : 0000000000000000 s0
> : ffffffe01354fb40
> [  167.994531][ T4082]  s1 : fffffff0158d105e a0 : 000000000000004f a1
> : 00000000000f0000
> [  167.995690][ T4082]  a2 : 0000000000000002 a3 : ffffffe0000d1a30 a4
> : 763e2d90a60ec500
> [  167.996803][ T4082]  a5 : 763e2d90a60ec500 a6 : 0000000000f00000 a7
> : ffffffe00009481c
> [  167.999690][ T4082]  s2 : ffffffd0158d105e s3 : 0000001fffffffff s4
> : 0000000000000001
> [  168.000898][ T4082]  s5 : ffffffd0158d105f s6 : ffffffd0158d3260 s7
> : 0000003fff81eac8
> [  168.002093][ T4082]  s8 : ffffffd0158d105e s9 : 0000000000000001
> s10: 0000000000000000
> [  168.003226][ T4082]  s11: 0000000000000000 t3 : 763e2d90a60ec500 t4
> : ffffffc4026a9efd
> [  168.004361][ T4082]  t5 : ffffffc4026a9eff t6 : ffffffe01354f7f8
> [  168.005328][ T4082] status: 0000000000000120 badaddr:
> 0000000000000000 cause: 0000000000000003
> [  168.006756][ T4082] Kernel panic - not syncing: panic_on_warn set ...
> [  168.008056][ T4082] CPU: 0 PID: 4082 Comm: getty Not tainted
> 5.11.0-rc2-00012-g0983834a8393 #19
> [  168.009301][ T4082] Call Trace:
> [  168.009969][ T4082] [<ffffffe0000095c0>] walk_stackframe+0x0/0x1d0
> [  168.011166][ T4082] [<ffffffe00458b2d8>] show_stack+0x3a/0x46
> [  168.012215][ T4082] [<ffffffe0045a5b72>] dump_stack+0x11c/0x180
> [  168.013262][ T4082] [<ffffffe00458b6a0>] panic+0x20a/0x5cc
> [  168.014264][ T4082] [<ffffffe000024210>] __warn+0x110/0x20a
> [  168.015285][ T4082] [<ffffffe001759424>] report_bug+0x156/0x200
> [  168.016324][ T4082] [<ffffffe0000093f6>] do_trap_break+0xa6/0x152
> [  168.017431][ T4082] [<ffffffe00000559c>] ret_from_exception+0x0/0x14
> [  168.018560][ T4082] [<ffffffe0018c97bc>] n_tty_read+0x908/0x115a
> [  168.020124][ T4082] SMP: stopping secondary CPUs
> [  168.022087][ T4082] Rebooting in 86400 seconds..
> 

I was fixing KASAN support for my sv48 patchset so I took a look at your 
issue: I built a kernel on top of the branch riscv/fixes using 
https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config 
and Buildroot 2020.11. I have the warnings regarding the use of 
__virt_to_phys on wrong addresses (but that's normal since this function 
is used in virt_addr_valid) but not the segfaults you describe.
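
For readers following the warning above, here is a minimal userspace sketch of the kind of bounds check that produces it. It is a model under stated assumptions, not the kernel's code: the names `PAGE_OFFSET_ASSUMED` and `KERN_VIRT_SIZE_ASSUMED` are hypothetical, and their values are guessed from the sv39 addresses visible in the log (kernel addresses starting at 0xffffffe0...). The offending address from the log sits below that base, so the subtraction wraps around, which matches the value in register s1 of the dump.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout for the kernel in the log; check your own configuration. */
#define PAGE_OFFSET_ASSUMED    UINT64_C(0xffffffe000000000)
#define KERN_VIRT_SIZE_ASSUMED (UINT64_C(0) - PAGE_OFFSET_ASSUMED)

/* Offset of va from the start of the modelled linear map.
 * For addresses below the base, the unsigned subtraction wraps around. */
static uint64_t linear_offset(uint64_t va)
{
    return va - PAGE_OFFSET_ASSUMED;
}

/* Returns 1 if va lies inside the modelled linear mapping, 0 otherwise. */
static int in_linear_map(uint64_t va)
{
    return linear_offset(va) < KERN_VIRT_SIZE_ASSUMED;
}

/* Checks the model against the two addresses taken from the log above. */
static void check_against_log(void)
{
    /* Address named in the warning; the wrapped offset equals s1 in the dump. */
    assert(linear_offset(UINT64_C(0xffffffd0158d105e)) ==
           UINT64_C(0xfffffff0158d105e));
    assert(!in_linear_map(UINT64_C(0xffffffd0158d105e)));
    /* The epc of the warning itself is inside the modelled range. */
    assert(in_linear_map(UINT64_C(0xffffffe000011164)));
}
```

Under these assumptions, any caller that probes an arbitrary pointer this way will trip the check for addresses outside the linear map.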

Hope that helps,

Alex


> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: riscv+KASAN does not boot
       [not found]                   ` <CACT4Y+adSjve7bXRPh5UybCQx6ubOUu5RbwuT620wdcxHzVYJg@mail.gmail.com>
@ 2021-02-16 11:17                     ` Dmitry Vyukov
  2021-02-16 11:25                       ` Dmitry Vyukov
  2021-02-16 17:35                       ` Tobias Klauser
  0 siblings, 2 replies; 27+ messages in thread
From: Dmitry Vyukov @ 2021-02-16 11:17 UTC (permalink / raw)
  To: Alex Ghiti
  Cc: Tobias Klauser, Albert Ou, Bjorn Topel, Palmer Dabbelt, LKML,
	nylon7, syzkaller, Andreas Schwab, Paul Walmsley, linux-riscv

On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> > I was fixing KASAN support for my sv48 patchset so I took a look at your
> > issue: I built a kernel on top of the branch riscv/fixes using
> > https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config
> > and Buildroot 2020.11. I have the warnings regarding the use of
> > __virt_to_phys on wrong addresses (but that's normal since this function
> > is used in virt_addr_valid) but not the segfaults you describe.
>
> Hi Alex,
>
> Let me try to rebuild buildroot image. Maybe there was something wrong
> with my build, though, I did 'make clean' before doing. But at the
> same time it worked back in June...
>
> Re WARNINGs, they indicate kernel bugs. I am working on setting up a
> syzbot instance on riscv. If there a WARNING during boot then the
> kernel will be marked as broken. No further testing will happen.
> Is it a mis-use of WARN_ON? If so, could anybody please remove it or
> replace it with pr_err.


Hi,

I've localized one issue with riscv/KASAN:
KASAN breaks the vDSO, and I think that is the root cause of the weird
faults I saw earlier. The following patch fixes it.
Could somebody please upstream this fix? I don't know how to add/run
tests for this.
Thanks

diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
index 0cfd6da784f84..cf3a383c1799d 100644
--- a/arch/riscv/kernel/vdso/Makefile
+++ b/arch/riscv/kernel/vdso/Makefile
@@ -35,6 +35,7 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
 # Disable gcov profiling for VDSO code
 GCOV_PROFILE := n
 KCOV_INSTRUMENT := n
+KASAN_SANITIZE := n

 # Force dependency
 $(obj)/vdso.o: $(obj)/vdso.so

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: riscv+KASAN does not boot
  2021-02-16 11:17                     ` Dmitry Vyukov
@ 2021-02-16 11:25                       ` Dmitry Vyukov
  2021-02-16 13:45                         ` Dmitry Vyukov
  2021-02-16 20:42                         ` Alex Ghiti
  2021-02-16 17:35                       ` Tobias Klauser
  1 sibling, 2 replies; 27+ messages in thread
From: Dmitry Vyukov @ 2021-02-16 11:25 UTC (permalink / raw)
  To: Alex Ghiti
  Cc: Tobias Klauser, Albert Ou, Bjorn Topel, Palmer Dabbelt, LKML,
	nylon7, syzkaller, Andreas Schwab, Paul Walmsley, linux-riscv

On Tue, Feb 16, 2021 at 12:17 PM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> > > I was fixing KASAN support for my sv48 patchset so I took a look at your
> > > issue: I built a kernel on top of the branch riscv/fixes using
> > > https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config
> > > and Buildroot 2020.11. I have the warnings regarding the use of
> > > __virt_to_phys on wrong addresses (but that's normal since this function
> > > is used in virt_addr_valid) but not the segfaults you describe.
> >
> > Hi Alex,
> >
> > Let me try to rebuild buildroot image. Maybe there was something wrong
> > with my build, though, I did 'make clean' before doing. But at the
> > same time it worked back in June...
> >
> > Re WARNINGs, they indicate kernel bugs. I am working on setting up a
> > syzbot instance on riscv. If there a WARNING during boot then the
> > kernel will be marked as broken. No further testing will happen.
> > Is it a mis-use of WARN_ON? If so, could anybody please remove it or
> > replace it with pr_err.
>
>
> Hi,
>
> I've localized one issue with riscv/KASAN:
> KASAN breaks VDSO and that's I think the root cause of weird faults I
> saw earlier. The following patch fixes it.
> Could somebody please upstream this fix? I don't know how to add/run
> tests for this.
> Thanks
>
> diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
> index 0cfd6da784f84..cf3a383c1799d 100644
> --- a/arch/riscv/kernel/vdso/Makefile
> +++ b/arch/riscv/kernel/vdso/Makefile
> @@ -35,6 +35,7 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
>  # Disable gcov profiling for VDSO code
>  GCOV_PROFILE := n
>  KCOV_INSTRUMENT := n
> +KASAN_SANITIZE := n
>
>  # Force dependency
>  $(obj)/vdso.o: $(obj)/vdso.so



The second issue I am seeing seems to be related to text segment size.
I checked out v5.11 and used this config:
https://gist.github.com/dvyukov/6af25474d455437577a84213b0cc9178

Then I tried to boot it using:
QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3)
$ qemu-system-riscv64 -machine virt -smp 2 -m 4G ...

It shows no output from the kernel whatsoever, even though I have
earlycon and output shows very early with other configs.
Kernel boots fine with defconfig and other smaller configs.

If I enable KASAN_OUTLINE and CC_OPTIMIZE_FOR_SIZE, then this config
also boots fine. Both of these options significantly reduce kernel
size. However, I can also boot the kernel without these 2 configs, if
I disable a whole lot of subsystem configs. This makes me think that
there is an issue related to kernel size somewhere in
qemu/bootloader/kernel bootstrap code.
Does it make sense to you? Can somebody reproduce what I am seeing?

Thanks

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: riscv+KASAN does not boot
  2021-02-16 11:25                       ` Dmitry Vyukov
@ 2021-02-16 13:45                         ` Dmitry Vyukov
  2021-02-16 20:42                         ` Alex Ghiti
  1 sibling, 0 replies; 27+ messages in thread
From: Dmitry Vyukov @ 2021-02-16 13:45 UTC (permalink / raw)
  To: Alex Ghiti
  Cc: Tobias Klauser, Albert Ou, Bjorn Topel, Palmer Dabbelt, LKML,
	nylon7, syzkaller, Andreas Schwab, Paul Walmsley, linux-riscv

On Tue, Feb 16, 2021 at 12:25 PM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Tue, Feb 16, 2021 at 12:17 PM Dmitry Vyukov <dvyukov@google.com> wrote:
> >
> > On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> > > > I was fixing KASAN support for my sv48 patchset so I took a look at your
> > > > issue: I built a kernel on top of the branch riscv/fixes using
> > > > https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config
> > > > and Buildroot 2020.11. I have the warnings regarding the use of
> > > > __virt_to_phys on wrong addresses (but that's normal since this function
> > > > is used in virt_addr_valid) but not the segfaults you describe.
> > >
> > > Hi Alex,
> > >
> > > Let me try to rebuild buildroot image. Maybe there was something wrong
> > > with my build, though, I did 'make clean' before doing. But at the
> > > same time it worked back in June...
> > >
> > > Re WARNINGs, they indicate kernel bugs. I am working on setting up a
> > > syzbot instance on riscv. If there a WARNING during boot then the
> > > kernel will be marked as broken. No further testing will happen.
> > > Is it a mis-use of WARN_ON? If so, could anybody please remove it or
> > > replace it with pr_err.
> >
> >
> > Hi,
> >
> > I've localized one issue with riscv/KASAN:
> > KASAN breaks VDSO and that's I think the root cause of weird faults I
> > saw earlier. The following patch fixes it.
> > Could somebody please upstream this fix? I don't know how to add/run
> > tests for this.
> > Thanks
> >
> > diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
> > index 0cfd6da784f84..cf3a383c1799d 100644
> > --- a/arch/riscv/kernel/vdso/Makefile
> > +++ b/arch/riscv/kernel/vdso/Makefile
> > @@ -35,6 +35,7 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
> >  # Disable gcov profiling for VDSO code
> >  GCOV_PROFILE := n
> >  KCOV_INSTRUMENT := n
> > +KASAN_SANITIZE := n
> >
> >  # Force dependency
> >  $(obj)/vdso.o: $(obj)/vdso.so
>
>
>
> Second issue I am seeing seems to be related to text segment size.
> I check out v5.11 and use this config:
> https://gist.github.com/dvyukov/6af25474d455437577a84213b0cc9178
>
> Then trying to boot it using:
> QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3)
> $ qemu-system-riscv64 -machine virt -smp 2 -m 4G ...
>
> It shows no output from the kernel whatsoever, even though I have
> earlycon and output shows very early with other configs.
> Kernel boots fine with defconfig and other smaller configs.
>
> If I enable KASAN_OUTLINE and CC_OPTIMIZE_FOR_SIZE, then this config
> also boots fine. Both of these options significantly reduce kernel
> size. However, I can also boot the kernel without these 2 configs, if
> I disable a whole lot of subsystem configs. This makes me think that
> there is an issue related to kernel size somewhere in
> qemu/bootloader/kernel bootstrap code.
> Does it make sense to you? Can somebody reproduce what I am seeing?



I am debugging the next issue with VDSO. clock_gettime is broken in
some weird way.
syzkaller has this function:

static uint64 current_time_ms(void)
{
        struct timespec ts;
        if (clock_gettime(CLOCK_MONOTONIC, &ts))
        //if (syscall(SYS_clock_gettime, CLOCK_MONOTONIC, &ts))
                fail("clock_gettime failed");
        return (uint64)ts.tv_sec * 1000 + (uint64)ts.tv_nsec / 1000000;
}

When using clock_gettime it produces some nonsense that breaks all
timeouts (in particular, monotonic time goes backwards):
pid=4343 now=836038064151457975
pid=4343 now=836038064151457975
pid=4343 now=836038064151457970
pid=4343 now=836038064151457971

When I tested it with the real syscall, it worked as expected:
pid=4876 now=2493379
pid=4876 now=2493392
pid=4876 now=2493395
pid=4876 now=2493409
pid=4876 now=2493414

Is it a known issue? Any ideas?
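
A minimal sketch of how the two paths could be compared side by side, assuming a Linux target with a libc that routes clock_gettime through the vDSO. The helper names (`read_via_libc`, `read_via_syscall`, `count_backward_steps`) are made up for illustration and are not part of syzkaller. It samples CLOCK_MONOTONIC repeatedly and counts how often the millisecond value goes backwards.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

typedef int (*clock_fn)(struct timespec *ts);

/* libc entry point; on most setups this is served by the vDSO. */
static int read_via_libc(struct timespec *ts)
{
    return clock_gettime(CLOCK_MONOTONIC, ts);
}

/* Raw system call; bypasses the vDSO entirely. */
static int read_via_syscall(struct timespec *ts)
{
    return (int)syscall(SYS_clock_gettime, CLOCK_MONOTONIC, ts);
}

/* Same conversion as current_time_ms() above. */
static uint64_t to_ms(const struct timespec *ts)
{
    return (uint64_t)ts->tv_sec * 1000 + (uint64_t)ts->tv_nsec / 1000000;
}

/* Takes n samples and returns how many of them went backwards,
 * or -1 if reading the clock failed. */
static int count_backward_steps(clock_fn read_clock, int n)
{
    struct timespec ts;
    uint64_t prev = 0;
    int backward = 0;

    assert(n > 0);
    for (int i = 0; i < n; i++) {
        if (read_clock(&ts) != 0)
            return -1;
        uint64_t now = to_ms(&ts);
        if (i > 0 && now < prev)
            backward++;
        prev = now;
    }
    return backward;
}
```

On a healthy system both `count_backward_steps(read_via_libc, n)` and `count_backward_steps(read_via_syscall, n)` should return 0; a non-zero count on the libc path only would point at the vDSO rather than at the kernel's timekeeping.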

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: riscv+KASAN does not boot
  2021-02-16 11:17                     ` Dmitry Vyukov
  2021-02-16 11:25                       ` Dmitry Vyukov
@ 2021-02-16 17:35                       ` Tobias Klauser
  1 sibling, 0 replies; 27+ messages in thread
From: Tobias Klauser @ 2021-02-16 17:35 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Alex Ghiti, Albert Ou, Bjorn Topel, Palmer Dabbelt, LKML, nylon7,
	syzkaller, Andreas Schwab, Paul Walmsley, linux-riscv

On 2021-02-16 at 12:17:30 +0100, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> > > I was fixing KASAN support for my sv48 patchset so I took a look at your
> > > issue: I built a kernel on top of the branch riscv/fixes using
> > > https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config
> > > and Buildroot 2020.11. I have the warnings regarding the use of
> > > __virt_to_phys on wrong addresses (but that's normal since this function
> > > is used in virt_addr_valid) but not the segfaults you describe.
> >
> > Hi Alex,
> >
> > Let me try to rebuild buildroot image. Maybe there was something wrong
> > with my build, though, I did 'make clean' before doing. But at the
> > same time it worked back in June...
> >
> > Re WARNINGs, they indicate kernel bugs. I am working on setting up a
> > syzbot instance on riscv. If there a WARNING during boot then the
> > kernel will be marked as broken. No further testing will happen.
> > Is it a mis-use of WARN_ON? If so, could anybody please remove it or
> > replace it with pr_err.
> 
> 
> Hi,
> 
> I've localized one issue with riscv/KASAN:
> KASAN breaks VDSO and that's I think the root cause of weird faults I
> saw earlier. The following patch fixes it.
> Could somebody please upstream this fix? I don't know how to add/run
> tests for this.

Thanks. I've tested the fix locally using vDSO selftests and sent the
fix upstream [1]

[1] https://lore.kernel.org/linux-riscv/20210216173305.2500-1-tklauser@distanz.ch/T/#u

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: riscv+KASAN does not boot
  2021-02-16 11:25                       ` Dmitry Vyukov
  2021-02-16 13:45                         ` Dmitry Vyukov
@ 2021-02-16 20:42                         ` Alex Ghiti
  2021-02-17  4:42                           ` Dmitry Vyukov
  1 sibling, 1 reply; 27+ messages in thread
From: Alex Ghiti @ 2021-02-16 20:42 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Albert Ou, Bjorn Topel, Palmer Dabbelt, LKML, nylon7, syzkaller,
	Andreas Schwab, Paul Walmsley, Tobias Klauser, linux-riscv

Hi Dmitry,

On 2/16/21 at 6:25 AM, Dmitry Vyukov wrote:
> On Tue, Feb 16, 2021 at 12:17 PM Dmitry Vyukov <dvyukov@google.com> wrote:
>>
>> On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>>>> I was fixing KASAN support for my sv48 patchset so I took a look at your
>>>> issue: I built a kernel on top of the branch riscv/fixes using
>>>> https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config
>>>> and Buildroot 2020.11. I have the warnings regarding the use of
>>>> __virt_to_phys on wrong addresses (but that's normal since this function
>>>> is used in virt_addr_valid) but not the segfaults you describe.
>>>
>>> Hi Alex,
>>>
>>> Let me try to rebuild buildroot image. Maybe there was something wrong
>>> with my build, though, I did 'make clean' before doing. But at the
>>> same time it worked back in June...
>>>
>>> Re WARNINGs, they indicate kernel bugs. I am working on setting up a
>>> syzbot instance on riscv. If there a WARNING during boot then the
>>> kernel will be marked as broken. No further testing will happen.
>>> Is it a mis-use of WARN_ON? If so, could anybody please remove it or
>>> replace it with pr_err.
>>
>>
>> Hi,
>>
>> I've localized one issue with riscv/KASAN:
>> KASAN breaks VDSO and that's I think the root cause of weird faults I
>> saw earlier. The following patch fixes it.
>> Could somebody please upstream this fix? I don't know how to add/run
>> tests for this.
>> Thanks
>>
>> diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
>> index 0cfd6da784f84..cf3a383c1799d 100644
>> --- a/arch/riscv/kernel/vdso/Makefile
>> +++ b/arch/riscv/kernel/vdso/Makefile
>> @@ -35,6 +35,7 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
>>   # Disable gcov profiling for VDSO code
>>   GCOV_PROFILE := n
>>   KCOV_INSTRUMENT := n
>> +KASAN_SANITIZE := n
>>
>>   # Force dependency
>>   $(obj)/vdso.o: $(obj)/vdso.so

What's weird is that I don't have any issue without this patch with the
following config, whereas it indeed seems required for KASAN. But when
looking at the segfaults you got earlier, the segfault address is 0xbb0
and the cause is an instruction page fault: this address is the PLT base
address in vdso.so, and an instruction page fault would mean that someone
tried to jump to this address, which is weird. At first sight, that does
not seem related to your patch above, but clearly I may be wrong.
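
As a reading aid for the register dumps, here is a small sketch that decodes the "cause" field using the synchronous exception codes from the RISC-V privileged specification. The function names are hypothetical, and the heuristic in `looks_like_bad_jump` (cause 12 with badaddr equal to epc) is only an illustration of the reasoning above, not a kernel facility.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Maps a RISC-V scause value to a short description.
 * Only synchronous exceptions are named; the top bit marks interrupts. */
static const char *riscv_exception_name(uint64_t cause)
{
    if (cause >> 63)
        return "interrupt";
    switch (cause) {
    case 0:  return "instruction address misaligned";
    case 1:  return "instruction access fault";
    case 2:  return "illegal instruction";
    case 3:  return "breakpoint";
    case 4:  return "load address misaligned";
    case 5:  return "load access fault";
    case 6:  return "store/AMO address misaligned";
    case 7:  return "store/AMO access fault";
    case 8:  return "environment call from U-mode";
    case 9:  return "environment call from S-mode";
    case 12: return "instruction page fault";
    case 13: return "load page fault";
    case 15: return "store/AMO page fault";
    default: return "reserved or unknown";
    }
}

/* Heuristic: an instruction page fault whose faulting address equals the
 * program counter means the CPU tried to fetch from an unmapped address. */
static int looks_like_bad_jump(uint64_t epc, uint64_t badaddr, uint64_t cause)
{
    return cause == 12 && epc == badaddr;
}

/* Checks the decoder against values taken from the dumps in this thread. */
static void check_log_values(void)
{
    /* Userspace segfaults: epc = badaddr = 0xbb0, cause = 0xc. */
    assert(strcmp(riscv_exception_name(0xc), "instruction page fault") == 0);
    assert(looks_like_bad_jump(0xbb0, 0xbb0, 0xc));
    /* The __virt_to_phys warning reports cause = 0x3. */
    assert(strcmp(riscv_exception_name(0x3), "breakpoint") == 0);
}
```

Read this way, every userspace crash in the log is a fetch from 0xbb0 itself, while the kernel warning is a breakpoint trap, consistent with the do_trap_break frame in its call trace.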

Tobias, did you observe the same segfaults as Dmitry?

> 
> 
> 
> Second issue I am seeing seems to be related to text segment size.
> I check out v5.11 and use this config:
> https://gist.github.com/dvyukov/6af25474d455437577a84213b0cc9178

This config gave my laptop a hard time! I was finally able to boot
correctly to userspace, but then I realized I had used my sv48 branch...
Either I fixed your issue along the way or I can't reproduce it. I'll
give it a try tomorrow.

> 
> Then trying to boot it using:
> QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3)
> $ qemu-system-riscv64 -machine virt -smp 2 -m 4G ...
> 
> It shows no output from the kernel whatsoever, even though I have
> earlycon and output shows very early with other configs.
> Kernel boots fine with defconfig and other smaller configs.
> 
> If I enable KASAN_OUTLINE and CC_OPTIMIZE_FOR_SIZE, then this config
> also boots fine. Both of these options significantly reduce kernel
> size. However, I can also boot the kernel without these 2 configs, if
> I disable a whole lot of subsystem configs. This makes me think that
> there is an issue related to kernel size somewhere in
> qemu/bootloader/kernel bootstrap code.
> Does it make sense to you? Can somebody reproduce what I am seeing?

I have not answered your question yet, but at least you know I'm
working on it. I'll keep you posted.

Thanks for taking the time to set up syzkaller.

Alex

> Thanks
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: riscv+KASAN does not boot
  2021-02-16 20:42                         ` Alex Ghiti
@ 2021-02-17  4:42                           ` Dmitry Vyukov
  2021-02-17 16:36                             ` Alex Ghiti
  0 siblings, 1 reply; 27+ messages in thread
From: Dmitry Vyukov @ 2021-02-17  4:42 UTC (permalink / raw)
  To: Alex Ghiti
  Cc: Albert Ou, Bjorn Topel, Palmer Dabbelt, LKML, nylon7, syzkaller,
	Andreas Schwab, Paul Walmsley, Tobias Klauser, linux-riscv

On Tue, Feb 16, 2021 at 9:42 PM Alex Ghiti <alex@ghiti.fr> wrote:
>
> Hi Dmitry,
>
> On 2/16/21 at 6:25 AM, Dmitry Vyukov wrote:
> > On Tue, Feb 16, 2021 at 12:17 PM Dmitry Vyukov <dvyukov@google.com> wrote:
> >>
> >> On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> >>>> I was fixing KASAN support for my sv48 patchset so I took a look at your
> >>>> issue: I built a kernel on top of the branch riscv/fixes using
> >>>> https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config
> >>>> and Buildroot 2020.11. I have the warnings regarding the use of
> >>>> __virt_to_phys on wrong addresses (but that's normal since this function
> >>>> is used in virt_addr_valid) but not the segfaults you describe.
> >>>
> >>> Hi Alex,
> >>>
> >>> Let me try to rebuild buildroot image. Maybe there was something wrong
> >>> with my build, though, I did 'make clean' before doing. But at the
> >>> same time it worked back in June...
> >>>
> >>> Re WARNINGs, they indicate kernel bugs. I am working on setting up a
> >>> syzbot instance on riscv. If there a WARNING during boot then the
> >>> kernel will be marked as broken. No further testing will happen.
> >>> Is it a mis-use of WARN_ON? If so, could anybody please remove it or
> >>> replace it with pr_err.
> >>
> >>
> >> Hi,
> >>
> >> I've localized one issue with riscv/KASAN:
> >> KASAN breaks VDSO and that's I think the root cause of weird faults I
> >> saw earlier. The following patch fixes it.
> >> Could somebody please upstream this fix? I don't know how to add/run
> >> tests for this.
> >> Thanks
> >>
> >> diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
> >> index 0cfd6da784f84..cf3a383c1799d 100644
> >> --- a/arch/riscv/kernel/vdso/Makefile
> >> +++ b/arch/riscv/kernel/vdso/Makefile
> >> @@ -35,6 +35,7 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
> >>   # Disable gcov profiling for VDSO code
> >>   GCOV_PROFILE := n
> >>   KCOV_INSTRUMENT := n
> >> +KASAN_SANITIZE := n
> >>
> >>   # Force dependency
> >>   $(obj)/vdso.o: $(obj)/vdso.so
>
> What's weird is that I don't have any issue without this patch with the
> following config whereas it indeed seems required for KASAN. But when
> looking at the segfaults you got earlier, the segfault address is 0xbb0
> and the cause is an instruction page fault: this address is the PLT base
> address in vdso.so and an instruction page fault would mean that someone
> tried to jump at this address, which is weird. At first sight, that does
> not seem related to your patch above, but clearly I may be wrong.
>
> Tobias, did you observe the same segfaults as Dmitry ?


I noticed that not all buildroot images use VDSO, it seems to be
dependent on libc settings (at least I think I changed it in the
past).
I also booted an image completely successfully including dhcpd/sshd
start, but then my executable crashed in clock_gettime. The executable
was built on a linux/amd64 host with "riscv64-linux-gnu-gcc -static"
(10.2.1).
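
One way to check from inside a process whether the kernel mapped a vDSO at all is to look at the auxiliary vector. The sketch below assumes glibc (or another libc providing getauxval) on Linux; the helper names `vdso_base`, `vdso_status`, and `check_vdso_mapping` are hypothetical. It only confirms that a mapping exists and begins with the ELF magic. It does not tell whether libc actually calls into it.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>

/* Returns the address of the vDSO ELF header, or NULL if none is mapped. */
static const void *vdso_base(void)
{
    return (const void *)(uintptr_t)getauxval(AT_SYSINFO_EHDR);
}

/* Returns  1 if a vDSO is mapped and starts with the ELF magic,
 *          0 if something is mapped there but does not look like ELF,
 *         -1 if the kernel did not provide a vDSO to this process. */
static int vdso_status(void)
{
    const void *base = vdso_base();

    if (base == NULL)
        return -1;
    return memcmp(base, ELFMAG, SELFMAG) == 0 ? 1 : 0;
}

/* Aborts if a vDSO is advertised but its header is not a valid ELF header. */
static void check_vdso_mapping(void)
{
    assert(vdso_status() != 0);
}
```

Running this in each test image would at least separate "no vDSO mapped" from "vDSO mapped but misbehaving".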


> > Second issue I am seeing seems to be related to text segment size.
> > I check out v5.11 and use this config:
> > https://gist.github.com/dvyukov/6af25474d455437577a84213b0cc9178
>
> This config gave my laptop a hard time ! Finally I was able to boot
> correctly to userspace, but I realized I used my sv48 branch...Either I
> fixed your issue along the way or I can't reproduce it, I'll give it a
> try tomorrow.

Where is your branch? I could also test in my setup on your branch.


> > Then trying to boot it using:
> > QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3)
> > $ qemu-system-riscv64 -machine virt -smp 2 -m 4G ...
> >
> > It shows no output from the kernel whatsoever, even though I have
> > earlycon and output shows very early with other configs.
> > Kernel boots fine with defconfig and other smaller configs.
> >
> > If I enable KASAN_OUTLINE and CC_OPTIMIZE_FOR_SIZE, then this config
> > also boots fine. Both of these options significantly reduce kernel
> > size. However, I can also boot the kernel without these 2 configs, if
> > I disable a whole lot of subsystem configs. This makes me think that
> > there is an issue related to kernel size somewhere in
> > qemu/bootloader/kernel bootstrap code.
> > Does it make sense to you? Can somebody reproduce what I am seeing?
>
> I did not bring any answer to your question, but at least you know I'm
> working on it, I'll keep you posted.
>
> Thanks for taking the time to setup syzkaller.
>
> Alex
>
> > Thanks
> >

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: riscv+KASAN does not boot
  2021-02-17  4:42                           ` Dmitry Vyukov
@ 2021-02-17 16:36                             ` Alex Ghiti
  2021-02-17 17:34                               ` Dmitry Vyukov
  0 siblings, 1 reply; 27+ messages in thread
From: Alex Ghiti @ 2021-02-17 16:36 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Albert Ou, Bjorn Topel, Palmer Dabbelt, LKML, nylon7, syzkaller,
	Andreas Schwab, Paul Walmsley, Tobias Klauser, linux-riscv

On 2/16/21 at 11:42 PM, Dmitry Vyukov wrote:
> On Tue, Feb 16, 2021 at 9:42 PM Alex Ghiti <alex@ghiti.fr> wrote:
>>
>> Hi Dmitry,
>>
>> On 2/16/21 at 6:25 AM, Dmitry Vyukov wrote:
>>> On Tue, Feb 16, 2021 at 12:17 PM Dmitry Vyukov <dvyukov@google.com> wrote:
>>>>
>>>> On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>>>>>> I was fixing KASAN support for my sv48 patchset so I took a look at your
>>>>>> issue: I built a kernel on top of the branch riscv/fixes using
>>>>>> https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config
>>>>>> and Buildroot 2020.11. I have the warnings regarding the use of
>>>>>> __virt_to_phys on wrong addresses (but that's normal since this function
>>>>>> is used in virt_addr_valid) but not the segfaults you describe.
>>>>>
>>>>> Hi Alex,
>>>>>
>>>>> Let me try to rebuild buildroot image. Maybe there was something wrong
>>>>> with my build, though, I did 'make clean' before doing. But at the
>>>>> same time it worked back in June...
>>>>>
>>>>> Re WARNINGs, they indicate kernel bugs. I am working on setting up a
>>>>> syzbot instance on riscv. If there a WARNING during boot then the
>>>>> kernel will be marked as broken. No further testing will happen.
>>>>> Is it a mis-use of WARN_ON? If so, could anybody please remove it or
>>>>> replace it with pr_err.
>>>>
>>>>
>>>> Hi,
>>>>
>>>> I've localized one issue with riscv/KASAN:
>>>> KASAN breaks VDSO and that's I think the root cause of weird faults I
>>>> saw earlier. The following patch fixes it.
>>>> Could somebody please upstream this fix? I don't know how to add/run
>>>> tests for this.
>>>> Thanks
>>>>
>>>> diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
>>>> index 0cfd6da784f84..cf3a383c1799d 100644
>>>> --- a/arch/riscv/kernel/vdso/Makefile
>>>> +++ b/arch/riscv/kernel/vdso/Makefile
>>>> @@ -35,6 +35,7 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
>>>>    # Disable gcov profiling for VDSO code
>>>>    GCOV_PROFILE := n
>>>>    KCOV_INSTRUMENT := n
>>>> +KASAN_SANITIZE := n
>>>>
>>>>    # Force dependency
>>>>    $(obj)/vdso.o: $(obj)/vdso.so
>>
>> What's weird is that I don't have any issue without this patch with the
>> following config whereas it indeed seems required for KASAN. But when
>> looking at the segfaults you got earlier, the segfault address is 0xbb0
>> and the cause is an instruction page fault: this address is the PLT base
>> address in vdso.so and an instruction page fault would mean that someone
>> tried to jump at this address, which is weird. At first sight, that does
>> not seem related to your patch above, but clearly I may be wrong.
>>
>> Tobias, did you observe the same segfaults as Dmitry ?
> 
> 
> I noticed that not all buildroot images use VDSO, it seems to be
> dependent on libc settings (at least I think I changed it in the
> past).

OK, I had been using uClibc; with glibc I get the same segfaults, but
only when KASAN is enabled, and your patch fixes them. I will try to take
a look later to better understand the problem.
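
One way to check whether the vDSO objects were built with KASAN
instrumentation is to look for compiler-inserted `__asan_` symbol
references in them. The sketch below is only an illustration and not part
of the patch: the helper name `has_asan_symbols` is hypothetical, and the
usage comment assumes binutils `nm` from the riscv64-linux-gnu cross
toolchain and a built kernel tree.

```shell
#!/bin/sh
# Hypothetical helper: reads `nm` output on stdin and succeeds if any symbol
# name contains the "__asan_" prefix used by KASAN instrumentation.
has_asan_symbols() {
    grep -q '__asan_'
}

# Example use (assumes a built kernel tree and the cross toolchain):
#   riscv64-linux-gnu-nm arch/riscv/kernel/vdso/vgettimeofday.o \
#       | has_asan_symbols && echo "instrumented" || echo "clean"
```

With KASAN_SANITIZE := n in the vDSO Makefile, the expectation would be
that the check reports no such symbols.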

> I also booted an image completely successfully including dhcpd/sshd
> start, but then my executable crashed in clock_gettime. The executable
> was build on linux/amd64 host with "riscv64-linux-gnu-gcc -static"
> (10.2.1).
> 
> 
>>> Second issue I am seeing seems to be related to text segment size.
>>> I check out v5.11 and use this config:
>>> https://gist.github.com/dvyukov/6af25474d455437577a84213b0cc9178
>>
>> This config gave my laptop a hard time ! Finally I was able to boot
>> correctly to userspace, but I realized I used my sv48 branch...Either I
>> fixed your issue along the way or I can't reproduce it, I'll give it a
>> try tomorrow.
> 
> Where is your branch? I could also test in my setup on your branch.
> 

You can find my branch int/alex/riscv_kernel_end_of_address_space_v2 
here: https://github.com/AlexGhiti/riscv-linux.git

Thanks,

> 
>>> Then trying to boot it using:
>>> QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3)
>>> $ qemu-system-riscv64 -machine virt -smp 2 -m 4G ...
>>>
>>> It shows no output from the kernel whatsoever, even though I have
>>> earlycon and output shows very early with other configs.
>>> Kernel boots fine with defconfig and other smaller configs.
>>>
>>> If I enable KASAN_OUTLINE and CC_OPTIMIZE_FOR_SIZE, then this config
>>> also boots fine. Both of these options significantly reduce kernel
>>> size. However, I can also boot the kernel without these 2 configs, if
>>> I disable a whole lot of subsystem configs. This makes me think that
>>> there is an issue related to kernel size somewhere in
>>> qemu/bootloader/kernel bootstrap code.
>>> Does it make sense to you? Can somebody reproduce what I am seeing?
>>
>> I did not bring any answer to your question, but at least you know I'm
>> working on it, I'll keep you posted.
>>
>> Thanks for taking the time to setup syzkaller.
>>
>> Alex
>>
>>> Thanks
>>>
>>> _______________________________________________
>>> linux-riscv mailing list
>>> linux-riscv@lists.infradead.org
>>> http://lists.infradead.org/mailman/listinfo/linux-riscv
>>>
> 


* Re: riscv+KASAN does not boot
  2021-02-17 16:36                             ` Alex Ghiti
@ 2021-02-17 17:34                               ` Dmitry Vyukov
  2021-02-18  7:54                                 ` Alex Ghiti
  0 siblings, 1 reply; 27+ messages in thread
From: Dmitry Vyukov @ 2021-02-17 17:34 UTC (permalink / raw)
  To: Alex Ghiti
  Cc: Albert Ou, Bjorn Topel, Palmer Dabbelt, LKML, nylon7, syzkaller,
	Andreas Schwab, Paul Walmsley, Tobias Klauser, linux-riscv

On Wed, Feb 17, 2021 at 5:36 PM Alex Ghiti <alex@ghiti.fr> wrote:
>
> On 2/16/21 at 11:42 PM, Dmitry Vyukov wrote:
> > On Tue, Feb 16, 2021 at 9:42 PM Alex Ghiti <alex@ghiti.fr> wrote:
> >>
> >> Hi Dmitry,
> >>
> >> On 2/16/21 at 6:25 AM, Dmitry Vyukov wrote:
> >>> On Tue, Feb 16, 2021 at 12:17 PM Dmitry Vyukov <dvyukov@google.com> wrote:
> >>>>
> >>>> On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> >>>>>> I was fixing KASAN support for my sv48 patchset so I took a look at your
> >>>>>> issue: I built a kernel on top of the branch riscv/fixes using
> >>>>>> https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config
> >>>>>> and Buildroot 2020.11. I have the warnings regarding the use of
> >>>>>> __virt_to_phys on wrong addresses (but that's normal since this function
> >>>>>> is used in virt_addr_valid) but not the segfaults you describe.
> >>>>>
> >>>>> Hi Alex,
> >>>>>
> >>>>> Let me try to rebuild buildroot image. Maybe there was something wrong
> >>>>> with my build, though, I did 'make clean' before doing. But at the
> >>>>> same time it worked back in June...
> >>>>>
> >>>>> Re WARNINGs, they indicate kernel bugs. I am working on setting up a
> >>>>> syzbot instance on riscv. If there a WARNING during boot then the
> >>>>> kernel will be marked as broken. No further testing will happen.
> >>>>> Is it a mis-use of WARN_ON? If so, could anybody please remove it or
> >>>>> replace it with pr_err.
> >>>>
> >>>>
> >>>> Hi,
> >>>>
> >>>> I've localized one issue with riscv/KASAN:
> >>>> KASAN breaks VDSO and that's I think the root cause of weird faults I
> >>>> saw earlier. The following patch fixes it.
> >>>> Could somebody please upstream this fix? I don't know how to add/run
> >>>> tests for this.
> >>>> Thanks
> >>>>
> >>>> diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
> >>>> index 0cfd6da784f84..cf3a383c1799d 100644
> >>>> --- a/arch/riscv/kernel/vdso/Makefile
> >>>> +++ b/arch/riscv/kernel/vdso/Makefile
> >>>> @@ -35,6 +35,7 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
> >>>>    # Disable gcov profiling for VDSO code
> >>>>    GCOV_PROFILE := n
> >>>>    KCOV_INSTRUMENT := n
> >>>> +KASAN_SANITIZE := n
> >>>>
> >>>>    # Force dependency
> >>>>    $(obj)/vdso.o: $(obj)/vdso.so
> >>
> >> What's weird is that I don't have any issue without this patch with the
> >> following config whereas it indeed seems required for KASAN. But when
> >> looking at the segfaults you got earlier, the segfault address is 0xbb0
> >> and the cause is an instruction page fault: this address is the PLT base
> >> address in vdso.so and an instruction page fault would mean that someone
> >> tried to jump at this address, which is weird. At first sight, that does
> >> not seem related to your patch above, but clearly I may be wrong.
> >>
> >> Tobias, did you observe the same segfaults as Dmitry ?
> >
> >
> > I noticed that not all buildroot images use VDSO, it seems to be
> > dependent on libc settings (at least I think I changed it in the
> > past).
>
> Ok, I used uClibc but then when using glibc, I have the same segfaults,
> only when KASAN is enabled. And your patch fixes the problem. I will try
> to take a look later to better understand the problem.
>
> > I also booted an image completely successfully including dhcpd/sshd
> > start, but then my executable crashed in clock_gettime. The executable
> > was build on linux/amd64 host with "riscv64-linux-gnu-gcc -static"
> > (10.2.1).
> >
> >
> >>> Second issue I am seeing seems to be related to text segment size.
> >>> I check out v5.11 and use this config:
> >>> https://gist.github.com/dvyukov/6af25474d455437577a84213b0cc9178
> >>
> >> This config gave my laptop a hard time ! Finally I was able to boot
> >> correctly to userspace, but I realized I used my sv48 branch...Either I
> >> fixed your issue along the way or I can't reproduce it, I'll give it a
> >> try tomorrow.
> >
> > Where is your branch? I could also test in my setup on your branch.
> >
>
> You can find my branch int/alex/riscv_kernel_end_of_address_space_v2
> here: https://github.com/AlexGhiti/riscv-linux.git

No, it does not work for me.

Source is on b61ab6c98de021398cd7734ea5fc3655e51e70f2 (HEAD,
int/alex/riscv_kernel_end_of_address_space_v2)
Config is https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt

riscv64-linux-gnu-gcc -v
gcc version 10.2.1 20210110 (Debian 10.2.1-6+build1)

qemu-system-riscv64 --version
QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3)

qemu-system-riscv64 \
-machine virt -smp 2 -m 2G \
-device virtio-blk-device,drive=hd0 \
-drive file=image-riscv64,if=none,format=raw,id=hd0 \
-kernel arch/riscv/boot/Image \
-nographic \
-device virtio-rng-device,rng=rng0 \
-object rng-random,filename=/dev/urandom,id=rng0 \
-netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 \
-device virtio-net-device,netdev=net0 \
-append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic panic_on_warn=1 panic=86400 earlycon"

OpenSBI v0.8
   ____                    _____ ____ _____
  / __ \                  / ____|  _ \_   _|
 | |  | |_ __   ___ _ __ | (___ | |_) || |
 | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
 | |__| | |_) |  __/ | | |____) | |_) || |_
  \____/| .__/ \___|_| |_|_____/|____/_____|
        | |
        |_|

Platform Name       : riscv-virtio,qemu
Platform Features   : timer,mfdeleg
Platform HART Count : 2
Boot HART ID        : 1
Boot HART ISA       : rv64imafdcsu
BOOT HART Features  : pmp,scounteren,mcounteren,time
BOOT HART PMP Count : 16
Firmware Base       : 0x80000000
Firmware Size       : 104 KB
Runtime SBI Version : 0.2

MIDELEG : 0x0000000000000222
MEDELEG : 0x000000000000b109
PMP0    : 0x0000000080000000-0x000000008001ffff (A)
PMP1    : 0x0000000000000000-0xffffffffffffffff (A,R,W,X)

[no output after this]
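
For automated runs, a silent kernel can be detected mechanically by
looking for the kernel's first banner line in the captured console
output. This is a minimal sketch, assuming the console was saved to a
file; the helper name `kernel_started` and the file name `console.log`
are hypothetical, and the check relies on the kernel printing a
"Linux version" line early in boot.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the console log in $1 contains the
# kernel's "Linux version" banner, i.e. the kernel printed at least one line.
kernel_started() {
    grep -q 'Linux version' "$1"
}

# Example use (console.log is an assumed name for the captured output):
#   qemu-system-riscv64 ... -nographic > console.log 2>&1
#   kernel_started console.log || echo "firmware output only, kernel silent"
```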



> Thanks,
>
> >
> >>> Then trying to boot it using:
> >>> QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3)
> >>> $ qemu-system-riscv64 -machine virt -smp 2 -m 4G ...
> >>>
> >>> It shows no output from the kernel whatsoever, even though I have
> >>> earlycon and output shows very early with other configs.
> >>> Kernel boots fine with defconfig and other smaller configs.
> >>>
> >>> If I enable KASAN_OUTLINE and CC_OPTIMIZE_FOR_SIZE, then this config
> >>> also boots fine. Both of these options significantly reduce kernel
> >>> size. However, I can also boot the kernel without these 2 configs, if
> >>> I disable a whole lot of subsystem configs. This makes me think that
> >>> there is an issue related to kernel size somewhere in
> >>> qemu/bootloader/kernel bootstrap code.
> >>> Does it make sense to you? Can somebody reproduce what I am seeing?
> >>
> >> I did not bring any answer to your question, but at least you know I'm
> >> working on it, I'll keep you posted.
> >>
> >> Thanks for taking the time to setup syzkaller.
> >>
> >> Alex
> >>
> >>> Thanks


* Re: riscv+KASAN does not boot
  2021-02-17 17:34                               ` Dmitry Vyukov
@ 2021-02-18  7:54                                 ` Alex Ghiti
  2021-02-18 11:36                                   ` Dmitry Vyukov
  0 siblings, 1 reply; 27+ messages in thread
From: Alex Ghiti @ 2021-02-18  7:54 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Albert Ou, Bjorn Topel, Palmer Dabbelt, LKML, nylon7, syzkaller,
	Andreas Schwab, Paul Walmsley, Tobias Klauser, linux-riscv

Hi Dmitry,

> On Wed, Feb 17, 2021 at 5:36 PM Alex Ghiti <alex@ghiti.fr> wrote:
>>
>> On 2/16/21 at 11:42 PM, Dmitry Vyukov wrote:
>>> On Tue, Feb 16, 2021 at 9:42 PM Alex Ghiti <alex@ghiti.fr> wrote:
>>>>
>>>> Hi Dmitry,
>>>>
>>>> On 2/16/21 at 6:25 AM, Dmitry Vyukov wrote:
>>>>> On Tue, Feb 16, 2021 at 12:17 PM Dmitry Vyukov <dvyukov@google.com> wrote:
>>>>>>
>>>>>> On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>>>>>>>> I was fixing KASAN support for my sv48 patchset so I took a look at your
>>>>>>>> issue: I built a kernel on top of the branch riscv/fixes using
>>>>>>>> https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config
>>>>>>>> and Buildroot 2020.11. I have the warnings regarding the use of
>>>>>>>> __virt_to_phys on wrong addresses (but that's normal since this function
>>>>>>>> is used in virt_addr_valid) but not the segfaults you describe.
>>>>>>>
>>>>>>> Hi Alex,
>>>>>>>
>>>>>>> Let me try to rebuild buildroot image. Maybe there was something wrong
>>>>>>> with my build, though, I did 'make clean' before doing. But at the
>>>>>>> same time it worked back in June...
>>>>>>>
>>>>>>> Re WARNINGs, they indicate kernel bugs. I am working on setting up a
>>>>>>> syzbot instance on riscv. If there a WARNING during boot then the
>>>>>>> kernel will be marked as broken. No further testing will happen.
>>>>>>> Is it a mis-use of WARN_ON? If so, could anybody please remove it or
>>>>>>> replace it with pr_err.
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I've localized one issue with riscv/KASAN:
>>>>>> KASAN breaks VDSO and that's I think the root cause of weird faults I
>>>>>> saw earlier. The following patch fixes it.
>>>>>> Could somebody please upstream this fix? I don't know how to add/run
>>>>>> tests for this.
>>>>>> Thanks
>>>>>>
>>>>>> diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
>>>>>> index 0cfd6da784f84..cf3a383c1799d 100644
>>>>>> --- a/arch/riscv/kernel/vdso/Makefile
>>>>>> +++ b/arch/riscv/kernel/vdso/Makefile
>>>>>> @@ -35,6 +35,7 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
>>>>>>     # Disable gcov profiling for VDSO code
>>>>>>     GCOV_PROFILE := n
>>>>>>     KCOV_INSTRUMENT := n
>>>>>> +KASAN_SANITIZE := n
>>>>>>
>>>>>>     # Force dependency
>>>>>>     $(obj)/vdso.o: $(obj)/vdso.so
>>>>
>>>> What's weird is that I don't have any issue without this patch with the
>>>> following config whereas it indeed seems required for KASAN. But when
>>>> looking at the segfaults you got earlier, the segfault address is 0xbb0
>>>> and the cause is an instruction page fault: this address is the PLT base
>>>> address in vdso.so and an instruction page fault would mean that someone
>>>> tried to jump at this address, which is weird. At first sight, that does
>>>> not seem related to your patch above, but clearly I may be wrong.
>>>>
>>>> Tobias, did you observe the same segfaults as Dmitry ?
>>>
>>>
>>> I noticed that not all buildroot images use VDSO, it seems to be
>>> dependent on libc settings (at least I think I changed it in the
>>> past).
>>
>> Ok, I used uClibc but then when using glibc, I have the same segfaults,
>> only when KASAN is enabled. And your patch fixes the problem. I will try
>> to take a look later to better understand the problem.
>>
>>> I also booted an image completely successfully including dhcpd/sshd
>>> start, but then my executable crashed in clock_gettime. The executable
>>> was build on linux/amd64 host with "riscv64-linux-gnu-gcc -static"
>>> (10.2.1).
>>>
>>>
>>>>> Second issue I am seeing seems to be related to text segment size.
>>>>> I check out v5.11 and use this config:
>>>>> https://gist.github.com/dvyukov/6af25474d455437577a84213b0cc9178
>>>>
>>>> This config gave my laptop a hard time ! Finally I was able to boot
>>>> correctly to userspace, but I realized I used my sv48 branch...Either I
>>>> fixed your issue along the way or I can't reproduce it, I'll give it a
>>>> try tomorrow.
>>>
>>> Where is your branch? I could also test in my setup on your branch.
>>>
>>
>> You can find my branch int/alex/riscv_kernel_end_of_address_space_v2
>> here: https://github.com/AlexGhiti/riscv-linux.git
> 
> No, it does not work for me.
> 
> Source is on b61ab6c98de021398cd7734ea5fc3655e51e70f2 (HEAD,
> int/alex/riscv_kernel_end_of_address_space_v2)
> Config is https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt
> 
> riscv64-linux-gnu-gcc -v
> gcc version 10.2.1 20210110 (Debian 10.2.1-6+build1)
> 
> qemu-system-riscv64 --version
> QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3)
> 
> qemu-system-riscv64 \
> -machine virt -smp 2 -m 2G \
> -device virtio-blk-device,drive=hd0 \
> -drive file=image-riscv64,if=none,format=raw,id=hd0 \
> -kernel arch/riscv/boot/Image \
> -nographic \
> -device virtio-rng-device,rng=rng0 \
> -object rng-random,filename=/dev/urandom,id=rng0 \
> -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 \
> -device virtio-net-device,netdev=net0 \
> -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic panic_on_warn=1 panic=86400 earlycon"

It still works for me, although I had to disable CONFIG_DEBUG_INFO_BTF (I
don't think that changes anything at runtime). Your command line above
does not work for me as is, because it does not appear to load any
firmware; if I add -bios images/fw_jump.elf, it works. But then I don't
know where your OpenSBI output below comes from...

Regarding your issue with calling clock_gettime 'directly' as opposed to
through the syscall, I get the same, consistent output from both calls.

I have an older gcc (9.3.0) and the same qemu. I think what is missing
here is your buildroot config, so that we have the exact same
environment: could you post it as well?
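
To make the firmware choice explicit when comparing setups, the -bios
argument can be selected in a small wrapper. This is a sketch under
assumptions: the helper name `bios_arg` is hypothetical,
images/fw_jump.elf is the path mentioned above, and "-bios default" asks
QEMU for its bundled firmware.

```shell
#!/bin/sh
# Hypothetical helper: prints the -bios argument for qemu-system-riscv64.
# Uses the firmware file in $1 if it exists, otherwise falls back to
# "default" (the firmware bundled with QEMU).
bios_arg() {
    if [ -n "$1" ] && [ -f "$1" ]; then
        printf '%s\n' "-bios $1"
    else
        printf '%s\n' "-bios default"
    fi
}

# Example use:
#   qemu-system-riscv64 -machine virt $(bios_arg images/fw_jump.elf) ...
```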

Thanks,

> 
> OpenSBI v0.8
>     ____                    _____ ____ _____
>    / __ \                  / ____|  _ \_   _|
>   | |  | |_ __   ___ _ __ | (___ | |_) || |
>   | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
>   | |__| | |_) |  __/ | | |____) | |_) || |_
>    \____/| .__/ \___|_| |_|_____/|____/_____|
>          | |
>          |_|
> 
> Platform Name       : riscv-virtio,qemu
> Platform Features   : timer,mfdeleg
> Platform HART Count : 2
> Boot HART ID        : 1
> Boot HART ISA       : rv64imafdcsu
> BOOT HART Features  : pmp,scounteren,mcounteren,time
> BOOT HART PMP Count : 16
> Firmware Base       : 0x80000000
> Firmware Size       : 104 KB
> Runtime SBI Version : 0.2
> 
> MIDELEG : 0x0000000000000222
> MEDELEG : 0x000000000000b109
> PMP0    : 0x0000000080000000-0x000000008001ffff (A)OpenSBI v0.6

> 
> 
> no output after this
> PMP1    : 0x0000000000000000-0xffffffffffffffff (A,R,W,X)
> 
> 
> 
>> Thanks,
>>
>>>
>>>>> Then trying to boot it using:
>>>>> QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3)
>>>>> $ qemu-system-riscv64 -machine virt -smp 2 -m 4G ...
>>>>>
>>>>> It shows no output from the kernel whatsoever, even though I have
>>>>> earlycon and output shows very early with other configs.
>>>>> Kernel boots fine with defconfig and other smaller configs.
>>>>>
>>>>> If I enable KASAN_OUTLINE and CC_OPTIMIZE_FOR_SIZE, then this config
>>>>> also boots fine. Both of these options significantly reduce kernel
>>>>> size. However, I can also boot the kernel without these 2 configs, if
>>>>> I disable a whole lot of subsystem configs. This makes me think that
>>>>> there is an issue related to kernel size somewhere in
>>>>> qemu/bootloader/kernel bootstrap code.
>>>>> Does it make sense to you? Can somebody reproduce what I am seeing?
>>>>
>>>> I did not bring any answer to your question, but at least you know I'm
>>>> working on it, I'll keep you posted.
>>>>
>>>> Thanks for taking the time to setup syzkaller.
>>>>
>>>> Alex
>>>>
>>>>> Thanks


* Re: riscv+KASAN does not boot
  2021-02-18  7:54                                 ` Alex Ghiti
@ 2021-02-18 11:36                                   ` Dmitry Vyukov
  2021-02-19 17:01                                     ` Alex Ghiti
  0 siblings, 1 reply; 27+ messages in thread
From: Dmitry Vyukov @ 2021-02-18 11:36 UTC (permalink / raw)
  To: Alex Ghiti
  Cc: Albert Ou, Bjorn Topel, Palmer Dabbelt, LKML, nylon7, syzkaller,
	Andreas Schwab, Paul Walmsley, Tobias Klauser, linux-riscv

On Thu, Feb 18, 2021 at 8:54 AM Alex Ghiti <alex@ghiti.fr> wrote:
>
> Hi Dmitry,
>
> > On Wed, Feb 17, 2021 at 5:36 PM Alex Ghiti <alex@ghiti.fr> wrote:
> >>
> >> On 2/16/21 at 11:42 PM, Dmitry Vyukov wrote:
> >>> On Tue, Feb 16, 2021 at 9:42 PM Alex Ghiti <alex@ghiti.fr> wrote:
> >>>>
> >>>> Hi Dmitry,
> >>>>
> >>>> On 2/16/21 at 6:25 AM, Dmitry Vyukov wrote:
> >>>>> On Tue, Feb 16, 2021 at 12:17 PM Dmitry Vyukov <dvyukov@google.com> wrote:
> >>>>>>
> >>>>>> On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> >>>>>>>> I was fixing KASAN support for my sv48 patchset so I took a look at your
> >>>>>>>> issue: I built a kernel on top of the branch riscv/fixes using
> >>>>>>>> https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config
> >>>>>>>> and Buildroot 2020.11. I have the warnings regarding the use of
> >>>>>>>> __virt_to_phys on wrong addresses (but that's normal since this function
> >>>>>>>> is used in virt_addr_valid) but not the segfaults you describe.
> >>>>>>>
> >>>>>>> Hi Alex,
> >>>>>>>
> >>>>>>> Let me try to rebuild buildroot image. Maybe there was something wrong
> >>>>>>> with my build, though, I did 'make clean' before doing. But at the
> >>>>>>> same time it worked back in June...
> >>>>>>>
> >>>>>>> Re WARNINGs, they indicate kernel bugs. I am working on setting up a
> >>>>>>> syzbot instance on riscv. If there a WARNING during boot then the
> >>>>>>> kernel will be marked as broken. No further testing will happen.
> >>>>>>> Is it a mis-use of WARN_ON? If so, could anybody please remove it or
> >>>>>>> replace it with pr_err.
> >>>>>>
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> I've localized one issue with riscv/KASAN:
> >>>>>> KASAN breaks VDSO and that's I think the root cause of weird faults I
> >>>>>> saw earlier. The following patch fixes it.
> >>>>>> Could somebody please upstream this fix? I don't know how to add/run
> >>>>>> tests for this.
> >>>>>> Thanks
> >>>>>>
> >>>>>> diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
> >>>>>> index 0cfd6da784f84..cf3a383c1799d 100644
> >>>>>> --- a/arch/riscv/kernel/vdso/Makefile
> >>>>>> +++ b/arch/riscv/kernel/vdso/Makefile
> >>>>>> @@ -35,6 +35,7 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
> >>>>>>     # Disable gcov profiling for VDSO code
> >>>>>>     GCOV_PROFILE := n
> >>>>>>     KCOV_INSTRUMENT := n
> >>>>>> +KASAN_SANITIZE := n
> >>>>>>
> >>>>>>     # Force dependency
> >>>>>>     $(obj)/vdso.o: $(obj)/vdso.so
> >>>>
> >>>> What's weird is that I don't have any issue without this patch with the
> >>>> following config whereas it indeed seems required for KASAN. But when
> >>>> looking at the segfaults you got earlier, the segfault address is 0xbb0
> >>>> and the cause is an instruction page fault: this address is the PLT base
> >>>> address in vdso.so and an instruction page fault would mean that someone
> >>>> tried to jump at this address, which is weird. At first sight, that does
> >>>> not seem related to your patch above, but clearly I may be wrong.
> >>>>
> >>>> Tobias, did you observe the same segfaults as Dmitry ?
> >>>
> >>>
> >>> I noticed that not all buildroot images use VDSO, it seems to be
> >>> dependent on libc settings (at least I think I changed it in the
> >>> past).
> >>
> >> Ok, I used uClibc but then when using glibc, I have the same segfaults,
> >> only when KASAN is enabled. And your patch fixes the problem. I will try
> >> to take a look later to better understand the problem.
> >>
> >>> I also booted an image completely successfully including dhcpd/sshd
> >>> start, but then my executable crashed in clock_gettime. The executable
> >>> was build on linux/amd64 host with "riscv64-linux-gnu-gcc -static"
> >>> (10.2.1).
> >>>
> >>>
> >>>>> Second issue I am seeing seems to be related to text segment size.
> >>>>> I check out v5.11 and use this config:
> >>>>> https://gist.github.com/dvyukov/6af25474d455437577a84213b0cc9178
> >>>>
> >>>> This config gave my laptop a hard time ! Finally I was able to boot
> >>>> correctly to userspace, but I realized I used my sv48 branch...Either I
> >>>> fixed your issue along the way or I can't reproduce it, I'll give it a
> >>>> try tomorrow.
> >>>
> >>> Where is your branch? I could also test in my setup on your branch.
> >>>
> >>
> >> You can find my branch int/alex/riscv_kernel_end_of_address_space_v2
> >> here: https://github.com/AlexGhiti/riscv-linux.git
> >
> > No, it does not work for me.
> >
> > Source is on b61ab6c98de021398cd7734ea5fc3655e51e70f2 (HEAD,
> > int/alex/riscv_kernel_end_of_address_space_v2)
> > Config is https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt
> >
> > riscv64-linux-gnu-gcc -v
> > gcc version 10.2.1 20210110 (Debian 10.2.1-6+build1)
> >
> > qemu-system-riscv64 --version
> > QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3)
> >
> > qemu-system-riscv64 \
> > -machine virt -smp 2 -m 2G \
> > -device virtio-blk-device,drive=hd0 \
> > -drive file=image-riscv64,if=none,format=raw,id=hd0 \
> > -kernel arch/riscv/boot/Image \
> > -nographic \
> > -device virtio-rng-device,rng=rng0 \
> > -object rng-random,filename=/dev/urandom,id=rng0 \
> > -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 \
> > -device virtio-net-device,netdev=net0 \
> > -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic panic_on_warn=1 panic=86400 earlycon"
>
> It still works for me but I had to disable CONFIG_DEBUG_INFO_BTF (I
> don't think that changes anything at runtime). But your above command
> line does not work for me as it appears you do not load any firmware, if
> I add -bios images/fw_jump.elf, it works. But then I don't know where
> your opensbi output below comes from...
>
> And regarding your issue with calling clock_gettime 'directly' compared
> to using the syscall, I have the same consistent output from both calls.
>
> I have an older gcc (9.3.0) and the same qemu. I think what is missing
> here is your buildroot config, so that we have the exact same
> environment: could you post your buildroot config as well ?

I don't think the image is relevant, because I don't even get to kernel
code. If the kernel complains about a missing init later, that's fine.
Re bios: this version of qemu already has the OpenSBI bios built in. You
can pass -bios default, but that's, well, the default :)
Here are more reproducible instructions that also pin down gcc and
qemu. I think the gcc version may be relevant, as I suspect code size.


curl -o $KERNEL_SRC/.config \
  https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt
docker pull gcr.io/syzkaller/syzbot
docker run -it -v $KERNEL_SRC:/kernel gcr.io/syzkaller/syzbot
cd /kernel
make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- olddefconfig
make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu-
qemu-system-riscv64 -machine virt -smp 2 -m 4G \
  -kernel arch/riscv/boot/Image -nographic \
  -append "earlycon earlyprintk=serial console=ttyS0"
[this does not boot, only OpenSBI output]

scripts/config -d KASAN_INLINE -e KASAN_OUTLINE \
  -d CC_OPTIMIZE_FOR_PERFORMANCE -e CC_OPTIMIZE_FOR_SIZE
make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu-
qemu-system-riscv64 -machine virt -smp 2 -m 4G \
  -kernel arch/riscv/boot/Image -nographic \
  -append "earlycon earlyprintk=serial console=ttyS0"
[this boots fine, at least up to starting the init process]
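
Since the working hypothesis is image size, it may help to record the
size of each build before rebuilding over it. This is a minimal sketch:
the helper name `image_size` and the saved file names are hypothetical,
and `wc -c` is used only because it is portable.

```shell
#!/bin/sh
# Hypothetical helper: prints the size of the file in $1 in bytes.
image_size() {
    wc -c < "$1" | tr -d ' '
}

# Example use (Image.inline and Image.outline are assumed copies of
# arch/riscv/boot/Image saved after each of the two builds above):
#   a=$(image_size Image.inline)
#   b=$(image_size Image.outline)
#   echo "inline: $a bytes, outline: $b bytes, diff: $((a - b)) bytes"
```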


* Re: riscv+KASAN does not boot
  2021-02-18 11:36                                   ` Dmitry Vyukov
@ 2021-02-19 17:01                                     ` Alex Ghiti
  2021-02-19 18:53                                       ` Dmitry Vyukov
  0 siblings, 1 reply; 27+ messages in thread
From: Alex Ghiti @ 2021-02-19 17:01 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Albert Ou, Bjorn Topel, Palmer Dabbelt, LKML, nylon7, syzkaller,
	Andreas Schwab, Paul Walmsley, Tobias Klauser, linux-riscv

Hi Dmitry,

On 2/18/21 at 6:36 AM, Dmitry Vyukov wrote:
> On Thu, Feb 18, 2021 at 8:54 AM Alex Ghiti <alex@ghiti.fr> wrote:
>>
>> Hi Dmitry,
>>
>>> On Wed, Feb 17, 2021 at 5:36 PM Alex Ghiti <alex@ghiti.fr> wrote:
>>>>
>>>> On 2/16/21 at 11:42 PM, Dmitry Vyukov wrote:
>>>>> On Tue, Feb 16, 2021 at 9:42 PM Alex Ghiti <alex@ghiti.fr> wrote:
>>>>>>
>>>>>> Hi Dmitry,
>>>>>>
>>>>>> On 2/16/21 at 6:25 AM, Dmitry Vyukov wrote:
>>>>>>> On Tue, Feb 16, 2021 at 12:17 PM Dmitry Vyukov <dvyukov@google.com> wrote:
>>>>>>>>
>>>>>>>> On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>>>>>>>>>> I was fixing KASAN support for my sv48 patchset so I took a look at your
>>>>>>>>>> issue: I built a kernel on top of the branch riscv/fixes using
>>>>>>>>>> https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config
>>>>>>>>>> and Buildroot 2020.11. I have the warnings regarding the use of
>>>>>>>>>> __virt_to_phys on wrong addresses (but that's normal since this function
>>>>>>>>>> is used in virt_addr_valid) but not the segfaults you describe.
>>>>>>>>>
>>>>>>>>> Hi Alex,
>>>>>>>>>
>>>>>>>>> Let me try to rebuild buildroot image. Maybe there was something wrong
>>>>>>>>> with my build, though, I did 'make clean' before doing. But at the
>>>>>>>>> same time it worked back in June...
>>>>>>>>>
>>>>>>>>> Re WARNINGs, they indicate kernel bugs. I am working on setting up a
>>>>>>>>> syzbot instance on riscv. If there a WARNING during boot then the
>>>>>>>>> kernel will be marked as broken. No further testing will happen.
>>>>>>>>> Is it a mis-use of WARN_ON? If so, could anybody please remove it or
>>>>>>>>> replace it with pr_err.
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I've localized one issue with riscv/KASAN:
>>>>>>>> KASAN breaks VDSO and that's I think the root cause of weird faults I
>>>>>>>> saw earlier. The following patch fixes it.
>>>>>>>> Could somebody please upstream this fix? I don't know how to add/run
>>>>>>>> tests for this.
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
>>>>>>>> index 0cfd6da784f84..cf3a383c1799d 100644
>>>>>>>> --- a/arch/riscv/kernel/vdso/Makefile
>>>>>>>> +++ b/arch/riscv/kernel/vdso/Makefile
>>>>>>>> @@ -35,6 +35,7 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
>>>>>>>>      # Disable gcov profiling for VDSO code
>>>>>>>>      GCOV_PROFILE := n
>>>>>>>>      KCOV_INSTRUMENT := n
>>>>>>>> +KASAN_SANITIZE := n
>>>>>>>>
>>>>>>>>      # Force dependency
>>>>>>>>      $(obj)/vdso.o: $(obj)/vdso.so
>>>>>>
>>>>>> What's weird is that I don't have any issue without this patch with the
>>>>>> following config whereas it indeed seems required for KASAN. But when
>>>>>> looking at the segfaults you got earlier, the segfault address is 0xbb0
>>>>>> and the cause is an instruction page fault: this address is the PLT base
>>>>>> address in vdso.so and an instruction page fault would mean that someone
>>>>>> tried to jump at this address, which is weird. At first sight, that does
>>>>>> not seem related to your patch above, but clearly I may be wrong.
>>>>>>
>>>>>> Tobias, did you observe the same segfaults as Dmitry ?
>>>>>
>>>>>
>>>>> I noticed that not all buildroot images use VDSO, it seems to be
>>>>> dependent on libc settings (at least I think I changed it in the
>>>>> past).
>>>>
>>>> Ok, I used uClibc but then when using glibc, I have the same segfaults,
>>>> only when KASAN is enabled. And your patch fixes the problem. I will try
>>>> to take a look later to better understand the problem.
>>>>
>>>>> I also booted an image completely successfully including dhcpd/sshd
>>>>> start, but then my executable crashed in clock_gettime. The executable
>>>>> was build on linux/amd64 host with "riscv64-linux-gnu-gcc -static"
>>>>> (10.2.1).
>>>>>
>>>>>
>>>>>>> Second issue I am seeing seems to be related to text segment size.
>>>>>>> I check out v5.11 and use this config:
>>>>>>> https://gist.github.com/dvyukov/6af25474d455437577a84213b0cc9178
>>>>>>
>>>>>> This config gave my laptop a hard time ! Finally I was able to boot
>>>>>> correctly to userspace, but I realized I used my sv48 branch...Either I
>>>>>> fixed your issue along the way or I can't reproduce it, I'll give it a
>>>>>> try tomorrow.
>>>>>
>>>>> Where is your branch? I could also test in my setup on your branch.
>>>>>
>>>>
>>>> You can find my branch int/alex/riscv_kernel_end_of_address_space_v2
>>>> here: https://github.com/AlexGhiti/riscv-linux.git
>>>
>>> No, it does not work for me.
>>>
>>> Source is on b61ab6c98de021398cd7734ea5fc3655e51e70f2 (HEAD,
>>> int/alex/riscv_kernel_end_of_address_space_v2)
>>> Config is https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt
>>>
>>> riscv64-linux-gnu-gcc -v
>>> gcc version 10.2.1 20210110 (Debian 10.2.1-6+build1)
>>>
>>> qemu-system-riscv64 --version
>>> QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3)
>>>
>>> qemu-system-riscv64 \
>>> -machine virt -smp 2 -m 2G \
>>> -device virtio-blk-device,drive=hd0 \
>>> -drive file=image-riscv64,if=none,format=raw,id=hd0 \
>>> -kernel arch/riscv/boot/Image \
>>> -nographic \
>>> -device virtio-rng-device,rng=rng0 -object
>>> rng-random,filename=/dev/urandom,id=rng0 \
>>> -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device
>>> virtio-net-device,netdev=net0 \
>>> -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic
>>> panic_on_warn=1 panic=86400 earlycon"
>>
>> It still works for me but I had to disable CONFIG_DEBUG_INFO_BTF (I
>> don't think that changes anything at runtime). But your above command
>> line does not work for me as it appears you do not load any firmware, if
>> I add -bios images/fw_jump.elf, it works. But then I don't know where
>> your opensbi output below comes from...
>>
>> And regarding your issue with calling clock_gettime 'directly' compared
>> to using the syscall, I have the same consistent output from both calls.
>>
>> I have an older gcc (9.3.0) and the same qemu. I think what is missing
>> here is your buildroot config, so that we have the exact same
>> environment: could you post your buildroot config as well ?
> 
> I don't think the image is relevant because I don't even get to kernel
> code. If the kernel will complain about no init later, that's fine.
> Re bios, this version of qemu already has OpenSBI bios builtin, you
> can pass -bios default, but that's, well, the default :)
> Here are more reproducible repro instructions that capture gcc and
> qemu. I think gcc version may be potentially relevant as I suspect
> code size.
> 
> 
> curl https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt
>> $KERNEL_SRC/.config
> docker pull gcr.io/syzkaller/syzbot
> docker run -it -v $KERNEL_SRC:/kernel gcr.io/syzkaller/syzbot
> cd /kernel
> make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- olddefconfig
> make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu-
> qemu-system-riscv64 -machine virt -smp 2 -m 4G -kernel
> arch/riscv/boot/Image -nographic -append "earlycon earlyprintk=serial
> console=ttyS0"
> [this does not, only OpenSBI output]
> 

Indeed the issue was code size; please find the fix below. I will send a
proper patch once I have made sure the fix is the right one, but I'm pretty
confident: there's no reason to limit the mapping size to 128MB when
we have a whole pgdir.

diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 9b0592b11a9f..ff2495707edb 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -287,7 +287,7 @@ pgd_t swapper_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
  pgd_t trampoline_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
  pte_t fixmap_pte[PTRS_PER_PTE] __page_aligned_bss;

-#define MAX_EARLY_MAPPING_SIZE SZ_128M
+#define MAX_EARLY_MAPPING_SIZE PGDIR_SIZE

  pgd_t early_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);

-- 
2.20.1

Thanks,

Alex
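
For readers following along, the size limit changed by the patch above can be sketched with a little shell arithmetic. This is only an illustration, not part of the patch. It assumes Sv39 paging, where PGDIR_SHIFT is 30 and PGDIR_SIZE is therefore 1 GiB, and it assumes the early boot code rejects any kernel whose loaded size exceeds MAX_EARLY_MAPPING_SIZE. The helper name fits_early_mapping and the 200 MiB image size are hypothetical.

```shell
#!/bin/sh
# Sketch only: compare a kernel size against the old and new early mapping limits.
# Assumption: Sv39 paging, so one PGD entry covers 1 << 30 bytes (1 GiB).
SZ_128M=$((128 * 1024 * 1024))
PGDIR_SIZE=$((1 << 30))

# fits_early_mapping SIZE LIMIT
# Hypothetical helper: prints "fits" when SIZE <= LIMIT, otherwise "too big".
fits_early_mapping() {
    if [ "$1" -le "$2" ]; then
        echo "fits"
    else
        echo "too big"
    fi
}

# Hypothetical loaded size of a heavily instrumented kernel (200 MiB).
image_size=$((200 * 1024 * 1024))

echo "limit SZ_128M:    $(fits_early_mapping "$image_size" "$SZ_128M")"
echo "limit PGDIR_SIZE: $(fits_early_mapping "$image_size" "$PGDIR_SIZE")"
```

Under these assumptions a kernel between 128 MiB and 1 GiB only passes the check with the larger limit, which would be consistent with the KASAN_INLINE build hanging right after OpenSBI while the smaller KASAN_OUTLINE build boots.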

> scripts/config -d KASAN_INLINE -e KASAN_OUTLINE -d
> CC_OPTIMIZE_FOR_PERFORMANCE -e CC_OPTIMIZE_FOR_SIZE
> make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu-
> qemu-system-riscv64 -machine virt -smp 2 -m 4G -kernel
> arch/riscv/boot/Image -nographic -append "earlycon earlyprintk=serial
> console=ttyS0"
> [this boots fine, at least up to starting the init process]
> 
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
> 

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: riscv+KASAN does not boot
  2021-02-19 17:01                                     ` Alex Ghiti
@ 2021-02-19 18:53                                       ` Dmitry Vyukov
  2021-02-19 22:26                                         ` Palmer Dabbelt
  0 siblings, 1 reply; 27+ messages in thread
From: Dmitry Vyukov @ 2021-02-19 18:53 UTC (permalink / raw)
  To: Alex Ghiti
  Cc: Albert Ou, Bjorn Topel, Palmer Dabbelt, LKML, nylon7, syzkaller,
	Andreas Schwab, Paul Walmsley, Tobias Klauser, linux-riscv

On Fri, Feb 19, 2021 at 6:01 PM Alex Ghiti <alex@ghiti.fr> wrote:
>
> Hi Dmitry,
>
> Le 2/18/21 à 6:36 AM, Dmitry Vyukov a écrit :
> > On Thu, Feb 18, 2021 at 8:54 AM Alex Ghiti <alex@ghiti.fr> wrote:
> >>
> >> Hi Dmitry,
> >>
> >>> On Wed, Feb 17, 2021 at 5:36 PM Alex Ghiti <alex@ghiti.fr> wrote:
> >>>>
> >>>> Le 2/16/21 à 11:42 PM, Dmitry Vyukov a écrit :
> >>>>> On Tue, Feb 16, 2021 at 9:42 PM Alex Ghiti <alex@ghiti.fr> wrote:
> >>>>>>
> >>>>>> Hi Dmitry,
> >>>>>>
> >>>>>> Le 2/16/21 à 6:25 AM, Dmitry Vyukov a écrit :
> >>>>>>> On Tue, Feb 16, 2021 at 12:17 PM Dmitry Vyukov <dvyukov@google.com> wrote:
> >>>>>>>>
> >>>>>>>> On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> >>>>>>>>>> I was fixing KASAN support for my sv48 patchset so I took a look at your
> >>>>>>>>>> issue: I built a kernel on top of the branch riscv/fixes using
> >>>>>>>>>> https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config
> >>>>>>>>>> and Buildroot 2020.11. I have the warnings regarding the use of
> >>>>>>>>>> __virt_to_phys on wrong addresses (but that's normal since this function
> >>>>>>>>>> is used in virt_addr_valid) but not the segfaults you describe.
> >>>>>>>>>
> >>>>>>>>> Hi Alex,
> >>>>>>>>>
> >>>>>>>>> Let me try to rebuild buildroot image. Maybe there was something wrong
> >>>>>>>>> with my build, though, I did 'make clean' before doing. But at the
> >>>>>>>>> same time it worked back in June...
> >>>>>>>>>
> >>>>>>>>> Re WARNINGs, they indicate kernel bugs. I am working on setting up a
> >>>>>>>>> syzbot instance on riscv. If there a WARNING during boot then the
> >>>>>>>>> kernel will be marked as broken. No further testing will happen.
> >>>>>>>>> Is it a mis-use of WARN_ON? If so, could anybody please remove it or
> >>>>>>>>> replace it with pr_err.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> I've localized one issue with riscv/KASAN:
> >>>>>>>> KASAN breaks VDSO and that's I think the root cause of weird faults I
> >>>>>>>> saw earlier. The following patch fixes it.
> >>>>>>>> Could somebody please upstream this fix? I don't know how to add/run
> >>>>>>>> tests for this.
> >>>>>>>> Thanks
> >>>>>>>>
> >>>>>>>> diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
> >>>>>>>> index 0cfd6da784f84..cf3a383c1799d 100644
> >>>>>>>> --- a/arch/riscv/kernel/vdso/Makefile
> >>>>>>>> +++ b/arch/riscv/kernel/vdso/Makefile
> >>>>>>>> @@ -35,6 +35,7 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
> >>>>>>>>      # Disable gcov profiling for VDSO code
> >>>>>>>>      GCOV_PROFILE := n
> >>>>>>>>      KCOV_INSTRUMENT := n
> >>>>>>>> +KASAN_SANITIZE := n
> >>>>>>>>
> >>>>>>>>      # Force dependency
> >>>>>>>>      $(obj)/vdso.o: $(obj)/vdso.so
> >>>>>>
> >>>>>> What's weird is that I don't have any issue without this patch with the
> >>>>>> following config whereas it indeed seems required for KASAN. But when
> >>>>>> looking at the segfaults you got earlier, the segfault address is 0xbb0
> >>>>>> and the cause is an instruction page fault: this address is the PLT base
> >>>>>> address in vdso.so and an instruction page fault would mean that someone
> >>>>>> tried to jump at this address, which is weird. At first sight, that does
> >>>>>> not seem related to your patch above, but clearly I may be wrong.
> >>>>>>
> >>>>>> Tobias, did you observe the same segfaults as Dmitry ?
> >>>>>
> >>>>>
> >>>>> I noticed that not all buildroot images use VDSO, it seems to be
> >>>>> dependent on libc settings (at least I think I changed it in the
> >>>>> past).
> >>>>
> >>>> Ok, I used uClibc but then when using glibc, I have the same segfaults,
> >>>> only when KASAN is enabled. And your patch fixes the problem. I will try
> >>>> to take a look later to better understand the problem.
> >>>>
> >>>>> I also booted an image completely successfully including dhcpd/sshd
> >>>>> start, but then my executable crashed in clock_gettime. The executable
> >>>>> was build on linux/amd64 host with "riscv64-linux-gnu-gcc -static"
> >>>>> (10.2.1).
> >>>>>
> >>>>>
> >>>>>>> Second issue I am seeing seems to be related to text segment size.
> >>>>>>> I check out v5.11 and use this config:
> >>>>>>> https://gist.github.com/dvyukov/6af25474d455437577a84213b0cc9178
> >>>>>>
> >>>>>> This config gave my laptop a hard time ! Finally I was able to boot
> >>>>>> correctly to userspace, but I realized I used my sv48 branch...Either I
> >>>>>> fixed your issue along the way or I can't reproduce it, I'll give it a
> >>>>>> try tomorrow.
> >>>>>
> >>>>> Where is your branch? I could also test in my setup on your branch.
> >>>>>
> >>>>
> >>>> You can find my branch int/alex/riscv_kernel_end_of_address_space_v2
> >>>> here: https://github.com/AlexGhiti/riscv-linux.git
> >>>
> >>> No, it does not work for me.
> >>>
> >>> Source is on b61ab6c98de021398cd7734ea5fc3655e51e70f2 (HEAD,
> >>> int/alex/riscv_kernel_end_of_address_space_v2)
> >>> Config is https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt
> >>>
> >>> riscv64-linux-gnu-gcc -v
> >>> gcc version 10.2.1 20210110 (Debian 10.2.1-6+build1)
> >>>
> >>> qemu-system-riscv64 --version
> >>> QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3)
> >>>
> >>> qemu-system-riscv64 \
> >>> -machine virt -smp 2 -m 2G \
> >>> -device virtio-blk-device,drive=hd0 \
> >>> -drive file=image-riscv64,if=none,format=raw,id=hd0 \
> >>> -kernel arch/riscv/boot/Image \
> >>> -nographic \
> >>> -device virtio-rng-device,rng=rng0 -object
> >>> rng-random,filename=/dev/urandom,id=rng0 \
> >>> -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device
> >>> virtio-net-device,netdev=net0 \
> >>> -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic
> >>> panic_on_warn=1 panic=86400 earlycon"
> >>
> >> It still works for me but I had to disable CONFIG_DEBUG_INFO_BTF (I
> >> don't think that changes anything at runtime). But your above command
> >> line does not work for me as it appears you do not load any firmware, if
> >> I add -bios images/fw_jump.elf, it works. But then I don't know where
> >> your opensbi output below comes from...
> >>
> >> And regarding your issue with calling clock_gettime 'directly' compared
> >> to using the syscall, I have the same consistent output from both calls.
> >>
> >> I have an older gcc (9.3.0) and the same qemu. I think what is missing
> >> here is your buildroot config, so that we have the exact same
> >> environment: could you post your buildroot config as well ?
> >
> > I don't think the image is relevant because I don't even get to kernel
> > code. If the kernel will complain about no init later, that's fine.
> > Re bios, this version of qemu already has OpenSBI bios builtin, you
> > can pass -bios default, but that's, well, the default :)
> > Here are more reproducible repro instructions that capture gcc and
> > qemu. I think gcc version may be potentially relevant as I suspect
> > code size.
> >
> >
> > curl https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt
> >> $KERNEL_SRC/.config
> > docker pull gcr.io/syzkaller/syzbot
> > docker run -it -v $KERNEL_SRC:/kernel gcr.io/syzkaller/syzbot
> > cd /kernel
> > make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- olddefconfig
> > make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu-
> > qemu-system-riscv64 -machine virt -smp 2 -m 4G -kernel
> > arch/riscv/boot/Image -nographic -append "earlycon earlyprintk=serial
> > console=ttyS0"
> > [this does not, only OpenSBI output]
> >
>
> Indeed the issue was code size, please find the fix below. I will send a
> proper patch once I made sure the fix is the right one, but I'm pretty
> confident, there's no reason to limit the mapping size to 128MB whereas
> we have a whole pgdir.

Great that you got to the bottom of this!
Riscv kernels are going to be YUGE!

> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index 9b0592b11a9f..ff2495707edb 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -287,7 +287,7 @@ pgd_t swapper_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
>   pgd_t trampoline_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
>   pte_t fixmap_pte[PTRS_PER_PTE] __page_aligned_bss;
>
> -#define MAX_EARLY_MAPPING_SIZE SZ_128M
> +#define MAX_EARLY_MAPPING_SIZE PGDIR_SIZE
>
>   pgd_t early_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);
>
> --
> 2.20.1
>
> Thanks,
>
> Alex
>
> > scripts/config -d KASAN_INLINE -e KASAN_OUTLINE -d
> > CC_OPTIMIZE_FOR_PERFORMANCE -e CC_OPTIMIZE_FOR_SIZE
> > make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu-
> > qemu-system-riscv64 -machine virt -smp 2 -m 4G -kernel
> > arch/riscv/boot/Image -nographic -append "earlycon earlyprintk=serial
> > console=ttyS0"
> > [this boots fine, at least at to starting init process]
> >

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: riscv+KASAN does not boot
  2021-02-19 18:53                                       ` Dmitry Vyukov
@ 2021-02-19 22:26                                         ` Palmer Dabbelt
  2021-03-09 17:11                                           ` Dmitry Vyukov
  0 siblings, 1 reply; 27+ messages in thread
From: Palmer Dabbelt @ 2021-02-19 22:26 UTC (permalink / raw)
  To: dvyukov
  Cc: alex, aou, Bjorn Topel, linux-kernel, nylon7, syzkaller, schwab,
	Paul Walmsley, tklauser, linux-riscv

On Fri, 19 Feb 2021 10:53:43 PST (-0800), dvyukov@google.com wrote:
> On Fri, Feb 19, 2021 at 6:01 PM Alex Ghiti <alex@ghiti.fr> wrote:
>>
>> Hi Dmitry,
>>
>> Le 2/18/21 à 6:36 AM, Dmitry Vyukov a écrit :
>> > On Thu, Feb 18, 2021 at 8:54 AM Alex Ghiti <alex@ghiti.fr> wrote:
>> >>
>> >> Hi Dmitry,
>> >>
>> >>> On Wed, Feb 17, 2021 at 5:36 PM Alex Ghiti <alex@ghiti.fr> wrote:
>> >>>>
>> >>>> Le 2/16/21 à 11:42 PM, Dmitry Vyukov a écrit :
>> >>>>> On Tue, Feb 16, 2021 at 9:42 PM Alex Ghiti <alex@ghiti.fr> wrote:
>> >>>>>>
>> >>>>>> Hi Dmitry,
>> >>>>>>
>> >>>>>> Le 2/16/21 à 6:25 AM, Dmitry Vyukov a écrit :
>> >>>>>>> On Tue, Feb 16, 2021 at 12:17 PM Dmitry Vyukov <dvyukov@google.com> wrote:
>> >>>>>>>>
>> >>>>>>>> On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>> >>>>>>>>>> I was fixing KASAN support for my sv48 patchset so I took a look at your
>> >>>>>>>>>> issue: I built a kernel on top of the branch riscv/fixes using
>> >>>>>>>>>> https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config
>> >>>>>>>>>> and Buildroot 2020.11. I have the warnings regarding the use of
>> >>>>>>>>>> __virt_to_phys on wrong addresses (but that's normal since this function
>> >>>>>>>>>> is used in virt_addr_valid) but not the segfaults you describe.
>> >>>>>>>>>
>> >>>>>>>>> Hi Alex,
>> >>>>>>>>>
>> >>>>>>>>> Let me try to rebuild buildroot image. Maybe there was something wrong
>> >>>>>>>>> with my build, though, I did 'make clean' before doing. But at the
>> >>>>>>>>> same time it worked back in June...
>> >>>>>>>>>
>> >>>>>>>>> Re WARNINGs, they indicate kernel bugs. I am working on setting up a
>> >>>>>>>>> syzbot instance on riscv. If there a WARNING during boot then the
>> >>>>>>>>> kernel will be marked as broken. No further testing will happen.
>> >>>>>>>>> Is it a mis-use of WARN_ON? If so, could anybody please remove it or
>> >>>>>>>>> replace it with pr_err.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> Hi,
>> >>>>>>>>
>> >>>>>>>> I've localized one issue with riscv/KASAN:
>> >>>>>>>> KASAN breaks VDSO and that's I think the root cause of weird faults I
>> >>>>>>>> saw earlier. The following patch fixes it.
>> >>>>>>>> Could somebody please upstream this fix? I don't know how to add/run
>> >>>>>>>> tests for this.
>> >>>>>>>> Thanks
>> >>>>>>>>
>> >>>>>>>> diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
>> >>>>>>>> index 0cfd6da784f84..cf3a383c1799d 100644
>> >>>>>>>> --- a/arch/riscv/kernel/vdso/Makefile
>> >>>>>>>> +++ b/arch/riscv/kernel/vdso/Makefile
>> >>>>>>>> @@ -35,6 +35,7 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
>> >>>>>>>>      # Disable gcov profiling for VDSO code
>> >>>>>>>>      GCOV_PROFILE := n
>> >>>>>>>>      KCOV_INSTRUMENT := n
>> >>>>>>>> +KASAN_SANITIZE := n
>> >>>>>>>>
>> >>>>>>>>      # Force dependency
>> >>>>>>>>      $(obj)/vdso.o: $(obj)/vdso.so
>> >>>>>>
>> >>>>>> What's weird is that I don't have any issue without this patch with the
>> >>>>>> following config whereas it indeed seems required for KASAN. But when
>> >>>>>> looking at the segfaults you got earlier, the segfault address is 0xbb0
>> >>>>>> and the cause is an instruction page fault: this address is the PLT base
>> >>>>>> address in vdso.so and an instruction page fault would mean that someone
>> >>>>>> tried to jump at this address, which is weird. At first sight, that does
>> >>>>>> not seem related to your patch above, but clearly I may be wrong.
>> >>>>>>
>> >>>>>> Tobias, did you observe the same segfaults as Dmitry ?
>> >>>>>
>> >>>>>
>> >>>>> I noticed that not all buildroot images use VDSO, it seems to be
>> >>>>> dependent on libc settings (at least I think I changed it in the
>> >>>>> past).
>> >>>>
>> >>>> Ok, I used uClibc but then when using glibc, I have the same segfaults,
>> >>>> only when KASAN is enabled. And your patch fixes the problem. I will try
>> >>>> to take a look later to better understand the problem.
>> >>>>
>> >>>>> I also booted an image completely successfully including dhcpd/sshd
>> >>>>> start, but then my executable crashed in clock_gettime. The executable
>> >>>>> was build on linux/amd64 host with "riscv64-linux-gnu-gcc -static"
>> >>>>> (10.2.1).
>> >>>>>
>> >>>>>
>> >>>>>>> Second issue I am seeing seems to be related to text segment size.
>> >>>>>>> I check out v5.11 and use this config:
>> >>>>>>> https://gist.github.com/dvyukov/6af25474d455437577a84213b0cc9178
>> >>>>>>
>> >>>>>> This config gave my laptop a hard time ! Finally I was able to boot
>> >>>>>> correctly to userspace, but I realized I used my sv48 branch...Either I
>> >>>>>> fixed your issue along the way or I can't reproduce it, I'll give it a
>> >>>>>> try tomorrow.
>> >>>>>
>> >>>>> Where is your branch? I could also test in my setup on your branch.
>> >>>>>
>> >>>>
>> >>>> You can find my branch int/alex/riscv_kernel_end_of_address_space_v2
>> >>>> here: https://github.com/AlexGhiti/riscv-linux.git
>> >>>
>> >>> No, it does not work for me.
>> >>>
>> >>> Source is on b61ab6c98de021398cd7734ea5fc3655e51e70f2 (HEAD,
>> >>> int/alex/riscv_kernel_end_of_address_space_v2)
>> >>> Config is https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt
>> >>>
>> >>> riscv64-linux-gnu-gcc -v
>> >>> gcc version 10.2.1 20210110 (Debian 10.2.1-6+build1)
>> >>>
>> >>> qemu-system-riscv64 --version
>> >>> QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3)
>> >>>
>> >>> qemu-system-riscv64 \
>> >>> -machine virt -smp 2 -m 2G \
>> >>> -device virtio-blk-device,drive=hd0 \
>> >>> -drive file=image-riscv64,if=none,format=raw,id=hd0 \
>> >>> -kernel arch/riscv/boot/Image \
>> >>> -nographic \
>> >>> -device virtio-rng-device,rng=rng0 -object
>> >>> rng-random,filename=/dev/urandom,id=rng0 \
>> >>> -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device
>> >>> virtio-net-device,netdev=net0 \
>> >>> -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic
>> >>> panic_on_warn=1 panic=86400 earlycon"
>> >>
>> >> It still works for me but I had to disable CONFIG_DEBUG_INFO_BTF (I
>> >> don't think that changes anything at runtime). But your above command
>> >> line does not work for me as it appears you do not load any firmware, if
>> >> I add -bios images/fw_jump.elf, it works. But then I don't know where
>> >> your opensbi output below comes from...
>> >>
>> >> And regarding your issue with calling clock_gettime 'directly' compared
>> >> to using the syscall, I have the same consistent output from both calls.
>> >>
>> >> I have an older gcc (9.3.0) and the same qemu. I think what is missing
>> >> here is your buildroot config, so that we have the exact same
>> >> environment: could you post your buildroot config as well ?
>> >
>> > I don't think the image is relevant because I don't even get to kernel
>> > code. If the kernel will complain about no init later, that's fine.
>> > Re bios, this version of qemu already has OpenSBI bios builtin, you
>> > can pass -bios default, but that's, well, the default :)
>> > Here are more reproducible repro instructions that capture gcc and
>> > qemu. I think gcc version may be potentially relevant as I suspect
>> > code size.
>> >
>> >
>> > curl https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt
>> >> $KERNEL_SRC/.config
>> > docker pull gcr.io/syzkaller/syzbot
>> > docker run -it -v $KERNEL_SRC:/kernel gcr.io/syzkaller/syzbot
>> > cd /kernel
>> > make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- olddefconfig
>> > make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu-
>> > qemu-system-riscv64 -machine virt -smp 2 -m 4G -kernel
>> > arch/riscv/boot/Image -nographic -append "earlycon earlyprintk=serial
>> > console=ttyS0"
>> > [this does not, only OpenSBI output]
>> >
>>
>> Indeed the issue was code size, please find the fix below. I will send a
>> proper patch once I made sure the fix is the right one, but I'm pretty
>> confident, there's no reason to limit the mapping size to 128MB whereas
>> we have a whole pgdir.
>
> Great you get to the bottom of this!
> Riscv kernels are going to be YUGE!

IIRC I tried that a while ago and it didn't work.  It's possible I was just
running into some other bug, but I'm just build testing allyesconfig as opposed
to boot testing it.

If you've got a setup that does boot I'm happy to take a patch, though.  It'll
at least be one step forward.

>
>> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
>> index 9b0592b11a9f..ff2495707edb 100644
>> --- a/arch/riscv/mm/init.c
>> +++ b/arch/riscv/mm/init.c
>> @@ -287,7 +287,7 @@ pgd_t swapper_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
>>   pgd_t trampoline_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
>>   pte_t fixmap_pte[PTRS_PER_PTE] __page_aligned_bss;
>>
>> -#define MAX_EARLY_MAPPING_SIZE SZ_128M
>> +#define MAX_EARLY_MAPPING_SIZE PGDIR_SIZE
>>
>>   pgd_t early_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);
>>
>> --
>> 2.20.1
>>
>> Thanks,
>>
>> Alex
>>
>> > scripts/config -d KASAN_INLINE -e KASAN_OUTLINE -d
>> > CC_OPTIMIZE_FOR_PERFORMANCE -e CC_OPTIMIZE_FOR_SIZE
>> > make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu-
>> > qemu-system-riscv64 -machine virt -smp 2 -m 4G -kernel
>> > arch/riscv/boot/Image -nographic -append "earlycon earlyprintk=serial
>> > console=ttyS0"
>> > [this boots fine, at least at to starting init process]
>> >

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: riscv+KASAN does not boot
  2021-02-19 22:26                                         ` Palmer Dabbelt
@ 2021-03-09 17:11                                           ` Dmitry Vyukov
  2021-03-09 19:49                                             ` Alex Ghiti
  0 siblings, 1 reply; 27+ messages in thread
From: Dmitry Vyukov @ 2021-03-09 17:11 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: Alex Ghiti, Albert Ou, Bjorn Topel, LKML, nylon7, syzkaller,
	Andreas Schwab, Paul Walmsley, Tobias Klauser, linux-riscv

On Fri, Feb 19, 2021 at 11:26 PM 'Palmer Dabbelt' via syzkaller
<syzkaller@googlegroups.com> wrote:
>
> On Fri, 19 Feb 2021 10:53:43 PST (-0800), dvyukov@google.com wrote:
> > On Fri, Feb 19, 2021 at 6:01 PM Alex Ghiti <alex@ghiti.fr> wrote:
> >>
> >> Hi Dmitry,
> >>
> >> Le 2/18/21 à 6:36 AM, Dmitry Vyukov a écrit :
> >> > On Thu, Feb 18, 2021 at 8:54 AM Alex Ghiti <alex@ghiti.fr> wrote:
> >> >>
> >> >> Hi Dmitry,
> >> >>
> >> >>> On Wed, Feb 17, 2021 at 5:36 PM Alex Ghiti <alex@ghiti.fr> wrote:
> >> >>>>
> >> >>>> Le 2/16/21 à 11:42 PM, Dmitry Vyukov a écrit :
> >> >>>>> On Tue, Feb 16, 2021 at 9:42 PM Alex Ghiti <alex@ghiti.fr> wrote:
> >> >>>>>>
> >> >>>>>> Hi Dmitry,
> >> >>>>>>
> >> >>>>>> Le 2/16/21 à 6:25 AM, Dmitry Vyukov a écrit :
> >> >>>>>>> On Tue, Feb 16, 2021 at 12:17 PM Dmitry Vyukov <dvyukov@google.com> wrote:
> >> >>>>>>>>
> >> >>>>>>>> On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> >> >>>>>>>>>> I was fixing KASAN support for my sv48 patchset so I took a look at your
> >> >>>>>>>>>> issue: I built a kernel on top of the branch riscv/fixes using
> >> >>>>>>>>>> https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config
> >> >>>>>>>>>> and Buildroot 2020.11. I have the warnings regarding the use of
> >> >>>>>>>>>> __virt_to_phys on wrong addresses (but that's normal since this function
> >> >>>>>>>>>> is used in virt_addr_valid) but not the segfaults you describe.
> >> >>>>>>>>>
> >> >>>>>>>>> Hi Alex,
> >> >>>>>>>>>
> >> >>>>>>>>> Let me try to rebuild buildroot image. Maybe there was something wrong
> >> >>>>>>>>> with my build, though, I did 'make clean' before doing. But at the
> >> >>>>>>>>> same time it worked back in June...
> >> >>>>>>>>>
> >> >>>>>>>>> Re WARNINGs, they indicate kernel bugs. I am working on setting up a
> >> >>>>>>>>> syzbot instance on riscv. If there a WARNING during boot then the
> >> >>>>>>>>> kernel will be marked as broken. No further testing will happen.
> >> >>>>>>>>> Is it a mis-use of WARN_ON? If so, could anybody please remove it or
> >> >>>>>>>>> replace it with pr_err.
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>> Hi,
> >> >>>>>>>>
> >> >>>>>>>> I've localized one issue with riscv/KASAN:
> >> >>>>>>>> KASAN breaks VDSO and that's I think the root cause of weird faults I
> >> >>>>>>>> saw earlier. The following patch fixes it.
> >> >>>>>>>> Could somebody please upstream this fix? I don't know how to add/run
> >> >>>>>>>> tests for this.
> >> >>>>>>>> Thanks
> >> >>>>>>>>
> >> >>>>>>>> diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
> >> >>>>>>>> index 0cfd6da784f84..cf3a383c1799d 100644
> >> >>>>>>>> --- a/arch/riscv/kernel/vdso/Makefile
> >> >>>>>>>> +++ b/arch/riscv/kernel/vdso/Makefile
> >> >>>>>>>> @@ -35,6 +35,7 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
> >> >>>>>>>>      # Disable gcov profiling for VDSO code
> >> >>>>>>>>      GCOV_PROFILE := n
> >> >>>>>>>>      KCOV_INSTRUMENT := n
> >> >>>>>>>> +KASAN_SANITIZE := n
> >> >>>>>>>>
> >> >>>>>>>>      # Force dependency
> >> >>>>>>>>      $(obj)/vdso.o: $(obj)/vdso.so
> >> >>>>>>
> >> >>>>>> What's weird is that I don't have any issue without this patch with the
> >> >>>>>> following config whereas it indeed seems required for KASAN. But when
> >> >>>>>> looking at the segfaults you got earlier, the segfault address is 0xbb0
> >> >>>>>> and the cause is an instruction page fault: this address is the PLT base
> >> >>>>>> address in vdso.so and an instruction page fault would mean that someone
> >> >>>>>> tried to jump at this address, which is weird. At first sight, that does
> >> >>>>>> not seem related to your patch above, but clearly I may be wrong.
> >> >>>>>>
> >> >>>>>> Tobias, did you observe the same segfaults as Dmitry ?
> >> >>>>>
> >> >>>>>
> >> >>>>> I noticed that not all buildroot images use VDSO, it seems to be
> >> >>>>> dependent on libc settings (at least I think I changed it in the
> >> >>>>> past).
> >> >>>>
> >> >>>> Ok, I used uClibc but then when using glibc, I have the same segfaults,
> >> >>>> only when KASAN is enabled. And your patch fixes the problem. I will try
> >> >>>> to take a look later to better understand the problem.
> >> >>>>
> >> >>>>> I also booted an image completely successfully including dhcpd/sshd
> >> >>>>> start, but then my executable crashed in clock_gettime. The executable
> >> >>>>> was build on linux/amd64 host with "riscv64-linux-gnu-gcc -static"
> >> >>>>> (10.2.1).
> >> >>>>>
> >> >>>>>
> >> >>>>>>> Second issue I am seeing seems to be related to text segment size.
> >> >>>>>>> I check out v5.11 and use this config:
> >> >>>>>>> https://gist.github.com/dvyukov/6af25474d455437577a84213b0cc9178
> >> >>>>>>
> >> >>>>>> This config gave my laptop a hard time ! Finally I was able to boot
> >> >>>>>> correctly to userspace, but I realized I used my sv48 branch...Either I
> >> >>>>>> fixed your issue along the way or I can't reproduce it, I'll give it a
> >> >>>>>> try tomorrow.
> >> >>>>>
> >> >>>>> Where is your branch? I could also test in my setup on your branch.
> >> >>>>>
> >> >>>>
> >> >>>> You can find my branch int/alex/riscv_kernel_end_of_address_space_v2
> >> >>>> here: https://github.com/AlexGhiti/riscv-linux.git
> >> >>>
> >> >>> No, it does not work for me.
> >> >>>
> >> >>> Source is on b61ab6c98de021398cd7734ea5fc3655e51e70f2 (HEAD,
> >> >>> int/alex/riscv_kernel_end_of_address_space_v2)
> >> >>> Config is https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt
> >> >>>
> >> >>> riscv64-linux-gnu-gcc -v
> >> >>> gcc version 10.2.1 20210110 (Debian 10.2.1-6+build1)
> >> >>>
> >> >>> qemu-system-riscv64 --version
> >> >>> QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3)
> >> >>>
> >> >>> qemu-system-riscv64 \
> >> >>> -machine virt -smp 2 -m 2G \
> >> >>> -device virtio-blk-device,drive=hd0 \
> >> >>> -drive file=image-riscv64,if=none,format=raw,id=hd0 \
> >> >>> -kernel arch/riscv/boot/Image \
> >> >>> -nographic \
> >> >>> -device virtio-rng-device,rng=rng0 -object
> >> >>> rng-random,filename=/dev/urandom,id=rng0 \
> >> >>> -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device
> >> >>> virtio-net-device,netdev=net0 \
> >> >>> -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic
> >> >>> panic_on_warn=1 panic=86400 earlycon"
> >> >>
> >> >> It still works for me but I had to disable CONFIG_DEBUG_INFO_BTF (I
> >> >> don't think that changes anything at runtime). But your above command
> >> >> line does not work for me as it appears you do not load any firmware, if
> >> >> I add -bios images/fw_jump.elf, it works. But then I don't know where
> >> >> your opensbi output below comes from...
> >> >>
> >> >> And regarding your issue with calling clock_gettime 'directly' compared
> >> >> to using the syscall, I have the same consistent output from both calls.
> >> >>
> >> >> I have an older gcc (9.3.0) and the same qemu. I think what is missing
> >> >> here is your buildroot config, so that we have the exact same
> >> >> environment: could you post your buildroot config as well ?
> >> >
> >> > I don't think the image is relevant because I don't even get to kernel
> >> > code. If the kernel will complain about no init later, that's fine.
> >> > Re bios, this version of qemu already has OpenSBI bios builtin, you
> >> > can pass -bios default, but that's, well, the default :)
> >> > Here are more reproducible repro instructions that capture gcc and
> >> > qemu. I think gcc version may be potentially relevant as I suspect
> >> > code size.
> >> >
> >> >
> >> > curl https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt
> >> >> $KERNEL_SRC/.config
> >> > docker pull gcr.io/syzkaller/syzbot
> >> > docker run -it -v $KERNEL_SRC:/kernel gcr.io/syzkaller/syzbot
> >> > cd /kernel
> >> > make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- olddefconfig
> >> > make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu-
> >> > qemu-system-riscv64 -machine virt -smp 2 -m 4G -kernel
> >> > arch/riscv/boot/Image -nographic -append "earlycon earlyprintk=serial
> >> > console=ttyS0"
> >> > [this does not, only OpenSBI output]
> >> >
> >>
> >> Indeed the issue was code size, please find the fix below. I will send a
> >> proper patch once I made sure the fix is the right one, but I'm pretty
> >> confident, there's no reason to limit the mapping size to 128MB whereas
> >> we have a whole pgdir.
> >
> > Great you get to the bottom of this!
> > Riscv kernels are going to be YUGE!
>
> IIRC I tried that a while ago and it didn't work.  It's possible I was just
> running into some other bug, but I'm just build testing allyesconfig as opposed
> to boot testing it.
>
> If you've got a setup that does boot I'm happy to take a patch, though.  It'll
> at least be one step forward.



OK, it's getting better.
The next issue is called "512 bytes should be enough for everyone!" :)
https://elixir.bootlin.com/linux/v5.12-rc2/source/include/uapi/asm-generic/setup.h#L5
Most other arches redefine it to something bigger:
https://elixir.bootlin.com/linux/v5.12-rc2/source/arch/s390/include/uapi/asm/setup.h#L10
even arm32 redefines it.
I am not sure the default is even reasonable anymore. Failure mode is
also not nice (silent truncation).
We are trying to pass this:

earlyprintk=serial oops=panic nmi_watchdog=panic panic=86400
net.ifnames=0 sysctl.kernel.hung_task_all_cpu_backtrace=1
ima_policy=tcb kvm-intel.nested=1 nf-conntrack-ftp.ports=20000
nf-conntrack-tftp.ports=20000 nf-conntrack-sip.ports=20000
nf-conntrack-irc.ports=20000 nf-conntrack-sane.ports=20000
vivid.n_devs=16 vivid.multiplanar=1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2
netrom.nr_ndevs=16 rose.rose_ndevs=16 spec_store_bypass_disable=prctl
numa=fake=2 nopcid dummy_hcd.num=8 binder.debug_mask=0
rcupdate.rcu_expedited=1 watchdog_thresh=165
workqueue.watchdog_thresh=420 panic_on_warn=1

The last part gets truncated and we are getting false workqueue watchdog stalls.

Could you please increase it?
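
For illustration, here is a minimal Python sketch of where the cut falls for the command line above. It is not kernel code: it assumes the 512-byte default for COMMAND_LINE_SIZE, that the buffer includes the terminating NUL (so at most 511 bytes survive), and that parameters are joined by single spaces. The helper name split_at_limit is made up for this example.

```python
# Sketch: find which part of a kernel command line would be cut off.
# Assumption: the buffer is COMMAND_LINE_SIZE bytes including the
# terminating NUL, so only COMMAND_LINE_SIZE - 1 bytes are kept.
COMMAND_LINE_SIZE = 512  # asm-generic default discussed in this thread

# The command line from the message, joined with single spaces.
CMDLINE = " ".join([
    "earlyprintk=serial oops=panic nmi_watchdog=panic panic=86400",
    "net.ifnames=0 sysctl.kernel.hung_task_all_cpu_backtrace=1",
    "ima_policy=tcb kvm-intel.nested=1 nf-conntrack-ftp.ports=20000",
    "nf-conntrack-tftp.ports=20000 nf-conntrack-sip.ports=20000",
    "nf-conntrack-irc.ports=20000 nf-conntrack-sane.ports=20000",
    "vivid.n_devs=16 vivid.multiplanar=1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2",
    "netrom.nr_ndevs=16 rose.rose_ndevs=16 spec_store_bypass_disable=prctl",
    "numa=fake=2 nopcid dummy_hcd.num=8 binder.debug_mask=0",
    "rcupdate.rcu_expedited=1 watchdog_thresh=165",
    "workqueue.watchdog_thresh=420 panic_on_warn=1",
])


def split_at_limit(cmdline, size=COMMAND_LINE_SIZE):
    """Return (kept, dropped) for a buffer of `size` bytes including NUL."""
    usable = size - 1  # one byte is reserved for the terminator
    return cmdline[:usable], cmdline[usable:]


kept, dropped = split_at_limit(CMDLINE)
print(f"length={len(CMDLINE)} usable={COMMAND_LINE_SIZE - 1}")
if dropped:
    # Everything printed here would never be seen by the kernel.
    print("silently dropped tail:", repr(dropped))
```

Running a check like this before handing the string to qemu -append at least turns the silent truncation into a visible one.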


* Re: riscv+KASAN does not boot
  2021-03-09 17:11                                           ` Dmitry Vyukov
@ 2021-03-09 19:49                                             ` Alex Ghiti
  2021-03-10 17:25                                               ` Dmitry Vyukov
  0 siblings, 1 reply; 27+ messages in thread
From: Alex Ghiti @ 2021-03-09 19:49 UTC (permalink / raw)
  To: Dmitry Vyukov, Palmer Dabbelt
  Cc: Albert Ou, Bjorn Topel, LKML, nylon7, syzkaller, Andreas Schwab,
	Paul Walmsley, Tobias Klauser, linux-riscv

Le 3/9/21 à 12:11 PM, Dmitry Vyukov a écrit :
> On Fri, Feb 19, 2021 at 11:26 PM 'Palmer Dabbelt' via syzkaller
> <syzkaller@googlegroups.com> wrote:
>>
>> On Fri, 19 Feb 2021 10:53:43 PST (-0800), dvyukov@google.com wrote:
>>> On Fri, Feb 19, 2021 at 6:01 PM Alex Ghiti <alex@ghiti.fr> wrote:
>>>>
>>>> Hi Dmitry,
>>>>
>>>> Le 2/18/21 à 6:36 AM, Dmitry Vyukov a écrit :
>>>>> On Thu, Feb 18, 2021 at 8:54 AM Alex Ghiti <alex@ghiti.fr> wrote:
>>>>>>
>>>>>> Hi Dmitry,
>>>>>>
>>>>>>> On Wed, Feb 17, 2021 at 5:36 PM Alex Ghiti <alex@ghiti.fr> wrote:
>>>>>>>>
>>>>>>>> Le 2/16/21 à 11:42 PM, Dmitry Vyukov a écrit :
>>>>>>>>> On Tue, Feb 16, 2021 at 9:42 PM Alex Ghiti <alex@ghiti.fr> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Dmitry,
>>>>>>>>>>
>>>>>>>>>> Le 2/16/21 à 6:25 AM, Dmitry Vyukov a écrit :
>>>>>>>>>>> On Tue, Feb 16, 2021 at 12:17 PM Dmitry Vyukov <dvyukov@google.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>>>>>>>>>>>>>> I was fixing KASAN support for my sv48 patchset so I took a look at your
>>>>>>>>>>>>>> issue: I built a kernel on top of the branch riscv/fixes using
>>>>>>>>>>>>>> https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config
>>>>>>>>>>>>>> and Buildroot 2020.11. I have the warnings regarding the use of
>>>>>>>>>>>>>> __virt_to_phys on wrong addresses (but that's normal since this function
>>>>>>>>>>>>>> is used in virt_addr_valid) but not the segfaults you describe.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Alex,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Let me try to rebuild buildroot image. Maybe there was something wrong
>>>>>>>>>>>>> with my build, though, I did 'make clean' before doing. But at the
>>>>>>>>>>>>> same time it worked back in June...
>>>>>>>>>>>>>
>>>>>>>>>>>>> Re WARNINGs, they indicate kernel bugs. I am working on setting up a
>>>>>>>>>>>>> syzbot instance on riscv. If there a WARNING during boot then the
>>>>>>>>>>>>> kernel will be marked as broken. No further testing will happen.
>>>>>>>>>>>>> Is it a mis-use of WARN_ON? If so, could anybody please remove it or
>>>>>>>>>>>>> replace it with pr_err.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I've localized one issue with riscv/KASAN:
>>>>>>>>>>>> KASAN breaks VDSO and that's I think the root cause of weird faults I
>>>>>>>>>>>> saw earlier. The following patch fixes it.
>>>>>>>>>>>> Could somebody please upstream this fix? I don't know how to add/run
>>>>>>>>>>>> tests for this.
>>>>>>>>>>>> Thanks
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
>>>>>>>>>>>> index 0cfd6da784f84..cf3a383c1799d 100644
>>>>>>>>>>>> --- a/arch/riscv/kernel/vdso/Makefile
>>>>>>>>>>>> +++ b/arch/riscv/kernel/vdso/Makefile
>>>>>>>>>>>> @@ -35,6 +35,7 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
>>>>>>>>>>>>       # Disable gcov profiling for VDSO code
>>>>>>>>>>>>       GCOV_PROFILE := n
>>>>>>>>>>>>       KCOV_INSTRUMENT := n
>>>>>>>>>>>> +KASAN_SANITIZE := n
>>>>>>>>>>>>
>>>>>>>>>>>>       # Force dependency
>>>>>>>>>>>>       $(obj)/vdso.o: $(obj)/vdso.so
>>>>>>>>>>
>>>>>>>>>> What's weird is that I don't have any issue without this patch with the
>>>>>>>>>> following config whereas it indeed seems required for KASAN. But when
>>>>>>>>>> looking at the segfaults you got earlier, the segfault address is 0xbb0
>>>>>>>>>> and the cause is an instruction page fault: this address is the PLT base
>>>>>>>>>> address in vdso.so and an instruction page fault would mean that someone
>>>>>>>>>> tried to jump at this address, which is weird. At first sight, that does
>>>>>>>>>> not seem related to your patch above, but clearly I may be wrong.
>>>>>>>>>>
>>>>>>>>>> Tobias, did you observe the same segfaults as Dmitry ?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I noticed that not all buildroot images use VDSO, it seems to be
>>>>>>>>> dependent on libc settings (at least I think I changed it in the
>>>>>>>>> past).
>>>>>>>>
>>>>>>>> Ok, I used uClibc but then when using glibc, I have the same segfaults,
>>>>>>>> only when KASAN is enabled. And your patch fixes the problem. I will try
>>>>>>>> to take a look later to better understand the problem.
>>>>>>>>
>>>>>>>>> I also booted an image completely successfully including dhcpd/sshd
>>>>>>>>> start, but then my executable crashed in clock_gettime. The executable
>>>>>>>>> was build on linux/amd64 host with "riscv64-linux-gnu-gcc -static"
>>>>>>>>> (10.2.1).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>> Second issue I am seeing seems to be related to text segment size.
>>>>>>>>>>> I check out v5.11 and use this config:
>>>>>>>>>>> https://gist.github.com/dvyukov/6af25474d455437577a84213b0cc9178
>>>>>>>>>>
>>>>>>>>>> This config gave my laptop a hard time ! Finally I was able to boot
>>>>>>>>>> correctly to userspace, but I realized I used my sv48 branch...Either I
>>>>>>>>>> fixed your issue along the way or I can't reproduce it, I'll give it a
>>>>>>>>>> try tomorrow.
>>>>>>>>>
>>>>>>>>> Where is your branch? I could also test in my setup on your branch.
>>>>>>>>>
>>>>>>>>
>>>>>>>> You can find my branch int/alex/riscv_kernel_end_of_address_space_v2
>>>>>>>> here: https://github.com/AlexGhiti/riscv-linux.git
>>>>>>>
>>>>>>> No, it does not work for me.
>>>>>>>
>>>>>>> Source is on b61ab6c98de021398cd7734ea5fc3655e51e70f2 (HEAD,
>>>>>>> int/alex/riscv_kernel_end_of_address_space_v2)
>>>>>>> Config is https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt
>>>>>>>
>>>>>>> riscv64-linux-gnu-gcc -v
>>>>>>> gcc version 10.2.1 20210110 (Debian 10.2.1-6+build1)
>>>>>>>
>>>>>>> qemu-system-riscv64 --version
>>>>>>> QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3)
>>>>>>>
>>>>>>> qemu-system-riscv64 \
>>>>>>> -machine virt -smp 2 -m 2G \
>>>>>>> -device virtio-blk-device,drive=hd0 \
>>>>>>> -drive file=image-riscv64,if=none,format=raw,id=hd0 \
>>>>>>> -kernel arch/riscv/boot/Image \
>>>>>>> -nographic \
>>>>>>> -device virtio-rng-device,rng=rng0 -object
>>>>>>> rng-random,filename=/dev/urandom,id=rng0 \
>>>>>>> -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device
>>>>>>> virtio-net-device,netdev=net0 \
>>>>>>> -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic
>>>>>>> panic_on_warn=1 panic=86400 earlycon"
>>>>>>
>>>>>> It still works for me but I had to disable CONFIG_DEBUG_INFO_BTF (I
>>>>>> don't think that changes anything at runtime). But your above command
>>>>>> line does not work for me as it appears you do not load any firmware, if
>>>>>> I add -bios images/fw_jump.elf, it works. But then I don't know where
>>>>>> your opensbi output below comes from...
>>>>>>
>>>>>> And regarding your issue with calling clock_gettime 'directly' compared
>>>>>> to using the syscall, I have the same consistent output from both calls.
>>>>>>
>>>>>> I have an older gcc (9.3.0) and the same qemu. I think what is missing
>>>>>> here is your buildroot config, so that we have the exact same
>>>>>> environment: could you post your buildroot config as well ?
>>>>>
>>>>> I don't think the image is relevant because I don't even get to kernel
>>>>> code. If the kernel will complain about no init later, that's fine.
>>>>> Re bios, this version of qemu already has OpenSBI bios builtin, you
>>>>> can pass -bios default, but that's, well, the default :)
>>>>> Here are more reproducible repro instructions that capture gcc and
>>>>> qemu. I think gcc version may be potentially relevant as I suspect
>>>>> code size.
>>>>>
>>>>>
>>>>> curl https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt
>>>>>> $KERNEL_SRC/.config
>>>>> docker pull gcr.io/syzkaller/syzbot
>>>>> docker run -it -v $KERNEL_SRC:/kernel gcr.io/syzkaller/syzbot
>>>>> cd /kernel
>>>>> make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- olddefconfig
>>>>> make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu-
>>>>> qemu-system-riscv64 -machine virt -smp 2 -m 4G -kernel
>>>>> arch/riscv/boot/Image -nographic -append "earlycon earlyprintk=serial
>>>>> console=ttyS0"
>>>>> [this does not, only OpenSBI output]
>>>>>
>>>>
>>>> Indeed the issue was code size, please find the fix below. I will send a
>>>> proper patch once I made sure the fix is the right one, but I'm pretty
>>>> confident, there's no reason to limit the mapping size to 128MB whereas
>>>> we have a whole pgdir.
>>>
>>> Great you get to the bottom of this!
>>> Riscv kernels are going to be YUGE!
>>
>> IIRC I tried that a while ago and it didn't work.  It's possible I was just
>> running into some other bug, but I'm just build testing allyesconfig as opposed
>> to boot testing it.
>>
>> If you've got a setup that does boot I'm happy to take a patch, though.  It'll
>> at least be one step forward.
> 
> 
> 
> OK, it's getting better.

Nice :)

> The next issue is called "512 bytes should be enough for everyone!" :)
> https://elixir.bootlin.com/linux/v5.12-rc2/source/include/uapi/asm-generic/setup.h#L5
> Most other arches redefine it to something bigger:
> https://elixir.bootlin.com/linux/v5.12-rc2/source/arch/s390/include/uapi/asm/setup.h#L10
> even arm32 redefines it.
> I am not sure the default is even reasonable anymore. 

Some archs override this value to 256, but git blame shows this is 
(very) old. I agree that 512 as default seems low.

> Failure mode is
> also not nice (silent truncation).

Agreed. Maybe we could keep the default value but check that the
terminating null character is present, and BUG if it is not. I'll take a look.
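
A rough model of that idea, written in Python rather than kernel C: copy into a fixed-size, zero-filled buffer and fail loudly if no terminating NUL is left afterwards. The function name copy_cmdline and the use of ValueError in place of a kernel BUG are assumptions made for the sketch, and it assumes the input itself contains no NUL bytes.

```python
# Sketch: bounded copy that refuses to truncate silently.
COMMAND_LINE_SIZE = 512  # assumed default, including the terminating NUL


def copy_cmdline(src: bytes, size: int = COMMAND_LINE_SIZE) -> bytes:
    """Copy src into a size-byte buffer; raise if it does not fit."""
    buf = bytearray(size)      # zero-filled, like a static buffer
    n = min(len(src), size)
    buf[:n] = src[:n]          # plain bounded copy, may fill the buffer
    if 0 not in buf:
        # No NUL left: the input needed more than size - 1 bytes.
        raise ValueError(
            f"command line needs {len(src) + 1} bytes, buffer has {size}"
        )
    return bytes(buf[:buf.index(0)])  # contents up to the terminator


print(copy_cmdline(b"earlycon console=ttyS0"))
```

The check costs one scan of the buffer and reports the overflow at boot instead of letting the last parameters vanish.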

> We are trying to pass this:
> 
> earlyprintk=serial oops=panic nmi_watchdog=panic panic=86400
> net.ifnames=0 sysctl.kernel.hung_task_all_cpu_backtrace=1
> ima_policy=tcb kvm-intel.nested=1 nf-conntrack-ftp.ports=20000
> nf-conntrack-tftp.ports=20000 nf-conntrack-sip.ports=20000
> nf-conntrack-irc.ports=20000 nf-conntrack-sane.ports=20000
> vivid.n_devs=16 vivid.multiplanar=1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2
> netrom.nr_ndevs=16 rose.rose_ndevs=16 spec_store_bypass_disable=prctl
> numa=fake=2 nopcid dummy_hcd.num=8 binder.debug_mask=0
> rcupdate.rcu_expedited=1 watchdog_thresh=165
> workqueue.watchdog_thresh=420 panic_on_warn=1
> 
> The last part gets truncated and we are getting false workqueue watchdog stalls.
> 
> Could you please increase it?

I will propose a patchset that increases the default value and cleans 
archs up accordingly too.

Thanks again,

Alex



* Re: riscv+KASAN does not boot
  2021-03-09 19:49                                             ` Alex Ghiti
@ 2021-03-10 17:25                                               ` Dmitry Vyukov
  0 siblings, 0 replies; 27+ messages in thread
From: Dmitry Vyukov @ 2021-03-10 17:25 UTC (permalink / raw)
  To: Alex Ghiti
  Cc: Palmer Dabbelt, Albert Ou, Bjorn Topel, LKML, nylon7, syzkaller,
	Andreas Schwab, Paul Walmsley, Tobias Klauser, linux-riscv

On Tue, Mar 9, 2021 at 8:49 PM Alex Ghiti <alex@ghiti.fr> wrote:
>
> Le 3/9/21 à 12:11 PM, Dmitry Vyukov a écrit :
> > On Fri, Feb 19, 2021 at 11:26 PM 'Palmer Dabbelt' via syzkaller
> > <syzkaller@googlegroups.com> wrote:
> >>
> >> On Fri, 19 Feb 2021 10:53:43 PST (-0800), dvyukov@google.com wrote:
> >>> On Fri, Feb 19, 2021 at 6:01 PM Alex Ghiti <alex@ghiti.fr> wrote:
> >>>>
> >>>> Hi Dmitry,
> >>>>
> >>>> Le 2/18/21 à 6:36 AM, Dmitry Vyukov a écrit :
> >>>>> On Thu, Feb 18, 2021 at 8:54 AM Alex Ghiti <alex@ghiti.fr> wrote:
> >>>>>>
> >>>>>> Hi Dmitry,
> >>>>>>
> >>>>>>> On Wed, Feb 17, 2021 at 5:36 PM Alex Ghiti <alex@ghiti.fr> wrote:
> >>>>>>>>
> >>>>>>>> Le 2/16/21 à 11:42 PM, Dmitry Vyukov a écrit :
> >>>>>>>>> On Tue, Feb 16, 2021 at 9:42 PM Alex Ghiti <alex@ghiti.fr> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hi Dmitry,
> >>>>>>>>>>
> >>>>>>>>>> Le 2/16/21 à 6:25 AM, Dmitry Vyukov a écrit :
> >>>>>>>>>>> On Tue, Feb 16, 2021 at 12:17 PM Dmitry Vyukov <dvyukov@google.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> >>>>>>>>>>>>>> I was fixing KASAN support for my sv48 patchset so I took a look at your
> >>>>>>>>>>>>>> issue: I built a kernel on top of the branch riscv/fixes using
> >>>>>>>>>>>>>> https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config
> >>>>>>>>>>>>>> and Buildroot 2020.11. I have the warnings regarding the use of
> >>>>>>>>>>>>>> __virt_to_phys on wrong addresses (but that's normal since this function
> >>>>>>>>>>>>>> is used in virt_addr_valid) but not the segfaults you describe.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Hi Alex,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Let me try to rebuild buildroot image. Maybe there was something wrong
> >>>>>>>>>>>>> with my build, though, I did 'make clean' before doing. But at the
> >>>>>>>>>>>>> same time it worked back in June...
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Re WARNINGs, they indicate kernel bugs. I am working on setting up a
> >>>>>>>>>>>>> syzbot instance on riscv. If there a WARNING during boot then the
> >>>>>>>>>>>>> kernel will be marked as broken. No further testing will happen.
> >>>>>>>>>>>>> Is it a mis-use of WARN_ON? If so, could anybody please remove it or
> >>>>>>>>>>>>> replace it with pr_err.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hi,
> >>>>>>>>>>>>
> >>>>>>>>>>>> I've localized one issue with riscv/KASAN:
> >>>>>>>>>>>> KASAN breaks VDSO and that's I think the root cause of weird faults I
> >>>>>>>>>>>> saw earlier. The following patch fixes it.
> >>>>>>>>>>>> Could somebody please upstream this fix? I don't know how to add/run
> >>>>>>>>>>>> tests for this.
> >>>>>>>>>>>> Thanks
> >>>>>>>>>>>>
> >>>>>>>>>>>> diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
> >>>>>>>>>>>> index 0cfd6da784f84..cf3a383c1799d 100644
> >>>>>>>>>>>> --- a/arch/riscv/kernel/vdso/Makefile
> >>>>>>>>>>>> +++ b/arch/riscv/kernel/vdso/Makefile
> >>>>>>>>>>>> @@ -35,6 +35,7 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
> >>>>>>>>>>>>       # Disable gcov profiling for VDSO code
> >>>>>>>>>>>>       GCOV_PROFILE := n
> >>>>>>>>>>>>       KCOV_INSTRUMENT := n
> >>>>>>>>>>>> +KASAN_SANITIZE := n
> >>>>>>>>>>>>
> >>>>>>>>>>>>       # Force dependency
> >>>>>>>>>>>>       $(obj)/vdso.o: $(obj)/vdso.so
> >>>>>>>>>>
> >>>>>>>>>> What's weird is that I don't have any issue without this patch with the
> >>>>>>>>>> following config whereas it indeed seems required for KASAN. But when
> >>>>>>>>>> looking at the segfaults you got earlier, the segfault address is 0xbb0
> >>>>>>>>>> and the cause is an instruction page fault: this address is the PLT base
> >>>>>>>>>> address in vdso.so and an instruction page fault would mean that someone
> >>>>>>>>>> tried to jump at this address, which is weird. At first sight, that does
> >>>>>>>>>> not seem related to your patch above, but clearly I may be wrong.
> >>>>>>>>>>
> >>>>>>>>>> Tobias, did you observe the same segfaults as Dmitry ?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I noticed that not all buildroot images use VDSO, it seems to be
> >>>>>>>>> dependent on libc settings (at least I think I changed it in the
> >>>>>>>>> past).
> >>>>>>>>
> >>>>>>>> Ok, I used uClibc but then when using glibc, I have the same segfaults,
> >>>>>>>> only when KASAN is enabled. And your patch fixes the problem. I will try
> >>>>>>>> to take a look later to better understand the problem.
> >>>>>>>>
> >>>>>>>>> I also booted an image completely successfully including dhcpd/sshd
> >>>>>>>>> start, but then my executable crashed in clock_gettime. The executable
> >>>>>>>>> was build on linux/amd64 host with "riscv64-linux-gnu-gcc -static"
> >>>>>>>>> (10.2.1).
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>> Second issue I am seeing seems to be related to text segment size.
> >>>>>>>>>>> I check out v5.11 and use this config:
> >>>>>>>>>>> https://gist.github.com/dvyukov/6af25474d455437577a84213b0cc9178
> >>>>>>>>>>
> >>>>>>>>>> This config gave my laptop a hard time ! Finally I was able to boot
> >>>>>>>>>> correctly to userspace, but I realized I used my sv48 branch...Either I
> >>>>>>>>>> fixed your issue along the way or I can't reproduce it, I'll give it a
> >>>>>>>>>> try tomorrow.
> >>>>>>>>>
> >>>>>>>>> Where is your branch? I could also test in my setup on your branch.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> You can find my branch int/alex/riscv_kernel_end_of_address_space_v2
> >>>>>>>> here: https://github.com/AlexGhiti/riscv-linux.git
> >>>>>>>
> >>>>>>> No, it does not work for me.
> >>>>>>>
> >>>>>>> Source is on b61ab6c98de021398cd7734ea5fc3655e51e70f2 (HEAD,
> >>>>>>> int/alex/riscv_kernel_end_of_address_space_v2)
> >>>>>>> Config is https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt
> >>>>>>>
> >>>>>>> riscv64-linux-gnu-gcc -v
> >>>>>>> gcc version 10.2.1 20210110 (Debian 10.2.1-6+build1)
> >>>>>>>
> >>>>>>> qemu-system-riscv64 --version
> >>>>>>> QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3)
> >>>>>>>
> >>>>>>> qemu-system-riscv64 \
> >>>>>>> -machine virt -smp 2 -m 2G \
> >>>>>>> -device virtio-blk-device,drive=hd0 \
> >>>>>>> -drive file=image-riscv64,if=none,format=raw,id=hd0 \
> >>>>>>> -kernel arch/riscv/boot/Image \
> >>>>>>> -nographic \
> >>>>>>> -device virtio-rng-device,rng=rng0 -object
> >>>>>>> rng-random,filename=/dev/urandom,id=rng0 \
> >>>>>>> -netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 -device
> >>>>>>> virtio-net-device,netdev=net0 \
> >>>>>>> -append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic
> >>>>>>> panic_on_warn=1 panic=86400 earlycon"
> >>>>>>
> >>>>>> It still works for me but I had to disable CONFIG_DEBUG_INFO_BTF (I
> >>>>>> don't think that changes anything at runtime). But your above command
> >>>>>> line does not work for me as it appears you do not load any firmware, if
> >>>>>> I add -bios images/fw_jump.elf, it works. But then I don't know where
> >>>>>> your opensbi output below comes from...
> >>>>>>
> >>>>>> And regarding your issue with calling clock_gettime 'directly' compared
> >>>>>> to using the syscall, I have the same consistent output from both calls.
> >>>>>>
> >>>>>> I have an older gcc (9.3.0) and the same qemu. I think what is missing
> >>>>>> here is your buildroot config, so that we have the exact same
> >>>>>> environment: could you post your buildroot config as well ?
> >>>>>
> >>>>> I don't think the image is relevant because I don't even get to kernel
> >>>>> code. If the kernel will complain about no init later, that's fine.
> >>>>> Re bios, this version of qemu already has OpenSBI bios builtin, you
> >>>>> can pass -bios default, but that's, well, the default :)
> >>>>> Here are more reproducible repro instructions that capture gcc and
> >>>>> qemu. I think gcc version may be potentially relevant as I suspect
> >>>>> code size.
> >>>>>
> >>>>>
> >>>>> curl https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt
> >>>>>> $KERNEL_SRC/.config
> >>>>> docker pull gcr.io/syzkaller/syzbot
> >>>>> docker run -it -v $KERNEL_SRC:/kernel gcr.io/syzkaller/syzbot
> >>>>> cd /kernel
> >>>>> make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- olddefconfig
> >>>>> make -j72 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu-
> >>>>> qemu-system-riscv64 -machine virt -smp 2 -m 4G -kernel
> >>>>> arch/riscv/boot/Image -nographic -append "earlycon earlyprintk=serial
> >>>>> console=ttyS0"
> >>>>> [this does not, only OpenSBI output]
> >>>>>
> >>>>
> >>>> Indeed the issue was code size, please find the fix below. I will send a
> >>>> proper patch once I made sure the fix is the right one, but I'm pretty
> >>>> confident, there's no reason to limit the mapping size to 128MB whereas
> >>>> we have a whole pgdir.
> >>>
> >>> Great you get to the bottom of this!
> >>> Riscv kernels are going to be YUGE!
> >>
> >> IIRC I tried that a while ago and it didn't work.  It's possible I was just
> >> running into some other bug, but I'm just build testing allyesconfig as opposed
> >> to boot testing it.
> >>
> >> If you've got a setup that does boot I'm happy to take a patch, though.  It'll
> >> at least be one step forward.
> >
> >
> >
> > OK, it's getting better.
>
> Nice :)
>
> > The next issue is called "512 bytes should be enough for everyone!" :)
> > https://elixir.bootlin.com/linux/v5.12-rc2/source/include/uapi/asm-generic/setup.h#L5
> > Most other arches redefine it to something bigger:
> > https://elixir.bootlin.com/linux/v5.12-rc2/source/arch/s390/include/uapi/asm/setup.h#L10
> > even arm32 redefines it.
> > I am not sure the default is even reasonable anymore.
>
> Some archs override this value to 256, but git blame shows this is
> (very) old. I agree that 512 as default seems low.
>
> > Failure mode is
> > also not nice (silent truncation).
>
> Agreed, maybe we could still have the default value and checks the
> terminating null character is somewhere and bugs if not, I'll take a look.
>
> > We are trying to pass this:
> >
> > earlyprintk=serial oops=panic nmi_watchdog=panic panic=86400
> > net.ifnames=0 sysctl.kernel.hung_task_all_cpu_backtrace=1
> > ima_policy=tcb kvm-intel.nested=1 nf-conntrack-ftp.ports=20000
> > nf-conntrack-tftp.ports=20000 nf-conntrack-sip.ports=20000
> > nf-conntrack-irc.ports=20000 nf-conntrack-sane.ports=20000
> > vivid.n_devs=16 vivid.multiplanar=1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2
> > netrom.nr_ndevs=16 rose.rose_ndevs=16 spec_store_bypass_disable=prctl
> > numa=fake=2 nopcid dummy_hcd.num=8 binder.debug_mask=0
> > rcupdate.rcu_expedited=1 watchdog_thresh=165
> > workqueue.watchdog_thresh=420 panic_on_warn=1
> >
> > The last part gets truncated and we are getting false workqueue watchdog stalls.
> >
> > Could you please increase it?
>
> I will propose a patchset that increases the default value and cleans
> archs up accordingly too.

I've worked around the command line length limit for now by shortening
the command line.
The syzbot instance is alive and kicking now:
https://syzkaller.appspot.com/upstream?manager=ci-qemu2-riscv64

with the first issue found:
https://syzkaller.appspot.com/bug?extid=e74b94fe601ab9552d69
https://lore.kernel.org/lkml/000000000000b74f1b05bd316729@google.com/T/#u

In my local testing it was happening very frequently, so until it's
fixed, the instance probably won't find many other issues.

FTR, the instance config is stored here:
https://github.com/google/syzkaller/blob/master/dashboard/config/linux/upstream-riscv64-kasan.config

The instance uses qemu emulation and heavy debug configs, so it's
quite slow and it makes sense to target it at riscv-specific parts of
the kernel (rather than stress generic subsystems that are already
stressed on x86).
So the question is: what riscv-specific parts are there that we reach?
Can you think of any qemu flags (cpu features, device emulation,
pstore, etc)? Any kernel parts that we may be missing?

Thanks


end of thread, other threads:[~2021-03-10 17:26 UTC | newest]

Thread overview: 27+ messages
2020-12-25 14:55 riscv+KASAN does not boot Dmitry Vyukov
2020-12-25 16:58 ` Andreas Schwab
2020-12-25 17:13   ` Dmitry Vyukov
2021-01-14  4:57     ` Palmer Dabbelt
2021-01-14  9:23       ` Dmitry Vyukov
2021-01-14 10:24         ` Dmitry Vyukov
2021-01-14 11:24           ` Dmitry Vyukov
2021-01-18 14:53           ` Tobias Klauser
2021-01-18 15:05             ` Dmitry Vyukov
2021-01-18 15:43               ` Dmitry Vyukov
2021-01-29  7:45                 ` Alex Ghiti
     [not found]                   ` <CACT4Y+adSjve7bXRPh5UybCQx6ubOUu5RbwuT620wdcxHzVYJg@mail.gmail.com>
2021-02-16 11:17                     ` Dmitry Vyukov
2021-02-16 11:25                       ` Dmitry Vyukov
2021-02-16 13:45                         ` Dmitry Vyukov
2021-02-16 20:42                         ` Alex Ghiti
2021-02-17  4:42                           ` Dmitry Vyukov
2021-02-17 16:36                             ` Alex Ghiti
2021-02-17 17:34                               ` Dmitry Vyukov
2021-02-18  7:54                                 ` Alex Ghiti
2021-02-18 11:36                                   ` Dmitry Vyukov
2021-02-19 17:01                                     ` Alex Ghiti
2021-02-19 18:53                                       ` Dmitry Vyukov
2021-02-19 22:26                                         ` Palmer Dabbelt
2021-03-09 17:11                                           ` Dmitry Vyukov
2021-03-09 19:49                                             ` Alex Ghiti
2021-03-10 17:25                                               ` Dmitry Vyukov
2021-02-16 17:35                       ` Tobias Klauser
