From: Aurelien Jarno <aurelien@aurel32.net>
To: "Auer, Lukas" <lukas.auer@aisec.fraunhofer.de>
Cc: "david.abdurachmanov@sifive.com" <david.abdurachmanov@sifive.com>,
"Atish.Patra@wdc.com" <Atish.Patra@wdc.com>,
"linux-riscv@lists.infradead.org"
<linux-riscv@lists.infradead.org>
Subject: Re: Fail to bring hart online on HiFive Unleashed
Date: Thu, 17 Oct 2019 22:42:17 +0200
Message-ID: <20191017204217.GA11023@aurel32.net>
In-Reply-To: <6e42e45b9af6467bb72eb4880ae9bf6b5b4f67cd.camel@aisec.fraunhofer.de>
On 2019-10-16 20:49, Auer, Lukas wrote:
> On Wed, 2019-10-16 at 00:22 +0200, Aurelien Jarno wrote:
> > On 2019-10-15 21:38, Auer, Lukas wrote:
> > > On Thu, 2019-10-10 at 21:58 +0200, Aurelien Jarno wrote:
> > > > On 2019-10-09 01:34, Atish Patra wrote:
> > > > > On Tue, 2019-10-08 at 08:33 +0200, Aurelien Jarno wrote:
> > > > > > On 8 October 2019 08:14:58 GMT+02:00, David Abdurachmanov
> > > > > > <david.abdurachmanov@sifive.com> wrote:
> > > > > > > On Tue, Oct 8, 2019 at 7:30 AM Aurelien Jarno <aurelien@aurel32.net>
> > > > > > > wrote:
> > > > > > > > On 2019-10-07 22:19, Atish Patra wrote:
> > > > > > > > > Thanks for the detailed analysis. Can you please keep me and
> > > > > > > > > david in cc when you report the issue to U-Boot?
> > > > > > > >
> > > > > > > > Yep. I have progressed a bit on that, and now I am not convinced
> > > > > > > > it's a U-Boot issue; it may be a GCC issue.
> > > > > > > >
> > > > > > > > Here are the conditions to reproduce the bug:
> > > > > > > > - U-boot runs on hart 1, 2 or 3
> > > > > > > > - the autoboot process is not interrupted
> > > > > > > > - extlinux is used to boot the kernel
> > > > > > > > - arch/riscv/lib/bootm.c is compiled with GCC 9 (works fine with
> > > > > > > >   GCC 8)
> > > > > > > >
> > > > > > > > When the problem happens, the missing hart actually ends its
> > > > > > > > execution in an illegal instruction trap trying to execute the
> > > > > > > > FDT (I only noticed that recently as the message was hidden by
> > > > > > > > the use of earlycon=sbi):
> > > > > > > >
> > > > > > > > > SiFive FSBL: 2018-03-20
> > > > > > > > > HiFive-U serial #: 00000246
> > > > > > > > >
> > > > > > > > > OpenSBI v0.4-50-g30f09fb (Oct 6 2019 21:58:05)
> > > > > > > > > >    ____                    _____ ____ _____
> > > > > > > > > >   / __ \                  / ____|  _ \_   _|
> > > > > > > > > >  | |  | |_ __   ___ _ __ | (___ | |_) || |
> > > > > > > > > >  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
> > > > > > > > > >  | |__| | |_) |  __/ | | |____) | |_) || |_
> > > > > > > > > >   \____/| .__/ \___|_| |_|_____/|____/_____|
> > > > > > > > > >         | |
> > > > > > > > > >         |_|
> > > > > > > > >
> > > > > > > > > Platform Name : SiFive Freedom U540
> > > > > > > > > Platform HART Features : RV64ACDFIMSU
> > > > > > > > > Platform Max HARTs : 5
> > > > > > > > > Current Hart : 2
> > > > > > > > > Firmware Base : 0x80000000
> > > > > > > > > Firmware Size : 104 KB
> > > > > > > > > Runtime SBI Version : 0.2
> > > > > > > > >
> > > > > > > > > PMP0: 0x0000000080000000-0x000000008001ffff (A)
> > > > > > > > > PMP1: 0x0000000000000000-0x0000007fffffffff (A,R,W,X)
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > U-Boot 2019.10-rc4-00037-gdac51e9aaf-dirty (Oct 06 2019 - 21:56:51 +0000)
> > > > > > > > > CPU: rv64imafdc
> > > > > > > > > Model: SiFive HiFive Unleashed A00
> > > > > > > > > DRAM: 8 GiB
> > > > > > > > >
> > > > > > > > > MMC: spi@10050000:mmc@0: 0
> > > > > > > > > In: serial@10010000
> > > > > > > > > Out: serial@10010000
> > > > > > > > > Err: serial@10010000
> > > > > > > > > Net: eth0: ethernet@10090000
> > > > > > > > > Hit any key to stop autoboot: 0
> > > > > > > > > switch to partitions #0, OK
> > > > > > > > > mmc0 is current device
> > > > > > > > > Scanning mmc 0:2...
> > > > > > > > > Found /boot/extlinux/extlinux.conf
> > > > > > > > > Retrieving file: /boot/extlinux/extlinux.conf
> > > > > > > > > 510 bytes read in 5 ms (99.6 KiB/s)
> > > > > > > > > U-Boot menu
> > > > > > > > > 1: kernel 5.3.4
> > > > > > > > > 2: Debian GNU/Linux kernel 5.3.0-trunk-riscv64
> > > > > > > > > Enter choice: 1
> > > > > > > > > 1: kernel 5.3.4
> > > > > > > > > Retrieving file: /boot/vmlinux-5.3.4
> > > > > > > > > 9486076 bytes read in 4813 ms (1.9 MiB/s)
> > > > > > > > > append: root=/dev/mmcblk0p2 rw console=ttySIF0 rootwait
> > > > > > > > > Retrieving file: /boot/hifive-unleashed-a00.dtb
> > > > > > > > > 6088 bytes read in 7 ms (848.6 KiB/s)
> > > > > > > > > ## Flattened Device Tree blob at 88000000
> > > > > > > > > Booting using the fdt blob at 0x88000000
> > > > > > > > > > Using Device Tree in place at 0000000088000000, end 00000000880047c7
> > > > > > > > > Starting kernel ...
> > > > > > > > >
> > > > > > > > > > exception code: 2 , Illegal instruction , epc 88000004 , ra 88000000
> > > > > > > > > ### ERROR ### Please RESET the board ###
> > > > > > >
> > > > > > > I think that's the same issue I had (or still have) a week ago.
> > > > > > > Just a reminder that kernel 5.3 introduced a 64-byte header (thus
> > > > > > > no need to wrap the kernel), at least for the Image target. Thus
> > > > > > > it's booti that boots the kernel on the U-Boot side. The first
> > > > > > > instruction of that header is "j 0x40" (to the beginning of the
> > > > > > > actual kernel). And 88000004 would definitely hold an illegal
> > > > > > > instruction.
> > > > > > >
> > > > > > > 0000000000000000 <.data>:
> > > > > > > 0: 81a0 j 0x40
> > > > > > > 2: 0000 unimp
> > > > > > > 4: 0000 unimp
> > > > > > > 6: 0100 nop
> > > > > > > [..]
> > > > > >
> > > > > > Hmm that's the beginning of the kernel code. The address 88000004
> > > > > > actually corresponds to the FDT. So the hart ending up in a trap
> > > > > > actually tries to boot the FDT instead of the kernel.
> > > > > >
> > > > >
> > > > > Do you see the issue if you manually use bootm instead of extlinux?
> > > > >
> > > > > => bootm $kernel_addr_r - $fdt_addr_r
> > > > >
> > > > > This is probably not related, as bootm is jumping to the wrong location
> > > > > for some reason. However, it may be worth a shot as it fixes fdt
> > > > > corruption.
> > > >
> > > > I have just tested, and it doesn't work. On the other hand I have tried to
> > > > run that manually, and interrupting the boot process usually hides the
> > > > problem.
> > > >
> > >
> > > I tried to reproduce the issue today, but was not able to. If you can
> > > upload the relevant files somewhere, I can retry it with them. I have
> > > also added information on the boot flow in U-Boot below in hopes that
> > > it is helpful for debugging.
> >
> > You can find the files there:
> > https://temp.aurel32.net/hifive-opensbi-uboot/
> >
> > fw_payload.bin contains the OpenSBI + U-Boot payload to be copied to the
> > first partition of the SD card. The boot.tar.gz contains the /boot
> > directory (kernel, fdt and extlinux.conf) and has to be put on the
> > second partition of the SD card. Note that this partition should have
> > the GPT boot flag enabled for extlinux to work.
> >
> > I haven't looked more at the issue recently now that I have found that
> > using GCC 8 is a fix/workaround. Therefore those files are from ~10 days
> > ago. I will try to do more tests during the week-end.
> >
>
> Thanks for the files, I was able to reproduce the issue now. Seems like
> it is caused by a stack overflow. When smp_call_function() is called
> during bootm, the stack of the main hart overflows into the stack of
> one of the other harts. The return address of the main hart now lies
> within the stack of the other hart. Once that hart gets woken by the
> IPI it overwrites the return address, in our case with 0x88000000. This
> will cause the illegal instruction trap once the main hart returns.
> This also explains why the problem does not occur when the main hart
> is hart 4, since its stack is at the bottom and therefore can't
> overflow into one of the other stacks.
>
> Increasing the stack size (CONFIG_STACK_SIZE_SHIFT) to 14 fixes the
> problem. I'll double check that there's nothing else causing an issue
> and will then send a patch to increase the stack size.
Thanks a lot for debugging that. I have just tried it, and I confirm it
fixes the issue for me.
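
For readers following the thread, here is a minimal sketch of the
stack layout described above. It assumes the per-hart stacks are
carved contiguously downward from a common top address, each one
(1 << CONFIG_STACK_SIZE_SHIFT) bytes in size, with hart 0 at the top.
The base address, the hart count constant, the function names and the
stack depth used in the usage note are hypothetical values chosen for
illustration, not taken from the U-Boot sources or from a real trace.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values for illustration only. */
#define EXAMPLE_STACK_BASE 0x80200000ULL /* assumed top of hart 0's stack */
#define EXAMPLE_NUM_HARTS  5u

/* Top of a hart's stack: each hart sits one slot below the previous one. */
uint64_t hart_stack_top(uint64_t base, unsigned hartid, unsigned shift)
{
	return base - ((uint64_t)hartid << shift);
}

/*
 * Return the hart whose stack slot contains addr, or -1 if addr lies
 * outside every slot. Hart h owns [top(h) - size, top(h)).
 */
int stack_owner(uint64_t base, uint64_t addr, unsigned shift, unsigned nharts)
{
	assert(nharts > 0);
	uint64_t size = 1ULL << shift;

	if (addr >= base || addr < base - nharts * size)
		return -1;
	return (int)((base - addr - 1) >> shift);
}

/*
 * Return the hart whose slot is hit when `hartid` has pushed `depth`
 * bytes onto its stack. If the result differs from hartid, the stack
 * has overflowed into another hart's slot (or past the last slot, -1).
 */
int deepest_slot_hit(uint64_t base, unsigned hartid, uint64_t depth,
		     unsigned shift, unsigned nharts)
{
	uint64_t addr = hart_stack_top(base, hartid, shift) - depth;

	return stack_owner(base, addr, shift, nharts);
}
```

With these assumptions, deepest_slot_hit(EXAMPLE_STACK_BASE, 2, 9000,
13, EXAMPLE_NUM_HARTS) lands in hart 3's slot, whereas the same depth
with a shift of 14 stays within hart 2's own slot. For hart 4 the
result is -1: there is no other hart's stack below it to overflow into.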
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
--
Aurelien Jarno GPG: 4096R/1DDD8C9B
aurelien@aurel32.net http://www.aurel32.net
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv