From: "Auer, Lukas" <lukas.auer@aisec.fraunhofer.de>
To: "aurelien@aurel32.net" <aurelien@aurel32.net>,
	"Atish.Patra@wdc.com" <Atish.Patra@wdc.com>
Cc: "david.abdurachmanov@sifive.com" <david.abdurachmanov@sifive.com>,
	"linux-riscv@lists.infradead.org" <linux-riscv@lists.infradead.org>
Subject: Re: Fail to bring hart online on HiFive Unleashed
Date: Tue, 15 Oct 2019 21:38:25 +0000
Message-ID: <f2a467d2dfd1828533fee8a8edf7eac51d8c1d84.camel@aisec.fraunhofer.de>
In-Reply-To: <20191010195851.GA10676@aurel32.net>

On Thu, 2019-10-10 at 21:58 +0200, Aurelien Jarno wrote:
> On 2019-10-09 01:34, Atish Patra wrote:
> > On Tue, 2019-10-08 at 08:33 +0200, Aurelien Jarno wrote:
> > > On 8 October 2019 08:14:58 GMT+02:00, David Abdurachmanov <
> > > david.abdurachmanov@sifive.com> wrote:
> > > > On Tue, Oct 8, 2019 at 7:30 AM Aurelien Jarno <aurelien@aurel32.net>
> > > > wrote:
> > > > > On 2019-10-07 22:19, Atish Patra wrote:
> > > > > > Thanks for the detailed analysis. Can you please keep me and
> > > > > > david in cc when you report the issue to U-boot ?
> > > > >
> > > > > Yep. I have progressed a bit on that, and now I am not convinced
> > > > > it's a U-Boot issue; it could be a GCC issue.
> > > > >
> > > > > Here are the conditions to reproduce the bug:
> > > > > - U-boot runs on hart 1, 2 or 3
> > > > > - the autoboot process is not interrupted
> > > > > - extlinux is used to boot the kernel
> > > > > - arch/riscv/lib/bootm.c is compiled with GCC 9 (works fine with
> > > > >   GCC 8)
> > > > >
> > > > > When the problem happens, the missing hart actually ends its execution
> > > > > in an illegal instruction trap trying to execute the FDT (I only noticed
> > > > > that recently as the message was hidden by the use of earlycon=sbi):
> > > > >
> > > > > > SiFive FSBL: 2018-03-20
> > > > > > HiFive-U serial #: 00000246
> > > > > >
> > > > > > OpenSBI v0.4-50-g30f09fb (Oct 6 2019 21:58:05)
> > > > > > ____ _____ ____ _____
> > > > > > / __ \ / ____| _ \_ _|
> > > > > > | | | |_ __ ___ _ __ | (___ | |_) || |
> > > > > > | | | | '_ \ / _ \ '_ \ \___ \| _ < | |
> > > > > > | |__| | |_) | __/ | | |____) | |_) || |_
> > > > > > \____/| .__/ \___|_| |_|_____/|____/_____|
> > > > > > | |
> > > > > > |_|
> > > > > >
> > > > > > Platform Name : SiFive Freedom U540
> > > > > > Platform HART Features : RV64ACDFIMSU
> > > > > > Platform Max HARTs : 5
> > > > > > Current Hart : 2
> > > > > > Firmware Base : 0x80000000
> > > > > > Firmware Size : 104 KB
> > > > > > Runtime SBI Version : 0.2
> > > > > >
> > > > > > PMP0: 0x0000000080000000-0x000000008001ffff (A)
> > > > > > PMP1: 0x0000000000000000-0x0000007fffffffff (A,R,W,X)
> > > > > >
> > > > > >
> > > > > > U-Boot 2019.10-rc4-00037-gdac51e9aaf-dirty (Oct 06 2019 - 21:56:51 +0000)
> > > > > > CPU: rv64imafdc
> > > > > > Model: SiFive HiFive Unleashed A00
> > > > > > DRAM: 8 GiB
> > > > > >
> > > > > > MMC: spi@10050000:mmc@0: 0
> > > > > > In: serial@10010000
> > > > > > Out: serial@10010000
> > > > > > Err: serial@10010000
> > > > > > Net: eth0: ethernet@10090000
> > > > > > Hit any key to stop autoboot: 0
> > > > > > switch to partitions #0, OK
> > > > > > mmc0 is current device
> > > > > > Scanning mmc 0:2...
> > > > > > Found /boot/extlinux/extlinux.conf
> > > > > > Retrieving file: /boot/extlinux/extlinux.conf
> > > > > > 510 bytes read in 5 ms (99.6 KiB/s)
> > > > > > U-Boot menu
> > > > > > 1: kernel 5.3.4
> > > > > > 2: Debian GNU/Linux kernel 5.3.0-trunk-riscv64
> > > > > > Enter choice: 1
> > > > > > 1: kernel 5.3.4
> > > > > > Retrieving file: /boot/vmlinux-5.3.4
> > > > > > 9486076 bytes read in 4813 ms (1.9 MiB/s)
> > > > > > append: root=/dev/mmcblk0p2 rw console=ttySIF0 rootwait
> > > > > > Retrieving file: /boot/hifive-unleashed-a00.dtb
> > > > > > 6088 bytes read in 7 ms (848.6 KiB/s)
> > > > > > ## Flattened Device Tree blob at 88000000
> > > > > > Booting using the fdt blob at 0x88000000
> > > > > > Using Device Tree in place at 0000000088000000, end 00000000880047c7
> > > > > > Starting kernel ...
> > > > > >
> > > > > > exception code: 2 , Illegal instruction , epc 88000004 , ra 88000000
> > > > > > ### ERROR ### Please RESET the board ###
> > > >
> > > > I think that's the same issue I had (or still have) a week ago.
> > > > Just a reminder that kernel 5.3 introduced a 64-byte header (so there is
> > > > no need to wrap the kernel), at least for the Image target. So it's booti
> > > > that boots the kernel on the U-Boot side. The first instruction of that
> > > > header is "j 0x40" (to the beginning of the actual kernel), and 88000004
> > > > would definitely hold an illegal instruction.
> > > >
> > > > 0000000000000000 <.data>:
> > > > 0: 81a0 j 0x40
> > > > 2: 0000 unimp
> > > > 4: 0000 unimp
> > > > 6: 0100 nop
> > > > [..]
> > >
> > > Hmm that's the beginning of the kernel code. The address 88000004
> > > actually corresponds to the FDT. So the hart ending up in a trap
> > > actually tries to boot the FDT instead of the kernel.
> > >
> >
> > Do you see the issue if you manually use bootm instead of extlinux?
> >
> > => bootm $kernel_addr_r - $fdt_addr_r
> >
> > This is probably not related, as bootm is jumping to the wrong location
> > for some reason. However, it may be worth a shot as it fixes fdt
> > corruption.
>
> I have just tested, and it doesn't work. On the other hand, I have tried to
> run that manually, and interrupting the boot process usually hides the
> problem.
>
I tried to reproduce the issue today, but was not able to. If you can
upload the relevant files somewhere, I can retry it with them. I have
also added information on the boot flow in U-Boot below in hopes that
it is helpful for debugging.

U-Boot divides the harts in the system into the main hart (running
U-Boot) and the secondary harts (all others). The main hart is
responsible for notifying the secondary harts of where to jump to. To
communicate with them, it uses IPIs and the U-Boot global data
structure (register gp stores a pointer to it), located at the end of
RAM. Other variables in global data that could be helpful for debugging
are arch.boot_hart (the main hart running U-Boot) and
arch.available_harts (a bitmask of all harts that have entered U-Boot).
They are defined in
https://gitlab.denx.de/u-boot/u-boot/blob/master/arch/riscv/include/asm/global_data.h
.
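
To make the two fields a bit more concrete, here is a minimal host-side
sketch of how the bitmask could be interpreted while debugging. The struct
and helper names (arch_global_data_sketch, hart_entered_uboot,
count_secondary_harts) are hypothetical and only mirror the two fields
named above. The field types are an assumption; the real definitions are
in the header linked above.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical stand-in for the two fields discussed above. This is NOT
 * the U-Boot definition; names and types are assumptions for illustration.
 */
struct arch_global_data_sketch {
	unsigned long boot_hart;       /* hart that is running U-Boot */
	unsigned long available_harts; /* bit n set: hart n has entered U-Boot */
};

/* Return 1 if the given hart has its bit set in available_harts, else 0. */
static int hart_entered_uboot(const struct arch_global_data_sketch *arch,
			      unsigned int hart)
{
	assert(hart < sizeof(arch->available_harts) * CHAR_BIT);
	return (int)((arch->available_harts >> hart) & 1UL);
}

/*
 * Count the harts that entered U-Boot, excluding the boot hart. These are
 * the secondary harts that would have to be told where to jump.
 */
static unsigned int count_secondary_harts(const struct arch_global_data_sketch *arch)
{
	unsigned long mask = arch->available_harts & ~(1UL << arch->boot_hart);
	unsigned int count = 0;

	while (mask) {
		count += (unsigned int)(mask & 1UL);
		mask >>= 1;
	}
	return count;
}
```

For example, with available_harts = 0x1e (harts 1 to 4) and boot_hart = 2,
count_secondary_harts() returns 3. If the value read from the board is
lower than expected, one of the harts never registered itself in U-Boot.
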
Booting Linux will usually use the bootm command / functions at some
point. Before jumping to the kernel, the main hart instructs the
secondary harts to jump to the kernel image. The relevant code for this
is at
https://gitlab.denx.de/u-boot/u-boot/blob/master/arch/riscv/lib/bootm.c#L101
. This will send an IPI to all secondary harts. They are received in
arch/riscv/cpu/start.S and are eventually handled in handle_ipi() at
https://gitlab.denx.de/u-boot/u-boot/blob/master/arch/riscv/lib/smp.c#L86
.
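
The hand-off described above can be modelled roughly as follows. This is
only a host-side sketch under assumptions, not the code from smp.c or
bootm.c: the per-hart mailbox, the "pending" flag (standing in for the
actual IPI), SKETCH_NR_HARTS, and all function names are hypothetical.
It only illustrates that the main hart publishes a jump target plus
arguments for each secondary hart, and that each secondary hart picks
them up in its IPI handler.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR_HARTS 5 /* assumption: matches "Platform Max HARTs : 5" */

typedef void (*smp_fn_sketch)(unsigned long hart, unsigned long arg0,
			      unsigned long arg1);

/* Hypothetical per-hart mailbox; "pending" stands in for the real IPI. */
struct ipi_mailbox_sketch {
	smp_fn_sketch fn;
	unsigned long arg0;
	unsigned long arg1;
	int pending;
};

static struct ipi_mailbox_sketch mailbox[SKETCH_NR_HARTS];
static unsigned long last_fdt_seen[SKETCH_NR_HARTS];

/* Main hart side: publish target and arguments to every other available hart. */
static void post_to_secondary_harts(unsigned long boot_hart,
				    unsigned long available_harts,
				    smp_fn_sketch fn,
				    unsigned long arg0, unsigned long arg1)
{
	assert(fn != NULL);
	for (unsigned long hart = 0; hart < SKETCH_NR_HARTS; hart++) {
		if (hart == boot_hart || !((available_harts >> hart) & 1UL))
			continue;
		mailbox[hart].fn = fn;
		mailbox[hart].arg0 = arg0;
		mailbox[hart].arg1 = arg1;
		mailbox[hart].pending = 1;
	}
}

/* Secondary hart side: returns 1 if a pending request was handled, else 0. */
static int handle_ipi_sketch(unsigned long hart)
{
	if (hart >= SKETCH_NR_HARTS || !mailbox[hart].pending)
		return 0;
	mailbox[hart].pending = 0;
	mailbox[hart].fn(hart, mailbox[hart].arg0, mailbox[hart].arg1);
	return 1;
}

/* Stand-in for the kernel entry point: records the FDT address it was given. */
static void fake_kernel_entry(unsigned long hart, unsigned long fdt_addr,
			      unsigned long unused)
{
	(void)unused;
	last_fdt_seen[hart] = fdt_addr;
}
```

In this model the jump target and the device tree address travel together
in the same mailbox entry, so when debugging it is worth checking both
values for each hart and not only the jump target.
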
What I find strange with the error you are seeing is that one of the
harts is jumping to the device tree binary. As you mentioned, it could
be that we have a race condition somewhere, for example causing
something to be overwritten in global data while some harts are still
running U-Boot. However, I would expect more or less random data and
not the address of the device tree binary in that case. For that reason
I would tend to rule out this scenario. Since only one hart is failing
to enter Linux, I assume that all secondary harts successfully boot
Linux and only the main hart is having problems. That would mean that
something is going wrong in arch/riscv/lib/bootm.c.

Andreas also brought up a good point. We did have a similar problem
before, which was caused by insufficient initialization. The workaround
to fix this was to use the power switch instead of the reset button to
reset the board. I haven't tested it, but I believe initialization in
OpenSBI should be better now, meaning that this might not be a problem
anymore. However, there might also be a similar problem in U-Boot.

Regards,
Lukas