From: "Auer, Lukas" <lukas.auer@aisec.fraunhofer.de>
To: "aurelien@aurel32.net" <aurelien@aurel32.net>,
	"Atish.Patra@wdc.com" <Atish.Patra@wdc.com>
Cc: "david.abdurachmanov@sifive.com" <david.abdurachmanov@sifive.com>,
	"linux-riscv@lists.infradead.org"
	<linux-riscv@lists.infradead.org>
Subject: Re: Fail to bring hart online on HiFive Unleashed
Date: Tue, 15 Oct 2019 21:38:25 +0000
Message-ID: <f2a467d2dfd1828533fee8a8edf7eac51d8c1d84.camel@aisec.fraunhofer.de>
In-Reply-To: <20191010195851.GA10676@aurel32.net>

On Thu, 2019-10-10 at 21:58 +0200, Aurelien Jarno wrote:
> On 2019-10-09 01:34, Atish Patra wrote:
> > On Tue, 2019-10-08 at 08:33 +0200, Aurelien Jarno wrote:
> > > On 8 October 2019 08:14:58 GMT+02:00, David Abdurachmanov <
> > > david.abdurachmanov@sifive.com> wrote:
> > > > On Tue, Oct 8, 2019 at 7:30 AM Aurelien Jarno <aurelien@aurel32.net>
> > > > wrote:
> > > > > On 2019-10-07 22:19, Atish Patra wrote:
> > > > > > Thanks for the detailed analysis. Can you please keep me and
> > > > > > David in cc when you report the issue to U-Boot?
> > > > > 
> > > > > Yep. I have progressed a bit on that, and now I am not convinced
> > > > > it's a U-Boot issue; it could be a GCC issue.
> > > > > 
> > > > > Here are the conditions to reproduce the bug:
> > > > > - U-boot runs on hart 1, 2 or 3
> > > > > - the autoboot process is not interrupted
> > > > > - extlinux is used to boot the kernel
> > > > > - arch/riscv/lib/bootm.c is compiled with GCC 9 (works fine with
> > > > >   GCC 8)
> > > > > 
> > > > > When the problem happens, the missing hart actually ends its
> > > > > execution in an illegal instruction trap trying to execute the FDT
> > > > > (I only noticed that recently as the message was hidden by the use
> > > > > of earlycon=sbi):
> > > > > 
> > > > > > SiFive FSBL:       2018-03-20
> > > > > > HiFive-U serial #: 00000246
> > > > > > 
> > > > > > OpenSBI v0.4-50-g30f09fb (Oct  6 2019 21:58:05)
> > > > > >    ____                    _____ ____ _____
> > > > > >   / __ \                  / ____|  _ \_   _|
> > > > > >  | |  | |_ __   ___ _ __ | (___ | |_) || |
> > > > > >  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
> > > > > >  | |__| | |_) |  __/ | | |____) | |_) || |_
> > > > > >   \____/| .__/ \___|_| |_|_____/|____/_____|
> > > > > >         | |
> > > > > >         |_|
> > > > > > 
> > > > > > Platform Name          : SiFive Freedom U540
> > > > > > Platform HART Features : RV64ACDFIMSU
> > > > > > Platform Max HARTs     : 5
> > > > > > Current Hart           : 2
> > > > > > Firmware Base          : 0x80000000
> > > > > > Firmware Size          : 104 KB
> > > > > > Runtime SBI Version    : 0.2
> > > > > > 
> > > > > > PMP0: 0x0000000080000000-0x000000008001ffff (A)
> > > > > > PMP1: 0x0000000000000000-0x0000007fffffffff (A,R,W,X)
> > > > > > 
> > > > > > 
> > > > > > U-Boot 2019.10-rc4-00037-gdac51e9aaf-dirty (Oct 06 2019 - 21:56:51 +0000)
> > > > > > CPU:   rv64imafdc
> > > > > > Model: SiFive HiFive Unleashed A00
> > > > > > DRAM:  8 GiB
> > > > > > 
> > > > > > MMC:   spi@10050000:mmc@0: 0
> > > > > > In:    serial@10010000
> > > > > > Out:   serial@10010000
> > > > > > Err:   serial@10010000
> > > > > > Net:   eth0: ethernet@10090000
> > > > > > Hit any key to stop autoboot:  0
> > > > > > switch to partitions #0, OK
> > > > > > mmc0 is current device
> > > > > > Scanning mmc 0:2...
> > > > > > Found /boot/extlinux/extlinux.conf
> > > > > > Retrieving file: /boot/extlinux/extlinux.conf
> > > > > > 510 bytes read in 5 ms (99.6 KiB/s)
> > > > > > U-Boot menu
> > > > > > 1:      kernel 5.3.4
> > > > > > 2:      Debian GNU/Linux kernel 5.3.0-trunk-riscv64
> > > > > > Enter choice: 1
> > > > > > 1:      kernel 5.3.4
> > > > > > Retrieving file: /boot/vmlinux-5.3.4
> > > > > > 9486076 bytes read in 4813 ms (1.9 MiB/s)
> > > > > > append: root=/dev/mmcblk0p2 rw console=ttySIF0 rootwait
> > > > > > Retrieving file: /boot/hifive-unleashed-a00.dtb
> > > > > > 6088 bytes read in 7 ms (848.6 KiB/s)
> > > > > > ## Flattened Device Tree blob at 88000000
> > > > > >    Booting using the fdt blob at 0x88000000
> > > > > >    Using Device Tree in place at 0000000088000000, end 00000000880047c7
> > > > > > Starting kernel ...
> > > > > > 
> > > > > > exception code: 2 , Illegal instruction , epc 88000004 , ra 88000000
> > > > > > ### ERROR ### Please RESET the board ###
> > > > 
> > > > I think that's the same issue I had (or still have) a week ago.
> > > > Just a reminder that kernel 5.3 introduced a 64-byte header (thus no
> > > > need to wrap the kernel), at least for the Image target. Thus it's
> > > > booti that boots the kernel on the U-Boot side.
> > > > The first instruction of that header is "j 0x40" (to the beginning
> > > > of the actual kernel), and 88000004 would definitely hold an illegal
> > > > instruction.
> > > > 
> > > > 0000000000000000 <.data>:
> > > > 0:       81a0                    j       0x40
> > > > 2:       0000                    unimp
> > > > 4:       0000                    unimp
> > > > 6:       0100                    nop
> > > > [..]
> > > 
> > > Hmm that's the beginning of the kernel code. The address 88000004
> > > actually corresponds to the FDT. So the hart ending up in a trap
> > > actually tries to boot the FDT instead of the kernel.
> > > 
> > 
> > Do you see the issue if you manually use bootm instead of extlinux?
> > 
> > => bootm $kernel_addr_r - $fdt_addr_r
> > 
> > This is probably not related, as bootm is jumping to the wrong location
> > for some reason. However, it may be worth a shot as it fixes fdt
> > corruption.
> 
> I have just tested, and it doesn't work. On the other hand I have tried
> to run that manually, and interrupting the boot process usually hides
> the problem.
> 

I tried to reproduce the issue today, but was not able to. If you can
upload the relevant files somewhere, I can retry it with them. I have
also added information on the boot flow in U-Boot below in the hope
that it helps with debugging.

U-Boot divides the harts in the system into the main hart (running
U-Boot) and the secondary harts (all others). The main hart is
responsible for notifying the secondary harts of where to jump to. To
communicate with them, it uses IPIs and the U-Boot global data
structure (register gp stores a pointer to it), located at the end of
RAM. Other variables in global data that could be helpful for debugging
are arch.boot_hart (the main hart running U-Boot) and
arch.available_harts (a bitmask of all harts that have entered U-Boot).
They are defined in
https://gitlab.denx.de/u-boot/u-boot/blob/master/arch/riscv/include/asm/global_data.h
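To make the two debugging variables concrete, here is a minimal host-side
sketch of how the bitmask could be interpreted once you have read the
values out (for example with a debugger). The struct and function names
are hypothetical stand-ins, not the real U-Boot definitions, and the
sketch assumes that bit N of arch.available_harts corresponds to hart N.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_HARTS 5 /* "Platform Max HARTs" reported by OpenSBI in the log */

/*
 * Hypothetical, simplified model of the two fields discussed above.
 * This is NOT the real layout of U-Boot's arch global data.
 */
struct arch_gd_model {
	long boot_hart;                /* main hart running U-Boot */
	unsigned long available_harts; /* assumed: bit N set => hart N entered U-Boot */
};

/* Return true if the given hart has registered itself in the bitmask. */
static bool hart_entered_uboot(const struct arch_gd_model *arch, unsigned int hart)
{
	assert(hart < MAX_HARTS);
	return (arch->available_harts >> hart) & 1UL;
}

/* Return the bitmask of harts that were expected but never registered. */
static unsigned long missing_harts(const struct arch_gd_model *arch,
				   unsigned long expected_mask)
{
	return expected_mask & ~arch->available_harts;
}
```

With an expected mask of 0x1e (harts 1 to 4, which is an assumption about
which harts should reach U-Boot on this board), a non-zero result from
missing_harts() would point at a hart that never made it into U-Boot in
the first place.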

Booting Linux will usually use the bootm command / functions at some
point. Before jumping to the kernel, the main hart instructs the
secondary harts to jump to the kernel image. The relevant code for this
is at
https://gitlab.denx.de/u-boot/u-boot/blob/master/arch/riscv/lib/bootm.c#L101
.
This will send an IPI to all secondary harts. The IPIs are received in
arch/riscv/cpu/start.S and are eventually handled in handle_ipi() at
https://gitlab.denx.de/u-boot/u-boot/blob/master/arch/riscv/lib/smp.c#L86
.
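The handoff described above can be modelled on a host machine roughly as
follows. This is only a sketch of the idea (the main hart posts a jump
target and arguments for each secondary hart, and each secondary hart
picks them up when it handles its IPI). All names here are hypothetical,
the real code stores plain addresses rather than typed function
pointers, and the actual IPI delivery is replaced by a simple flag.

```c
#include <assert.h>
#include <stddef.h>

#define NR_HARTS 5 /* the log above reports 5 harts */

/* Jump target signature used in this model: (hartid, arg0, arg1). */
typedef void (*smp_fn)(unsigned long hart, unsigned long arg0,
		       unsigned long arg1);

/* Hypothetical per-hart mailbox standing in for the data kept in global data. */
struct ipi_slot {
	smp_fn addr;        /* where the hart should jump to */
	unsigned long arg0; /* in this model: address of the device tree */
	unsigned long arg1;
	int pending;        /* stands in for the hardware IPI */
};

static struct ipi_slot slots[NR_HARTS];
static int entered[NR_HARTS];            /* which harts reached the fake kernel */
static unsigned long seen_fdt[NR_HARTS]; /* FDT address each hart was handed */

/* Main hart: post the target for every other hart, then raise its IPI. */
static void model_smp_call_function(unsigned long boot_hart, smp_fn addr,
				    unsigned long arg0, unsigned long arg1)
{
	for (unsigned long h = 0; h < NR_HARTS; h++) {
		if (h == boot_hart)
			continue;
		slots[h].addr = addr;
		slots[h].arg0 = arg0;
		slots[h].arg1 = arg1;
		slots[h].pending = 1;
	}
}

/* Secondary hart: clear the IPI and jump. Returns 1 if a jump happened. */
static int model_handle_ipi(unsigned long hart)
{
	if (hart >= NR_HARTS || !slots[hart].pending || slots[hart].addr == NULL)
		return 0;
	slots[hart].pending = 0;
	slots[hart].addr(hart, slots[hart].arg0, slots[hart].arg1);
	return 1;
}

/* Fake kernel entry point that records what each hart received. */
static void fake_kernel_entry(unsigned long hart, unsigned long fdt,
			      unsigned long unused)
{
	(void)unused;
	assert(hart < NR_HARTS);
	entered[hart] = 1;
	seen_fdt[hart] = fdt;
}
```

In this model every secondary hart ends up in fake_kernel_entry() with
the device tree address as an argument, while the main hart is skipped
because it jumps to the kernel on its own afterwards.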

What I find strange about the error you are seeing is that one of the
harts is jumping to the device tree binary. As you mentioned, it could
be that we have a race condition somewhere, for example causing
something to be overwritten in global data while some harts are still
running U-Boot. However, I would expect more or less random data and
not the address of the device tree binary in that case. For that reason
I would tend to rule out this scenario. Since only one hart is failing
to enter Linux, I assume that all secondary harts successfully boot
Linux and only the main hart is having problems. That would mean that
something is going wrong in arch/riscv/lib/bootm.c .
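
As a small aid for reading the trap message, the following sketch checks
whether a reported epc falls inside the device tree blob. The range is
taken from the "Using Device Tree in place" line in the quoted log; the
type and function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Address range with an inclusive end, as printed in the boot log. */
struct mem_range {
	uint64_t start;
	uint64_t end;
};

/* FDT location from the quoted log: 0x88000000 to 0x880047c7. */
static const struct mem_range fdt_from_log = {
	0x88000000ULL, 0x880047c7ULL
};

/* Return true if addr lies within the range (both bounds included). */
static bool in_range(struct mem_range r, uint64_t addr)
{
	assert(r.start <= r.end);
	return addr >= r.start && addr <= r.end;
}
```

For example, in_range(fdt_from_log, 0x88000004) is true, which matches
the observation that the failing hart trapped while executing the FDT
rather than random memory.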

Andreas also brought up a good point. We did have a similar problem
before, which was caused by insufficient initialization. The workaround
was to use the power switch instead of the reset button to reset the
board. I haven't tested it, but I believe initialization in OpenSBI
should be better now, meaning that this might not be a problem anymore.
However, there might also be a similar problem in U-Boot.

Regards,
Lukas
