From: "Auer, Lukas" <lukas.auer@aisec.fraunhofer.de>
To: "aurelien@aurel32.net" <aurelien@aurel32.net>,
	"Atish.Patra@wdc.com" <Atish.Patra@wdc.com>
Cc: "david.abdurachmanov@sifive.com" <david.abdurachmanov@sifive.com>,
	"linux-riscv@lists.infradead.org"
	<linux-riscv@lists.infradead.org>
Subject: Re: Fail to bring hart online on HiFive Unleashed
Date: Tue, 15 Oct 2019 21:38:25 +0000
Message-ID: <f2a467d2dfd1828533fee8a8edf7eac51d8c1d84.camel@aisec.fraunhofer.de>
In-Reply-To: <20191010195851.GA10676@aurel32.net>

On Thu, 2019-10-10 at 21:58 +0200, Aurelien Jarno wrote:
> On 2019-10-09 01:34, Atish Patra wrote:
> > On Tue, 2019-10-08 at 08:33 +0200, Aurelien Jarno wrote:
> > > On 8 October 2019 08:14:58 GMT+02:00, David Abdurachmanov
> > > <david.abdurachmanov@sifive.com> wrote:
> > > > On Tue, Oct 8, 2019 at 7:30 AM Aurelien Jarno <aurelien@aurel32.net>
> > > > wrote:
> > > > > On 2019-10-07 22:19, Atish Patra wrote:
> > > > > > Thanks for the detailed analysis. Can you please keep me and
> > > > > > David in cc when you report the issue to U-Boot?
> > > > > 
> > > > > Yep. I have progressed a bit on that, and now I am not convinced
> > > > > it's a U-Boot issue; it could be a GCC issue.
> > > > > 
> > > > > Here are the conditions to reproduce the bug:
> > > > > - U-Boot runs on hart 1, 2 or 3
> > > > > - the autoboot process is not interrupted
> > > > > - extlinux is used to boot the kernel
> > > > > - arch/riscv/lib/bootm.c is compiled with GCC 9 (works fine with
> > > > >   GCC 8)
> > > > > 
> > > > > When the problem happens, the missing hart actually ends its
> > > > > execution in an illegal instruction trap while trying to execute
> > > > > the FDT (I only noticed that recently, as the message was hidden
> > > > > by the use of earlycon=sbi):
> > > > > 
> > > > > > SiFive FSBL:       2018-03-20
> > > > > > HiFive-U serial #: 00000246
> > > > > > 
> > > > > > OpenSBI v0.4-50-g30f09fb (Oct  6 2019 21:58:05)
> > > > > >    ____                    _____ ____ _____
> > > > > >   / __ \                  / ____|  _ \_   _|
> > > > > >  | |  | |_ __   ___ _ __ | (___ | |_) || |
> > > > > >  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
> > > > > >  | |__| | |_) |  __/ | | |____) | |_) || |_
> > > > > >   \____/| .__/ \___|_| |_|_____/|____/_____|
> > > > > >         | |
> > > > > >         |_|
> > > > > > 
> > > > > > Platform Name          : SiFive Freedom U540
> > > > > > Platform HART Features : RV64ACDFIMSU
> > > > > > Platform Max HARTs     : 5
> > > > > > Current Hart           : 2
> > > > > > Firmware Base          : 0x80000000
> > > > > > Firmware Size          : 104 KB
> > > > > > Runtime SBI Version    : 0.2
> > > > > > 
> > > > > > PMP0: 0x0000000080000000-0x000000008001ffff (A)
> > > > > > PMP1: 0x0000000000000000-0x0000007fffffffff (A,R,W,X)
> > > > > > 
> > > > > > 
> > > > > > U-Boot 2019.10-rc4-00037-gdac51e9aaf-dirty (Oct 06 2019 - 21:56:51 +0000)
> > > > > > CPU:   rv64imafdc
> > > > > > Model: SiFive HiFive Unleashed A00
> > > > > > DRAM:  8 GiB
> > > > > > 
> > > > > > MMC:   spi@10050000:mmc@0: 0
> > > > > > In:    serial@10010000
> > > > > > Out:   serial@10010000
> > > > > > Err:   serial@10010000
> > > > > > Net:   eth0: ethernet@10090000
> > > > > > Hit any key to stop autoboot:  0
> > > > > > switch to partitions #0, OK
> > > > > > mmc0 is current device
> > > > > > Scanning mmc 0:2...
> > > > > > Found /boot/extlinux/extlinux.conf
> > > > > > Retrieving file: /boot/extlinux/extlinux.conf
> > > > > > 510 bytes read in 5 ms (99.6 KiB/s)
> > > > > > U-Boot menu
> > > > > > 1:      kernel 5.3.4
> > > > > > 2:      Debian GNU/Linux kernel 5.3.0-trunk-riscv64
> > > > > > Enter choice: 1
> > > > > > 1:      kernel 5.3.4
> > > > > > Retrieving file: /boot/vmlinux-5.3.4
> > > > > > 9486076 bytes read in 4813 ms (1.9 MiB/s)
> > > > > > append: root=/dev/mmcblk0p2 rw console=ttySIF0 rootwait
> > > > > > Retrieving file: /boot/hifive-unleashed-a00.dtb
> > > > > > 6088 bytes read in 7 ms (848.6 KiB/s)
> > > > > > ## Flattened Device Tree blob at 88000000
> > > > > >    Booting using the fdt blob at 0x88000000
> > > > > >    Using Device Tree in place at 0000000088000000, end 00000000880047c7
> > > > > > Starting kernel ...
> > > > > > 
> > > > > > exception code: 2 , Illegal instruction , epc 88000004 , ra 88000000
> > > > > > ### ERROR ### Please RESET the board ###
> > > > 
> > > > I think that's the same issue I had (or still have) a week ago.
> > > > Just a reminder that kernel 5.3 introduced a 64-byte header (thus
> > > > no need to wrap the kernel), at least for the Image target. Thus
> > > > it's booti that boots the kernel on the U-Boot side.
> > > > The first instruction of that header is "j 0x40" (to the beginning
> > > > of the actual kernel), and 88000004 would definitely hold an
> > > > illegal instruction.
> > > > 
> > > > 0000000000000000 <.data>:
> > > > 0:       81a0                    j       0x40
> > > > 2:       0000                    unimp
> > > > 4:       0000                    unimp
> > > > 6:       0100                    nop
> > > > [..]
> > > 
> > > Hmm that's the beginning of the kernel code. The address 88000004
> > > actually corresponds to the FDT. So the hart ending up in a trap
> > > actually tries to boot the FDT instead of the kernel.
> > > 
> > 
> > Do you see the issue if you manually use bootm instead of extlinux?
> > 
> > => bootm $kernel_addr_r - $fdt_addr_r
> > 
> > This is probably not related, as bootm is jumping to the wrong location
> > for some reason. However, it may be worth a shot as it fixes fdt
> > corruption.
> 
> I have just tested, and it doesn't work. On the other hand, I have tried
> to run that manually, and interrupting the boot process usually hides the
> problem.
> 

I tried to reproduce the issue today, but was not able to. If you can
upload the relevant files somewhere, I can retry it with them. I have
also added information on the boot flow in U-Boot below in hopes that
it is helpful for debugging.

U-Boot divides the harts in the system into the main hart (running
U-Boot) and the secondary harts (all others). The main hart is
responsible for notifying the secondary harts of where to jump to. To
communicate with them, it uses IPIs and the U-Boot global data
structure (register gp stores a pointer to it), located at the end of
RAM. Other variables in global data that could be helpful for debugging
are arch.boot_hart (the main hart running U-Boot) and
arch.available_harts (a bitmask of all harts that have entered U-Boot).
They are defined in arch/riscv/include/asm/global_data.h:
https://gitlab.denx.de/u-boot/u-boot/blob/master/arch/riscv/include/asm/global_data.h
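
As a debugging aid, the arch.available_harts bitmask can be decoded by
hand once its value has been read out (for example with a debugger). The
following is a minimal Python sketch, not U-Boot code. The function names
and the example mask value are hypothetical, and it assumes that bit N of
the mask corresponds to hart N.

```python
def harts_in_mask(mask):
    """Return the hart IDs whose bit is set in the bitmask."""
    return [hart for hart in range(mask.bit_length()) if mask & (1 << hart)]


def missing_harts(mask, expected_harts):
    """Return the expected hart IDs that never set their bit."""
    return [hart for hart in expected_harts if not mask & (1 << hart)]


# Hypothetical value: harts 1, 2 and 4 entered U-Boot, hart 3 did not.
example_mask = 0b10110
print(harts_in_mask(example_mask))
print(missing_harts(example_mask, [1, 2, 3, 4]))
```

Comparing the decoded list against the harts that later show up in Linux
may help narrow down whether the missing hart ever entered U-Boot.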

Booting Linux will usually use the bootm command / functions at some
point. Before jumping to the kernel, the main hart instructs the
secondary harts to jump to the kernel image. The relevant code for this
is in arch/riscv/lib/bootm.c:
https://gitlab.denx.de/u-boot/u-boot/blob/master/arch/riscv/lib/bootm.c#L101
This will send an IPI to all secondary harts. The IPIs are received in
arch/riscv/cpu/start.S and are eventually handled in handle_ipi() in
arch/riscv/lib/smp.c:
https://gitlab.denx.de/u-boot/u-boot/blob/master/arch/riscv/lib/smp.c#L86
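
To make the hand-off described above more concrete, here is a toy Python
model. It is only a sketch and not the U-Boot implementation: the class
and field names (HandoffModel, pending, addr, arg0) and the kernel entry
address are made up for illustration, and the model assumes each
secondary hart simply jumps to whatever address the main hart recorded
for it before sending the IPI.

```python
class HandoffModel:
    """Toy model: the main hart records a jump target per secondary hart."""

    def __init__(self, boot_hart, available_harts):
        self.boot_hart = boot_hart
        self.available_harts = available_harts  # bitmask, bit N = hart N (assumed)
        self.pending = {}  # hart ID -> (addr, arg0), filled before the "IPI"

    def send_ipi_to_secondaries(self, addr, arg0):
        """Record the jump target for every available hart except the main one."""
        for hart in range(self.available_harts.bit_length()):
            if hart != self.boot_hart and self.available_harts & (1 << hart):
                self.pending[hart] = (addr, arg0)

    def handle_ipi(self, hart):
        """What a secondary hart would do on receiving the IPI: fetch its target."""
        return self.pending.pop(hart)


KERNEL_ENTRY = 0x80200000  # hypothetical kernel entry point
FDT_ADDR = 0x88000000      # FDT address taken from the quoted log

model = HandoffModel(boot_hart=2, available_harts=0b11110)
model.send_ipi_to_secondaries(KERNEL_ENTRY, FDT_ADDR)
for hart in (1, 3, 4):
    addr, fdt = model.handle_ipi(hart)
    print(f"hart {hart}: jump to {addr:#x} with fdt at {fdt:#x}")
```

In this simplified picture, a hart can only end up at the FDT address if
the FDT address is used where the jump target is expected, which is the
kind of mix-up that might be worth looking for.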

What I find strange with the error you are seeing is that one of the
harts is jumping to the device tree binary. As you mentioned, it could
be that we have a race condition somewhere, for example causing
something to be overwritten in global data while some harts are still
running U-Boot. However, I would expect more or less random data and
not the address of the device tree binary in that case. For that reason
I would tend to rule out this scenario. Since only one hart is failing
to enter Linux, I assume that all secondary harts successfully boot
Linux and only the main hart is having problems. That would mean that
something is going wrong in arch/riscv/lib/bootm.c.
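
One observation that may help when reading the trap message: a flattened
device tree starts with the big-endian magic 0xd00dfeed, followed by a
32-bit big-endian totalsize field. A hart that starts fetching at the FDT
base would see the totalsize bytes at offset 4, and for a blob smaller
than 64 KiB the first 16-bit parcel there is 0x0000, which RISC-V defines
as an illegal instruction. The Python sketch below only checks that
arithmetic. The totalsize value 0x47c8 is an assumption derived from the
end address in the quoted log, and whether the two parcels before offset
4 execute without trapping is not verified here.

```python
import struct

FDT_MAGIC = 0xD00DFEED


def parcels(header, count):
    """Split raw bytes into little-endian 16-bit parcels, as a RISC-V fetch would."""
    return list(struct.unpack_from(f"<{count}H", header, 0))


# First 8 bytes of an FDT header: magic and totalsize, both big-endian.
# 0x47c8 is assumed from the log (end 0x880047c7 - start 0x88000000 + 1).
header = struct.pack(">II", FDT_MAGIC, 0x47C8)

for offset, parcel in zip(range(0, 8, 2), parcels(header, 4)):
    note = "all-zero parcel: defined illegal instruction" if parcel == 0 else ""
    print(f"offset {offset}: {parcel:#06x} {note}")
```

The zero parcel sits at offset 4, which would be consistent with a hart
jumping to the FDT base and trapping at 88000004.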

Andreas also brought up a good point. We did have a similar problem
before, which was caused by insufficient initialization. The workaround
was to use the power switch instead of the reset button to reset the
board. I haven't tested it, but I believe initialization in OpenSBI
should be better now, meaning that this might not be a problem anymore.
However, there might also be a similar problem in U-Boot.

Regards,
Lukas