linux-riscv.lists.infradead.org archive mirror
From: "Auer, Lukas" <lukas.auer@aisec.fraunhofer.de>
To: "aurelien@aurel32.net" <aurelien@aurel32.net>
Cc: "david.abdurachmanov@sifive.com" <david.abdurachmanov@sifive.com>,
	"Atish.Patra@wdc.com" <Atish.Patra@wdc.com>,
	"linux-riscv@lists.infradead.org"
	<linux-riscv@lists.infradead.org>
Subject: Re: Fail to bring hart online on HiFive Unleashed
Date: Sun, 20 Oct 2019 18:57:08 +0000
Message-ID: <2a1089078a0118fd973945d7d75a7b2662287dc1.camel@aisec.fraunhofer.de>
In-Reply-To: <20191017204217.GA11023@aurel32.net>

On Thu, 2019-10-17 at 22:42 +0200, Aurelien Jarno wrote:
> On 2019-10-16 20:49, Auer, Lukas wrote:
> > On Wed, 2019-10-16 at 00:22 +0200, Aurelien Jarno wrote:
> > > On 2019-10-15 21:38, Auer, Lukas wrote:
> > > > On Thu, 2019-10-10 at 21:58 +0200, Aurelien Jarno wrote:
> > > > > On 2019-10-09 01:34, Atish Patra wrote:
> > > > > > On Tue, 2019-10-08 at 08:33 +0200, Aurelien Jarno wrote:
> > > > > > > On 8 October 2019 08:14:58 GMT+02:00, David Abdurachmanov <david.abdurachmanov@sifive.com> wrote:
> > > > > > > > On Tue, Oct 8, 2019 at 7:30 AM Aurelien Jarno <aurelien@aurel32.net> wrote:
> > > > > > > > > On 2019-10-07 22:19, Atish Patra wrote:
> > > > > > > > > > Thanks for the detailed analysis. Can you please keep me and David
> > > > > > > > > > in cc when you report the issue to U-Boot?
> > > > > > > > > 
> > > > > > > > > Yep. I have progressed a bit on that, and now I am not convinced
> > > > > > > > > it's a U-Boot issue; it could be a GCC issue.
> > > > > > > > > 
> > > > > > > > > Here are the conditions to reproduce the bug:
> > > > > > > > > - U-Boot runs on hart 1, 2 or 3
> > > > > > > > > - the autoboot process is not interrupted
> > > > > > > > > - extlinux is used to boot the kernel
> > > > > > > > > - arch/riscv/lib/bootm.c is compiled with GCC 9 (works fine with
> > > > > > > > >   GCC 8)
> > > > > > > > >
> > > > > > > > > When the problem happens, the missing hart actually ends its
> > > > > > > > > execution in an illegal instruction trap while trying to execute
> > > > > > > > > the FDT (I only noticed that recently, as the message was hidden
> > > > > > > > > by the use of earlycon=sbi):
> > > > > > > > > 
> > > > > > > > > > SiFive FSBL:       2018-03-20
> > > > > > > > > > HiFive-U serial #: 00000246
> > > > > > > > > > 
> > > > > > > > > > OpenSBI v0.4-50-g30f09fb (Oct  6 2019 21:58:05)
> > > > > > > > > >    ____                    _____ ____ _____
> > > > > > > > > >   / __ \                  / ____|  _ \_   _|
> > > > > > > > > >  | |  | |_ __   ___ _ __ | (___ | |_) || |
> > > > > > > > > >  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
> > > > > > > > > >  | |__| | |_) |  __/ | | |____) | |_) || |_
> > > > > > > > > >   \____/| .__/ \___|_| |_|_____/|____/_____|
> > > > > > > > > >         | |
> > > > > > > > > >         |_|
> > > > > > > > > > 
> > > > > > > > > > Platform Name          : SiFive Freedom U540
> > > > > > > > > > Platform HART Features : RV64ACDFIMSU
> > > > > > > > > > Platform Max HARTs     : 5
> > > > > > > > > > Current Hart           : 2
> > > > > > > > > > Firmware Base          : 0x80000000
> > > > > > > > > > Firmware Size          : 104 KB
> > > > > > > > > > Runtime SBI Version    : 0.2
> > > > > > > > > > 
> > > > > > > > > > PMP0: 0x0000000080000000-0x000000008001ffff (A)
> > > > > > > > > > PMP1: 0x0000000000000000-0x0000007fffffffff (A,R,W,X)
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > U-Boot 2019.10-rc4-00037-gdac51e9aaf-dirty (Oct 06 2019 - 21:56:51 +0000)
> > > > > > > > > > CPU:   rv64imafdc
> > > > > > > > > > Model: SiFive HiFive Unleashed A00
> > > > > > > > > > DRAM:  8 GiB
> > > > > > > > > > 
> > > > > > > > > > MMC:   spi@10050000:mmc@0: 0
> > > > > > > > > > In:    serial@10010000
> > > > > > > > > > Out:   serial@10010000
> > > > > > > > > > Err:   serial@10010000
> > > > > > > > > > Net:   eth0: ethernet@10090000
> > > > > > > > > > Hit any key to stop autoboot:  0
> > > > > > > > > > switch to partitions #0, OK
> > > > > > > > > > mmc0 is current device
> > > > > > > > > > Scanning mmc 0:2...
> > > > > > > > > > Found /boot/extlinux/extlinux.conf
> > > > > > > > > > Retrieving file: /boot/extlinux/extlinux.conf
> > > > > > > > > > 510 bytes read in 5 ms (99.6 KiB/s)
> > > > > > > > > > U-Boot menu
> > > > > > > > > > 1:      kernel 5.3.4
> > > > > > > > > > 2:      Debian GNU/Linux kernel 5.3.0-trunk-riscv64
> > > > > > > > > > Enter choice: 1
> > > > > > > > > > 1:      kernel 5.3.4
> > > > > > > > > > Retrieving file: /boot/vmlinux-5.3.4
> > > > > > > > > > 9486076 bytes read in 4813 ms (1.9 MiB/s)
> > > > > > > > > > append: root=/dev/mmcblk0p2 rw console=ttySIF0 rootwait
> > > > > > > > > > Retrieving file: /boot/hifive-unleashed-a00.dtb
> > > > > > > > > > 6088 bytes read in 7 ms (848.6 KiB/s)
> > > > > > > > > > ## Flattened Device Tree blob at 88000000
> > > > > > > > > >    Booting using the fdt blob at 0x88000000
> > > > > > > > > >    Using Device Tree in place at 0000000088000000, end 00000000880047c7
> > > > > > > > > > Starting kernel ...
> > > > > > > > > > 
> > > > > > > > > > exception code: 2 , Illegal instruction , epc 88000004 , ra 88000000
> > > > > > > > > > ### ERROR ### Please RESET the board ###
> > > > > > > > 
> > > > > > > > I think that's the same issue I had (or still have) a week ago.
> > > > > > > > Just a reminder that kernel 5.3 introduced a 64-byte header (thus no
> > > > > > > > need to wrap the kernel), at least for the Image target. Thus it's
> > > > > > > > booti that boots the kernel on the U-Boot side.
> > > > > > > > The first instruction of that header is "j 0x40" (to the beginning
> > > > > > > > of the actual kernel), and 88000004 would definitely hold an illegal
> > > > > > > > instruction.
> > > > > > > > 
> > > > > > > > 0000000000000000 <.data>:
> > > > > > > > 0:       81a0                    j       0x40
> > > > > > > > 2:       0000                    unimp
> > > > > > > > 4:       0000                    unimp
> > > > > > > > 6:       0100                    nop
> > > > > > > > [..]
> > > > > > > 
> > > > > > > Hmm, that's the beginning of the kernel code. The address 88000004
> > > > > > > actually corresponds to the FDT. So the hart ending up in a trap
> > > > > > > actually tries to boot the FDT instead of the kernel.
> > > > > > > 
> > > > > > 
> > > > > > Do you see the issue if you manually use bootm instead of extlinux?
> > > > > > 
> > > > > > => bootm $kernel_addr_r - $fdt_addr_r
> > > > > > 
> > > > > > This is probably not related, as bootm is jumping to the wrong
> > > > > > location for some reason. However, it may be worth a shot, as it
> > > > > > fixes FDT corruption.
> > > > > 
> > > > > I have just tested it, and it doesn't work. On the other hand, I have
> > > > > tried to run it manually, and interrupting the boot process usually
> > > > > hides the problem.
> > > > > 
> > > > 
> > > > I tried to reproduce the issue today, but was not able to. If you can
> > > > upload the relevant files somewhere, I can retry it with them. I have
> > > > also added information on the boot flow in U-Boot below in hopes that
> > > > it is helpful for debugging.
> > > 
> > > You can find the files there:
> > > https://temp.aurel32.net/hifive-opensbi-uboot/
> > > 
> > > fw_payload.bin contains the OpenSBI + U-Boot payload to be copied to the
> > > first partition of the SD card. The boot.tar.gz contains the /boot 
> > > directory (kernel, fdt and extlinux.conf) and has to be put on the
> > > second partition of the SD card. Note that this partition should have
> > > the GPT boot flag enabled for extlinux to work.
> > > 
> > > I haven't looked at the issue much since I found that using GCC 8
> > > is a fix/workaround, so those files are from ~10 days ago. I will
> > > try to do more tests during the weekend.
> > > 
> > 
> > Thanks for the files, I was able to reproduce the issue now. It seems
> > to be caused by a stack overflow. When smp_call_function() is called
> > during bootm, the stack of the main hart overflows into the stack of
> > one of the other harts, so the saved return address of the main hart
> > now lies within the stack of that other hart. Once that hart is woken
> > by the IPI, it overwrites the return address, in our case with
> > 0x88000000. This causes the illegal instruction trap once the main
> > hart returns. This also explains why the problem does not occur when
> > the main hart is hart 4: its stack is at the bottom and therefore
> > can't overflow into one of the other stacks.
> > 
> > Increasing the stack size by setting CONFIG_STACK_SIZE_SHIFT to 14
> > (i.e. 16 KiB per hart) fixes the problem. I'll double check that
> > there's nothing else causing an issue and will then send a patch to
> > increase the stack size.
> 
> Thanks a lot for debugging that. I have just tried it, and I confirm
> that it fixes the issue for me.
> 
> Tested-by: Aurelien Jarno <aurelien@aurel32.net>
> 
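
The trap in the quoted log is reported at epc 88000004, not at 88000000. Decoding the first few 16-bit parcels at each address shows why. The Python sketch below is a deliberately partial RVC decoder. It recognises only the handful of encodings relevant to this thread and is not a general RISC-V disassembler.

It rests on two assumptions:
- The hex columns in the quoted kernel dump list bytes in memory order. This is consistent with the mnemonics shown there.
- The FDT's totalsize field is 0x47c8, inferred from the "end 00000000880047c7" line in the boot log. Any totalsize below 0x10000 gives the same result.

The helper names (cj_offset, classify, decode) are illustrative only.

```python
# Deliberately partial RVC (compressed RISC-V) decoder: it recognises only the
# encodings that matter for this thread and is not a general disassembler.


def cj_offset(h: int) -> int:
    """Signed byte offset of a C.J instruction (offset[11|4|9:8|10|6|7|3:1|5])."""
    imm = ((h >> 12) & 1) << 11
    imm |= ((h >> 11) & 1) << 4
    imm |= ((h >> 9) & 0b11) << 8
    imm |= ((h >> 8) & 1) << 10
    imm |= ((h >> 7) & 1) << 6
    imm |= ((h >> 6) & 1) << 7
    imm |= ((h >> 3) & 0b111) << 1
    imm |= ((h >> 2) & 1) << 5
    return imm - (1 << 12) if imm & (1 << 11) else imm


def classify(h: int) -> str:
    """Name a 16-bit parcel, for the few encodings this sketch knows."""
    if h == 0x0000:
        return "illegal"  # the all-zero parcel is defined to be illegal
    quadrant, funct3 = h & 0b11, (h >> 13) & 0b111
    if quadrant == 0b11:
        return "32-bit"  # start of a non-compressed instruction
    if h == 0x0001:
        return "c.nop"
    if quadrant == 0b01 and funct3 == 0b101:
        return f"c.j {cj_offset(h):#x}"
    if quadrant == 0b00 and funct3 == 0b000:
        # C.ADDI4SPN is only valid with a non-zero immediate (bits 12:5)
        return "c.addi4spn" if (h >> 5) & 0xFF else "reserved"
    if quadrant == 0b10 and funct3 == 0b111:
        return "c.sdsp"  # RV64 encoding
    return "other"


def decode(data: bytes, base: int) -> list:
    """Walk data as little-endian parcels, returning (address, name) pairs."""
    out, i = [], 0
    while i + 2 <= len(data):
        kind = classify(int.from_bytes(data[i:i + 2], "little"))
        if kind == "32-bit":
            if i + 4 > len(data):
                break  # truncated 32-bit instruction at the end of the buffer
            out.append((base + i, kind))
            i += 4
        else:
            out.append((base + i, kind))
            i += 2
    return out


if __name__ == "__main__":
    # Kernel Image header bytes as listed in the quoted objdump output.
    kernel_header = bytes.fromhex("81a0000000000100")
    # FDT header: magic 0xd00dfeed, then totalsize (assumed 0x47c8), big-endian.
    fdt_header = bytes.fromhex("d00dfeed") + (0x47C8).to_bytes(4, "big")
    for name, blob, base in (("kernel", kernel_header, 0x0),
                             ("fdt", fdt_header, 0x88000000)):
        for addr, kind in decode(blob, base):
            print(f"{name} {addr:#010x}: {kind}")
```

Under these assumptions, the kernel header starts with c.j 0x40, as stated in the quoted message. A hart that jumps to 0x88000000 instead executes the big-endian FDT magic as two valid compressed instructions. It then reaches the all-zero parcel of the totalsize field at 0x88000004, which matches the epc in the trap message.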

Thanks for testing the fix, David and Aurelien! I have submitted a
patch to U-Boot with the fix.

https://patchwork.ozlabs.org/patch/1180057/
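
The overflow described in the quoted analysis can be modelled in a few lines of Python. This is a sketch under stated assumptions, not U-Boot code. It assumes each hart's initial stack pointer is a common base minus (hartid << CONFIG_STACK_SIZE_SHIFT), with stacks growing downwards. This matches the statement that hart 4's stack sits at the bottom.

STACK_BASE, the 9000-byte peak stack usage and the "old" shift value of 13 are hypothetical numbers chosen only for illustration.

```python
# Hypothetical model (not U-Boot source): hart N's initial stack pointer is
# STACK_BASE - (N << shift), and every stack grows towards lower addresses.
STACK_BASE = 0x8100_0000  # hypothetical base address, for illustration only


def stack_top(hartid: int, shift: int, base: int = STACK_BASE) -> int:
    """Initial stack pointer of hartid under the assumed layout."""
    return base - (hartid << shift)


def stack_region(hartid: int, shift: int, base: int = STACK_BASE) -> range:
    """Addresses [bottom, top) reserved for hartid's stack."""
    top = stack_top(hartid, shift, base)
    return range(top - (1 << shift), top)


def owner(addr: int, nharts: int, shift: int, base: int = STACK_BASE):
    """Return the hart whose stack region contains addr, or None."""
    for hart in range(nharts):
        if addr in stack_region(hart, shift, base):
            return hart
    return None


def deepest_frame_owner(hartid, bytes_used, nharts, shift, base=STACK_BASE):
    """Which hart's region holds the lowest address hartid's stack reaches?"""
    return owner(stack_top(hartid, shift, base) - bytes_used, nharts, shift, base)


if __name__ == "__main__":
    NHARTS = 5   # "Platform Max HARTs : 5" in the boot log
    USED = 9000  # hypothetical peak stack usage during bootm
    for main_hart in (1, 4):
        for shift in (13, 14):
            hit = deepest_frame_owner(main_hart, USED, NHARTS, shift)
            print(f"main hart {main_hart}, {1 << shift} B stacks -> lands in {hit}")
```

With a shift of 13 (8 KiB per hart), the deepest frame of hart 1 lands just below hart 2's initial stack pointer. Hart 2 writes to that area as soon as it starts running after the IPI. With a shift of 14 (16 KiB), the same usage stays within hart 1's own region. Hart 4 never reaches another hart's stack, because no other hart's stack is modelled below it.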

Regards,
Lukas
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

Thread overview: 24+ messages
2019-10-03 20:07 Fail to bring hart online on HiFive Unleashed Aurelien Jarno
2019-10-03 23:13 ` Atish Patra
2019-10-03 23:16   ` Troy Benjegerdes
2019-10-05 10:25   ` Aurelien Jarno
2019-10-05 10:54     ` Aurelien Jarno
2019-10-06 12:28     ` Aurelien Jarno
2019-10-07 22:19       ` Atish Patra
2019-10-08  4:30         ` Aurelien Jarno
2019-10-08  6:14           ` David Abdurachmanov
2019-10-08  6:33             ` Aurelien Jarno
2019-10-08  7:17               ` Anup Patel
2019-10-08 22:21               ` Troy Benjegerdes
2019-10-10 19:59                 ` Aurelien Jarno
2019-10-11 14:05                   ` David Abdurachmanov
2019-10-09  1:34               ` Atish Patra
2019-10-10 19:58                 ` Aurelien Jarno
2019-10-15 21:38                   ` Auer, Lukas
2019-10-15 22:22                     ` Aurelien Jarno
2019-10-16 20:49                       ` Auer, Lukas
2019-10-17 15:45                         ` David Abdurachmanov
2019-10-17 20:42                         ` Aurelien Jarno
2019-10-20 18:57                           ` Auer, Lukas [this message]
2019-10-08  7:06           ` Anup Patel
2019-10-14  9:23 ` Andreas Schwab
