linux-riscv.lists.infradead.org archive mirror
* Fail to bring hart online on HiFive Unleashed
@ 2019-10-03 20:07 Aurelien Jarno
  2019-10-03 23:13 ` Atish Patra
  2019-10-14  9:23 ` Andreas Schwab
  0 siblings, 2 replies; 24+ messages in thread
From: Aurelien Jarno @ 2019-10-03 20:07 UTC (permalink / raw)
  To: linux-riscv

Hi all,

I have just upgraded the bootloaders and kernel on a HiFive Unleashed
board to:
- OpenSBI v0.4-50-g30f09fb 
- U-Boot 2019.10-rc4
- Linux v5.3.2

Most of the time, the kernel only brings online 3 of the 4 RV64GC harts:
| # getconf _NPROCESSORS_CONF
| 4
| # getconf _NPROCESSORS_ONLN
| 3
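
The same comparison can be scripted. A minimal Python sketch, assuming a
Linux system where os.sysconf exposes the same two values that getconf
prints (the helper name cpu_counts is purely illustrative):

```python
import os


def cpu_counts():
    """Return (configured, online) CPU counts as reported by sysconf."""
    configured = os.sysconf("SC_NPROCESSORS_CONF")
    online = os.sysconf("SC_NPROCESSORS_ONLN")
    return configured, online


configured, online = cpu_counts()
if online < configured:
    print(f"{configured - online} of {configured} CPUs are offline")
else:
    print(f"all {configured} CPUs are online")
```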

This can also be seen in /proc/cpuinfo:

| processor       : 0
| hart            : 1
| isa             : rv64imafdc
| mmu             : sv39
| uarch           : sifive,u54-mc
|
| processor       : 1
| hart            : 2
| isa             : rv64imafdc
| mmu             : sv39
| uarch           : sifive,u54-mc
|
| processor       : 3
| hart            : 4
| isa             : rv64imafdc
| mmu             : sv39
| uarch           : sifive,u54-mc
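
To spot the gap in the numbering automatically, a sketch along these
lines could parse the listing. The function names are illustrative only,
the sample text is copied from the output above, and the parser assumes
the "processor"/"hart" key layout shown there:

```python
def parse_cpuinfo(text):
    """Map each 'processor' index to its 'hart' id from /proc/cpuinfo text."""
    mapping = {}
    current = None
    for line in text.splitlines():
        # Drop the "| " quoting used in the mail, if present.
        line = line.lstrip("| ").strip()
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "processor":
            current = int(value)
        elif key == "hart" and current is not None:
            mapping[current] = int(value)
    return mapping


def missing_processors(mapping, configured):
    """List processor indexes below 'configured' that have no entry."""
    return [cpu for cpu in range(configured) if cpu not in mapping]


SAMPLE = """\
processor       : 0
hart            : 1
isa             : rv64imafdc
mmu             : sv39
uarch           : sifive,u54-mc

processor       : 1
hart            : 2
isa             : rv64imafdc
mmu             : sv39
uarch           : sifive,u54-mc

processor       : 3
hart            : 4
isa             : rv64imafdc
mmu             : sv39
uarch           : sifive,u54-mc
"""

online = parse_cpuinfo(SAMPLE)
print("processor -> hart:", online)
print("missing processors:", missing_processors(online, 4))
```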

When it happens, the kernel logs contain:

| [    0.049851] smp: Bringing up secondary CPUs ...
| [    1.082530] CPU2: failed to come online
| [    1.086267] smp: Brought up 1 node, 3 CPUs
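
When collecting statistics over many boots, the two relevant lines can
be pulled out of a saved log. A small sketch, assuming the message
wording stays exactly as in the excerpt above (the function name is
illustrative):

```python
import re

FAILED_RE = re.compile(r"CPU(\d+): failed to come online")
BROUGHT_UP_RE = re.compile(r"smp: Brought up (\d+) nodes?, (\d+) CPUs?")


def summarize_smp_bringup(log):
    """Return (list of failed CPU numbers, number of CPUs brought up)."""
    failed = [int(n) for n in FAILED_RE.findall(log)]
    match = BROUGHT_UP_RE.search(log)
    online = int(match.group(2)) if match else None
    return failed, online


SAMPLE_LOG = """\
[    0.049851] smp: Bringing up secondary CPUs ...
[    1.082530] CPU2: failed to come online
[    1.086267] smp: Brought up 1 node, 3 CPUs
"""

print(summarize_smp_bringup(SAMPLE_LOG))
```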

I have also seen the issue with CPU1 but not with CPU3 and CPU4 (might
be a statistical effect).

Any idea what could be the issue?

Thanks,
Aurelien

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Fail to bring hart online on HiFive Unleashed
  2019-10-03 20:07 Fail to bring hart online on HiFive Unleashed Aurelien Jarno
@ 2019-10-03 23:13 ` Atish Patra
  2019-10-03 23:16   ` Troy Benjegerdes
  2019-10-05 10:25   ` Aurelien Jarno
  2019-10-14  9:23 ` Andreas Schwab
  1 sibling, 2 replies; 24+ messages in thread
From: Atish Patra @ 2019-10-03 23:13 UTC (permalink / raw)
  To: linux-riscv, aurelien

On Thu, 2019-10-03 at 22:07 +0200, Aurelien Jarno wrote:
> Hi all,
> 
> I have just upgraded the bootloaders and kernel on an HiFive
> Unleashed
> board to:
> - OpenSBI v0.4-50-g30f09fb 
> - U-Boot 2019.10-rc4
> - Linux v5.3.2
> 
> Most of the time, the kernel only brings online 3 of the 4 RV64GC
> harts:
> > # getconf _NPROCESSORS_CONF
> > 4
> > # getconf _NPROCESSORS_ONLN
> > 3
> 
> This can also be seen in /proc/cpuinfo:
> 
> > processor       : 0
> > hart            : 1
> > isa             : rv64imafdc
> > mmu             : sv39
> > uarch           : sifive,u54-mc
> > 
> > processor       : 1
> > hart            : 2
> > isa             : rv64imafdc
> > mmu             : sv39
> > uarch           : sifive,u54-mc
> > 
> > processor       : 3
> > hart            : 4
> > isa             : rv64imafdc
> > mmu             : sv39
> > uarch           : sifive,u54-mc
> 
> When it happens, the kernel logs contain:
> 
> > [    0.049851] smp: Bringing up secondary CPUs ...
> > [    1.082530] CPU2: failed to come online
> > [    1.086267] smp: Brought up 1 node, 3 CPUs
> 

The log is consistent with the outcome: CPU2 never came up within 1 second
for some reason. How often do you see this?

I tried a couple of times and did not see this issue. Here are the versions I used:

OpenSBI v0.4-50-g30f09fbfd1ec (Oct  3 2019 14:03:20)
U-Boot 2019.10-rc4-00023-g72efcc8f00fc (Oct 03 2019 - 14:03:12 -0700)
Linux version 5.4.0-rc1-00004-gecd4522e3e09

Here is the boot log:
https://paste.fedoraproject.org/paste/-gr1Zeg4~UBs~bqIPraJwA

If this issue is reliably reproducible, here are some areas to debug.

1. __cpu_up() in smpboot.c has a 1-second timeout for each CPU to come up.

You can increase that timeout just to make sure that it's not a hardware
issue.

or

2. Put some debug prints in U-Boot/OpenSBI to confirm that all 4 harts
did come up at each layer.

You can also pass the kernel image directly as FW_PAYLOAD_PATH in OpenSBI
to avoid U-Boot. That may give a clue whether it is a U-Boot issue or not.
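
To illustrate what the timeout in point 1 can and cannot tell you, here
is a loose user-space analogy in Python. It is not the kernel code:
threading.Event merely stands in for whatever the boot CPU waits on, and
bring_up is a hypothetical name. A longer timeout only helps a CPU that
is slow to report in, not one that never does:

```python
import threading


def bring_up(secondary_starts, timeout_s):
    """Wait up to timeout_s for a simulated secondary CPU to report in."""
    cpu_running = threading.Event()

    def secondary():
        if secondary_starts:
            cpu_running.set()  # the secondary announces itself
        # Otherwise it never reports in, however long the caller waits.

    worker = threading.Thread(target=secondary)
    worker.start()
    came_online = cpu_running.wait(timeout=timeout_s)
    worker.join()
    return came_online


for starts, timeout in [(True, 0.05), (False, 0.05), (False, 0.25)]:
    state = "online" if bring_up(starts, timeout) else "failed to come online"
    print(f"starts={starts} timeout={timeout}s -> {state}")
```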

> I have also seen the issue with CPU1 but not with CPU3 and CPU4
> (might
> be a statistical effect).
> 
> Any idea what could be the issue?
> 


> Thanks,
> Aurelien
> 

-- 
Regards,
Atish

* Re: Fail to bring hart online on HiFive Unleashed
  2019-10-03 23:13 ` Atish Patra
@ 2019-10-03 23:16   ` Troy Benjegerdes
  2019-10-05 10:25   ` Aurelien Jarno
  1 sibling, 0 replies; 24+ messages in thread
From: Troy Benjegerdes @ 2019-10-03 23:16 UTC (permalink / raw)
  To: aurelien; +Cc: Atish Patra, linux-riscv

Does the same problem occur if you write this file to an SD card and set the switches to load the (legacy) machine-mode HiFive U-Boot as FSBL?

https://github.com/sifive/freedom-u-sdk/releases/download/hifiveu-2.0-alpha-2/hifive-unleashed-a00-2019-03-22.gpt

> On Oct 3, 2019, at 6:13 PM, Atish Patra <Atish.Patra@wdc.com> wrote:
> 
> On Thu, 2019-10-03 at 22:07 +0200, Aurelien Jarno wrote:
>> Hi all,
>> 
>> I have just upgraded the bootloaders and kernel on an HiFive
>> Unleashed
>> board to:
>> - OpenSBI v0.4-50-g30f09fb 
>> - U-Boot 2019.10-rc4
>> - Linux v5.3.2
>> 
>> Most of the time, the kernel only brings online 3 of the 4 RV64GC
>> harts:
>>> # getconf _NPROCESSORS_CONF
>>> 4
>>> # getconf _NPROCESSORS_ONLN
>>> 3
>> 
>> This can also be seen in /proc/cpuinfo:
>> 
>>> processor       : 0
>>> hart            : 1
>>> isa             : rv64imafdc
>>> mmu             : sv39
>>> uarch           : sifive,u54-mc
>>> 
>>> processor       : 1
>>> hart            : 2
>>> isa             : rv64imafdc
>>> mmu             : sv39
>>> uarch           : sifive,u54-mc
>>> 
>>> processor       : 3
>>> hart            : 4
>>> isa             : rv64imafdc
>>> mmu             : sv39
>>> uarch           : sifive,u54-mc
>> 
>> When it happens, the kernel logs contain:
>> 
>>> [    0.049851] smp: Bringing up secondary CPUs ...
>>> [    1.082530] CPU2: failed to come online
>>> [    1.086267] smp: Brought up 1 node, 3 CPUs
>> 
> 
> The log is aligned with the outcome. CPU2 never came up within 1 second
> for some reason. How often do you see this ?
> 
> I tried couple of times and did not see this issue. Here is the log
> 
> OpenSBI v0.4-50-g30f09fbfd1ec (Oct  3 2019 14:03:20)
> U-Boot 2019.10-rc4-00023-g72efcc8f00fc (Oct 03 2019 - 14:03:12 -0700)
> Linux version 5.4.0-rc1-00004-gecd4522e3e09
> 
> Here is the bootlog.
> https://paste.fedoraproject.org/paste/-gr1Zeg4~UBs~bqIPraJwA
> 
> If this issue is reliably reproducible, here are some areas to debug.
> 
> 1. __cpu_up() in smpboot.c has a 1sec timeout for each cpu to come up.
> 
> You can increase that time just to make sure that it's not a hardware
> issue.
> 
> or
> 
> 2. Put some debug prints in U-boot/OpenSBI to confirm that all 4 harts
> did  come up at each layer. 
> 
> You can also just use kernel image directly FW_PAYLOAD_PATH in OpenSBI
> to avoid U-boot. That may give a clue if it is a U-boot issue or not.
> 
>> I have also seen the issue with CPU1 but not with CPU3 and CPU4
>> (might
>> be a statistical effect).
>> 
>> Any idea what could be the issue?
>> 
> 
> 
>> Thanks,
>> Aurelien
>> 
> 
> -- 
> Regards,
> Atish

* Re: Fail to bring hart online on HiFive Unleashed
  2019-10-03 23:13 ` Atish Patra
  2019-10-03 23:16   ` Troy Benjegerdes
@ 2019-10-05 10:25   ` Aurelien Jarno
  2019-10-05 10:54     ` Aurelien Jarno
  2019-10-06 12:28     ` Aurelien Jarno
  1 sibling, 2 replies; 24+ messages in thread
From: Aurelien Jarno @ 2019-10-05 10:25 UTC (permalink / raw)
  To: Atish Patra; +Cc: linux-riscv

Hi,

On 2019-10-03 23:13, Atish Patra wrote:
> On Thu, 2019-10-03 at 22:07 +0200, Aurelien Jarno wrote:
> > Hi all,
> > 
> > When it happens, the kernel logs contain:
> > 
> > > [    0.049851] smp: Bringing up secondary CPUs ...
> > > [    1.082530] CPU2: failed to come online
> > > [    1.086267] smp: Brought up 1 node, 3 CPUs
> > 
> 
> The log is aligned with the outcome. CPU2 never came up within 1 second
> for some reason. How often do you see this ?

It happens about 80% of the time.

> I tried couple of times and did not see this issue. Here is the log
> 
> OpenSBI v0.4-50-g30f09fbfd1ec (Oct  3 2019 14:03:20)
> U-Boot 2019.10-rc4-00023-g72efcc8f00fc (Oct 03 2019 - 14:03:12 -0700)
> Linux version 5.4.0-rc1-00004-gecd4522e3e09
> 
> Here is the bootlog.
> https://paste.fedoraproject.org/paste/-gr1Zeg4~UBs~bqIPraJwA
> 
> If this issue is reliably reproducible, here are some areas to debug.
> 
> 1. __cpu_up() in smpboot.c has a 1sec timeout for each cpu to come up.
> 
> You can increase that time just to make sure that it's not a hardware
> issue.
> 
> or
> 
> 2. Put some debug prints in U-boot/OpenSBI to confirm that all 4 harts
> did  come up at each layer. 
> 
> You can also just use kernel image directly FW_PAYLOAD_PATH in OpenSBI
> to avoid U-boot. That may give a clue if it is a U-boot issue or not.

I have tried that first as it's the easiest to do. I have not been able
to reproduce the issue when skipping u-boot. I'll therefore now try to
debug that using your suggestions.

Aurelien

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net


* Re: Fail to bring hart online on HiFive Unleashed
  2019-10-05 10:25   ` Aurelien Jarno
@ 2019-10-05 10:54     ` Aurelien Jarno
  2019-10-06 12:28     ` Aurelien Jarno
  1 sibling, 0 replies; 24+ messages in thread
From: Aurelien Jarno @ 2019-10-05 10:54 UTC (permalink / raw)
  To: Atish Patra; +Cc: linux-riscv

On 2019-10-05 12:25, Aurelien Jarno wrote:
> Hi,
> 
> On 2019-10-03 23:13, Atish Patra wrote:
> > On Thu, 2019-10-03 at 22:07 +0200, Aurelien Jarno wrote:
> > > Hi all,
> > > 
> > > When it happens, the kernel logs contain:
> > > 
> > > > [    0.049851] smp: Bringing up secondary CPUs ...
> > > > [    1.082530] CPU2: failed to come online
> > > > [    1.086267] smp: Brought up 1 node, 3 CPUs
> > > 
> > 
> > The log is aligned with the outcome. CPU2 never came up within 1 second
> > for some reason. How often do you see this ?
> 
> It happens about 80% of the time.

Some more statistics: the missing hart is always the one used by OpenSBI
and, I guess, U-Boot. It always fails when OpenSBI uses hart 1, 2 or 3.
It never fails when OpenSBI uses hart 4.
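
This kind of correlation can be extracted from captured boot logs. A
minimal sketch, assuming the OpenSBI banner prints a "Current Hart" line
and that Linux CPU n corresponds to hart n + 1 (as in the /proc/cpuinfo
listing from the original report). The function name, the offset
parameter and the two-line sample are illustrative only:

```python
import re


def boot_hart_and_failed_harts(log, cpu_to_hart_offset=1):
    """Return (hart OpenSBI booted on, harts that failed to come online)."""
    match = re.search(r"Current Hart\s*:\s*(\d+)", log)
    boot_hart = int(match.group(1)) if match else None
    failed = [
        int(n) + cpu_to_hart_offset  # assumed CPU -> hart mapping
        for n in re.findall(r"CPU(\d+): failed to come online", log)
    ]
    return boot_hart, failed


SAMPLE = (
    "Current Hart           : 2\n"
    "[    5.090625] CPU1: failed to come online\n"
)

boot_hart, failed = boot_hart_and_failed_harts(SAMPLE)
print("OpenSBI hart:", boot_hart, "failed harts:", failed)
```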

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net


* Re: Fail to bring hart online on HiFive Unleashed
  2019-10-05 10:25   ` Aurelien Jarno
  2019-10-05 10:54     ` Aurelien Jarno
@ 2019-10-06 12:28     ` Aurelien Jarno
  2019-10-07 22:19       ` Atish Patra
  1 sibling, 1 reply; 24+ messages in thread
From: Aurelien Jarno @ 2019-10-06 12:28 UTC (permalink / raw)
  To: Atish Patra; +Cc: linux-riscv

On 2019-10-05 12:25, Aurelien Jarno wrote:
> Hi,
> 
> On 2019-10-03 23:13, Atish Patra wrote:
> > On Thu, 2019-10-03 at 22:07 +0200, Aurelien Jarno wrote:
> > > Hi all,
> > > 
> > > When it happens, the kernel logs contain:
> > > 
> > > > [    0.049851] smp: Bringing up secondary CPUs ...
> > > > [    1.082530] CPU2: failed to come online
> > > > [    1.086267] smp: Brought up 1 node, 3 CPUs
> > > 
> > 
> > The log is aligned with the outcome. CPU2 never came up within 1 second
> > for some reason. How often do you see this ?
> 
> It happens about 80% of the time.
> 
> > I tried couple of times and did not see this issue. Here is the log
> > 
> > OpenSBI v0.4-50-g30f09fbfd1ec (Oct  3 2019 14:03:20)
> > U-Boot 2019.10-rc4-00023-g72efcc8f00fc (Oct 03 2019 - 14:03:12 -0700)
> > Linux version 5.4.0-rc1-00004-gecd4522e3e09
> > 
> > Here is the bootlog.
> > https://paste.fedoraproject.org/paste/-gr1Zeg4~UBs~bqIPraJwA
> > 
> > If this issue is reliably reproducible, here are some areas to debug.
> > 
> > 1. __cpu_up() in smpboot.c has a 1sec timeout for each cpu to come up.
> > 
> > You can increase that time just to make sure that it's not a hardware
> > issue.

I tried to increase it to 5 seconds. This does not change anything.

> > or
> > 
> > 2. Put some debug prints in U-boot/OpenSBI to confirm that all 4 harts
> > did  come up at each layer. 
> > 
> > You can also just use kernel image directly FW_PAYLOAD_PATH in OpenSBI
> > to avoid U-boot. That may give a clue if it is a U-boot issue or not.
> 
> I have tried that first as it's the easiest to do. I have not been able
> to reproduce the issue when skipping u-boot. I'll therefore now try to
> debug that using your suggestions.

I have finally tracked the issue down to the use of extlinux in the
boot process. When using ext4load to load the kernel and dtb, and booti
to boot the kernel, the issue does not happen.

It is therefore purely a U-Boot issue. I'll continue to debug it and
report the issue to U-Boot.

Aurelien

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net


* Re: Fail to bring hart online on HiFive Unleashed
  2019-10-06 12:28     ` Aurelien Jarno
@ 2019-10-07 22:19       ` Atish Patra
  2019-10-08  4:30         ` Aurelien Jarno
  0 siblings, 1 reply; 24+ messages in thread
From: Atish Patra @ 2019-10-07 22:19 UTC (permalink / raw)
  To: aurelien; +Cc: david.abdurachmanov, linux-riscv

On Sun, 2019-10-06 at 14:28 +0200, Aurelien Jarno wrote:
> On 2019-10-05 12:25, Aurelien Jarno wrote:
> > Hi,
> > 
> > On 2019-10-03 23:13, Atish Patra wrote:
> > > On Thu, 2019-10-03 at 22:07 +0200, Aurelien Jarno wrote:
> > > > Hi all,
> > > > 
> > > > When it happens, the kernel logs contain:
> > > > 
> > > > > [    0.049851] smp: Bringing up secondary CPUs ...
> > > > > [    1.082530] CPU2: failed to come online
> > > > >  [    1.086267] smp: Brought up 1 node, 3 CPUs
> > > 
> > > The log is aligned with the outcome. CPU2 never came up within 1
> > > second
> > > for some reason. How often do you see this ?
> > 
> > It happens about 80% of the time.
> > 
> > > I tried couple of times and did not see this issue. Here is the
> > > log
> > > 
> > > OpenSBI v0.4-50-g30f09fbfd1ec (Oct  3 2019 14:03:20)
> > > U-Boot 2019.10-rc4-00023-g72efcc8f00fc (Oct 03 2019 - 14:03:12
> > > -0700)
> > > Linux version 5.4.0-rc1-00004-gecd4522e3e09
> > > 
> > > Here is the bootlog.
> > > https://paste.fedoraproject.org/paste/-gr1Zeg4~UBs~bqIPraJwA
> > > 
> > > If this issue is reliably reproducible, here are some areas to
> > > > debug.
> > > 
> > > 1. __cpu_up() in smpboot.c has a 1sec timeout for each cpu to
> > > come up.
> > > 
> > > You can increase that time just to make sure that it's not a
> > > hardware
> > > issue.
> 
> I tried to increase it to 5 seconds. This does not change anything.
> 
> > > or
> > > 
> > > 2. Put some debug prints in U-boot/OpenSBI to confirm that all 4
> > > harts
> > > did  come up at each layer. 
> > > 
> > > You can also just use kernel image directly FW_PAYLOAD_PATH in
> > > OpenSBI
> > > to avoid U-boot. That may give a clue if it is a U-boot issue or
> > > not.
> > 
> > I have tried that first as it's the easiest to do. I have not been
> > able
> > to reproduce the issue when skipping u-boot. I'll therefore now try
> > to
> > debug that using your suggestions.
> 
> I have finally tracked down the issue to the usage of extlinux for
> the
> boot process. When using ext4load to load the kernel and dtb, and
> booti
> to boot the kernel, the issue does not happen.
> 
> It is therefore purely an u-boot issue. I'll continue to debug that
> and
> report the issue to u-boot.
> 

Thanks for the detailed analysis. Can you please keep David and me in
cc when you report the issue to U-Boot?

FYI: David is working on U-Boot + extlinux support for Fedora RISC-V.

> Aurelien
> 

-- 
Regards,
Atish

* Re: Fail to bring hart online on HiFive Unleashed
  2019-10-07 22:19       ` Atish Patra
@ 2019-10-08  4:30         ` Aurelien Jarno
  2019-10-08  6:14           ` David Abdurachmanov
  2019-10-08  7:06           ` Anup Patel
  0 siblings, 2 replies; 24+ messages in thread
From: Aurelien Jarno @ 2019-10-08  4:30 UTC (permalink / raw)
  To: Atish Patra; +Cc: david.abdurachmanov, linux-riscv

On 2019-10-07 22:19, Atish Patra wrote:
> Thanks for the detailed analysis. Can you please keep me and david in
> cc when you report the issue to U-boot ?

Yep. I have made some progress on that, and I am no longer convinced it's
a U-Boot issue; it may be a GCC issue.

Here are the conditions to reproduce the bug:
- U-Boot runs on hart 1, 2 or 3
- the autoboot process is not interrupted
- extlinux is used to boot the kernel
- arch/riscv/lib/bootm.c is compiled with GCC 9 (works fine with GCC 8)

When the problem happens, the missing hart actually ends its execution
in an illegal instruction trap trying to execute the FDT (I only noticed
that recently as the message was hidden by the use of earlycon=sbi):

| SiFive FSBL:       2018-03-20
| HiFive-U serial #: 00000246
| 
| OpenSBI v0.4-50-g30f09fb (Oct  6 2019 21:58:05)
|    ____                    _____ ____ _____
|   / __ \                  / ____|  _ \_   _|
|  | |  | |_ __   ___ _ __ | (___ | |_) || |
|  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
|  | |__| | |_) |  __/ | | |____) | |_) || |_
|   \____/| .__/ \___|_| |_|_____/|____/_____|
|         | |
|         |_|
| 
| Platform Name          : SiFive Freedom U540
| Platform HART Features : RV64ACDFIMSU
| Platform Max HARTs     : 5
| Current Hart           : 2
| Firmware Base          : 0x80000000
| Firmware Size          : 104 KB
| Runtime SBI Version    : 0.2
| 
| PMP0: 0x0000000080000000-0x000000008001ffff (A)
| PMP1: 0x0000000000000000-0x0000007fffffffff (A,R,W,X)
| 
| 
| U-Boot 2019.10-rc4-00037-gdac51e9aaf-dirty (Oct 06 2019 - 21:56:51 +0000)
| 
| CPU:   rv64imafdc
| Model: SiFive HiFive Unleashed A00
| DRAM:  8 GiB
| 
| MMC:   spi@10050000:mmc@0: 0
| In:    serial@10010000
| Out:   serial@10010000
| Err:   serial@10010000
| Net:   eth0: ethernet@10090000
| Hit any key to stop autoboot:  0
| switch to partitions #0, OK
| mmc0 is current device
| Scanning mmc 0:2...
| Found /boot/extlinux/extlinux.conf
| Retrieving file: /boot/extlinux/extlinux.conf
| 510 bytes read in 5 ms (99.6 KiB/s)
| U-Boot menu
| 1:      kernel 5.3.4
| 2:      Debian GNU/Linux kernel 5.3.0-trunk-riscv64
| Enter choice: 1
| 1:      kernel 5.3.4
| Retrieving file: /boot/vmlinux-5.3.4
| 9486076 bytes read in 4813 ms (1.9 MiB/s)
| append: root=/dev/mmcblk0p2 rw console=ttySIF0 rootwait
| Retrieving file: /boot/hifive-unleashed-a00.dtb
| 6088 bytes read in 7 ms (848.6 KiB/s)
| ## Flattened Device Tree blob at 88000000
|    Booting using the fdt blob at 0x88000000
|    Using Device Tree in place at 0000000088000000, end 00000000880047c7
| 
| Starting kernel ...
| 
| exception code: 2 , Illegal instruction , epc 88000004 , ra 88000000
| ### ERROR ### Please RESET the board ###
| [    0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000
| [    0.000000] Linux version 5.3.4+ (aurel32@ohm) (gcc version 9.2.1 20190821 (Debian 9.2.1-4)) #1 SMP Sun Oct 6 11:35:09 UTC 2019
| [    0.000000] initrd not found or empty - disabling initrd
| [    0.000000] Zone ranges:
| [    0.000000]   DMA32    [mem 0x0000000080200000-0x00000000ffffffff]
| [    0.000000]   Normal   [mem 0x0000000100000000-0x000000027fffffff]
| [    0.000000] Movable zone start for each node
| [    0.000000] Early memory node ranges
| [    0.000000]   node   0: [mem 0x0000000080200000-0x000000027fffffff]
| [    0.000000] Initmem setup node 0 [mem 0x0000000080200000-0x000000027fffffff]
| [    0.000000] software IO TLB: mapped [mem 0xfbfff000-0xfffff000] (64MB)
| [    0.000000] CPU with hartid=0 is not available
| [    0.000000] CPU with hartid=0 is not available
| [    0.000000] elf_hwcap is 0x112d
| [    0.000000] percpu: Embedded 18 pages/cpu s36120 r8192 d29416 u73728
| [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 2067975
| [    0.000000] Kernel command line: root=/dev/mmcblk0p2 rw console=ttySIF0 rootwait
| [    0.000000] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes, linear)
| [    0.000000] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes, linear)
| [    0.000000] Sorting __ex_table...
| [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
| [    0.000000] Memory: 8184044K/8386560K available (6310K kernel code, 395K rwdata, 1985K rodata, 239K init, 317K bss, 202516K reserved, 0K cma-reserved)
| [    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
| [    0.000000] rcu: Hierarchical RCU implementation.
| [    0.000000] rcu:     RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=4.
| [    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
| [    0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
| [    0.000000] NR_IRQS: 0, nr_irqs: 0, preallocated irqs: 0
| [    0.000000] plic: mapped 53 interrupts with 4 handlers for 9 contexts.
| [    0.000000] riscv_timer_init_dt: Registering clocksource cpuid [0] hartid [1]
| [    0.000000] clocksource: riscv_clocksource: mask: 0xffffffffffffffff max_cycles: 0x1d854df40, max_idle_ns: 3526361616960 ns
| [    0.000006] sched_clock: 64 bits at 1000kHz, resolution 1000ns, wraps every 2199023255500ns
| [    0.000147] Console: colour dummy device 80x25
| [    0.000184] Calibrating delay loop (skipped), value calculated using timer frequency.. 2.00 BogoMIPS (lpj=4000)
| [    0.000198] pid_max: default: 32768 minimum: 301
| [    0.000685] Mount-cache hash table entries: 16384 (order: 5, 131072 bytes, linear)
| [    0.001026] Mountpoint-cache hash table entries: 16384 (order: 5, 131072 bytes, linear)
| [    0.002814] rcu: Hierarchical SRCU implementation.
| [    0.003280] smp: Bringing up secondary CPUs ...
| [    5.090625] CPU1: failed to come online
| [    5.091815] smp: Brought up 1 node, 3 CPUs
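
The trap address is consistent with a hart fetching instructions from
the start of the FDT. A rough Python sketch, assuming the standard FDT
header layout (big-endian magic 0xd00dfeed followed by a big-endian
totalsize) and taking totalsize as 0x47c8 from the "end 00000000880047c7"
line, on the assumption that this is the address of the last byte. The
helper names are illustrative. It only locates the first all-zero 16-bit
parcel, which RISC-V defines as an illegal instruction; it does not check
that the preceding parcels execute without trapping or branching:

```python
import struct

FDT_MAGIC = 0xD00DFEED
FDT_BASE = 0x88000000  # load address printed by U-Boot in the log above


def fdt_header_prefix(totalsize):
    """First 8 bytes of an FDT blob: magic and totalsize, both big-endian."""
    return struct.pack(">II", FDT_MAGIC, totalsize)


def halfwords(data):
    """Split bytes into little-endian 16-bit parcels, as a hart fetches them."""
    count = len(data) // 2
    return list(struct.unpack("<%dH" % count, data[:count * 2]))


def first_all_zero_parcel(data):
    """Byte offset of the first 0x0000 parcel, or None if there is none."""
    for index, parcel in enumerate(halfwords(data)):
        if parcel == 0:
            return index * 2
    return None


prefix = fdt_header_prefix(0x47C8)
print([hex(p) for p in halfwords(prefix)])
print(hex(FDT_BASE + first_all_zero_parcel(prefix)))
```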

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net


* Re: Fail to bring hart online on HiFive Unleashed
  2019-10-08  4:30         ` Aurelien Jarno
@ 2019-10-08  6:14           ` David Abdurachmanov
  2019-10-08  6:33             ` Aurelien Jarno
  2019-10-08  7:06           ` Anup Patel
  1 sibling, 1 reply; 24+ messages in thread
From: David Abdurachmanov @ 2019-10-08  6:14 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: Atish Patra, linux-riscv

On Tue, Oct 8, 2019 at 7:30 AM Aurelien Jarno <aurelien@aurel32.net> wrote:
>
> On 2019-10-07 22:19, Atish Patra wrote:
> > Thanks for the detailed analysis. Can you please keep me and david in
> > cc when you report the issue to U-boot ?
>
> Yep. I have progressed a bit on that, and now I am not convinced it's an
> U-boot issue, it can be a GCC issue.
>
> Here are the conditions to reproduce the bug:
> - U-boot runs on hart 1, 2 or 3
> - the autoboot process is not interrupted
> - extlinux is used to boot the kernel
> - arch/riscv/lib/bootm.c is compiled with GCC 9 (works fine with GCC 8)
>
> When the problem happens, the missing hart actually ends its execution
> in an illegal instruction trap trying to execute the FDT (I only noticed
> that recently as the message was hidden by the use of earlycon=sbi):
>
> | SiFive FSBL:       2018-03-20
> | HiFive-U serial #: 00000246
> |
> | OpenSBI v0.4-50-g30f09fb (Oct  6 2019 21:58:05)
> |    ____                    _____ ____ _____
> |   / __ \                  / ____|  _ \_   _|
> |  | |  | |_ __   ___ _ __ | (___ | |_) || |
> |  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
> |  | |__| | |_) |  __/ | | |____) | |_) || |_
> |   \____/| .__/ \___|_| |_|_____/|____/_____|
> |         | |
> |         |_|
> |
> | Platform Name          : SiFive Freedom U540
> | Platform HART Features : RV64ACDFIMSU
> | Platform Max HARTs     : 5
> | Current Hart           : 2
> | Firmware Base          : 0x80000000
> | Firmware Size          : 104 KB
> | Runtime SBI Version    : 0.2
> |
> | PMP0: 0x0000000080000000-0x000000008001ffff (A)
> | PMP1: 0x0000000000000000-0x0000007fffffffff (A,R,W,X)
> |
> |
> | U-Boot 2019.10-rc4-00037-gdac51e9aaf-dirty (Oct 06 2019 - 21:56:51 +0000)
> |
> | CPU:   rv64imafdc
> | Model: SiFive HiFive Unleashed A00
> | DRAM:  8 GiB
> |
> | MMC:   spi@10050000:mmc@0: 0
> | In:    serial@10010000
> | Out:   serial@10010000
> | Err:   serial@10010000
> | Net:   eth0: ethernet@10090000
> | Hit any key to stop autoboot:  0
> | switch to partitions #0, OK
> | mmc0 is current device
> | Scanning mmc 0:2...
> | Found /boot/extlinux/extlinux.conf
> | Retrieving file: /boot/extlinux/extlinux.conf
> | 510 bytes read in 5 ms (99.6 KiB/s)
> | U-Boot menu
> | 1:      kernel 5.3.4
> | 2:      Debian GNU/Linux kernel 5.3.0-trunk-riscv64
> | Enter choice: 1
> | 1:      kernel 5.3.4
> | Retrieving file: /boot/vmlinux-5.3.4
> | 9486076 bytes read in 4813 ms (1.9 MiB/s)
> | append: root=/dev/mmcblk0p2 rw console=ttySIF0 rootwait
> | Retrieving file: /boot/hifive-unleashed-a00.dtb
> | 6088 bytes read in 7 ms (848.6 KiB/s)
> | ## Flattened Device Tree blob at 88000000
> |    Booting using the fdt blob at 0x88000000
> |    Using Device Tree in place at 0000000088000000, end 00000000880047c7
> |
> | Starting kernel ...
> |
> | exception code: 2 , Illegal instruction , epc 88000004 , ra 88000000
> | ### ERROR ### Please RESET the board ###

I think that's the same issue I had (or still have) a week ago.
Just a reminder that kernel 5.3 introduced a 64-byte header (thus no
need to wrap the kernel), at least for the Image target. Thus it's booti
that boots the kernel on the U-Boot side.
The 1st instruction of that header is "j 0x40" (a jump to the beginning
of the actual kernel), and 88000004 would definitely hold an illegal
instruction.

0000000000000000 <.data>:
0:       81a0                    j       0x40
2:       0000                    unimp
4:       0000                    unimp
6:       0100                    nop
[..]
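
As a cross-check of the listing above, the first halfword can be decoded
by hand. A small Python sketch, assuming the hex column shows the raw
bytes in memory order (so "81a0" is the little-endian halfword 0xa081)
and using the C.J immediate layout from the RISC-V compressed instruction
set. The function name is illustrative:

```python
import struct


def decode_c_j_offset(parcel):
    """Return the signed byte offset of a 16-bit C.J, or None if not C.J."""
    if parcel & 0x3 != 0b01 or (parcel >> 13) != 0b101:
        return None

    def bit(n):
        return (parcel >> n) & 1

    # inst[12:2] holds imm[11|4|9:8|10|6|7|3:1|5]
    offset = (
        (bit(12) << 11) | (bit(11) << 4) | (bit(10) << 9) | (bit(9) << 8)
        | (bit(8) << 10) | (bit(7) << 6) | (bit(6) << 7) | (bit(5) << 3)
        | (bit(4) << 2) | (bit(3) << 1) | (bit(2) << 5)
    )
    if offset & 0x800:  # sign-extend the 12-bit immediate
        offset -= 0x1000
    return offset


first = struct.unpack("<H", bytes.fromhex("81a0"))[0]
print(hex(first), "->", hex(decode_c_j_offset(first)))
```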

For the past week or more I have only been booting manually to tweak
U-Boot / kernel config / extlinux.conf for Fedora. I will try again without
interrupting extlinux with my current tweaks.

Here is the logic of booting in PXE/EXT code:
https://github.com/u-boot/u-boot/blob/master/cmd/pxe.c#L818

I should end up calling booti and not bootm, but even bootm should be
fine as the 1st instruction jumps over the header.

I am still confused about this illegal instruction, as it shouldn't
happen according to my current understanding.

david

> | [    0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000
> | [    0.000000] Linux version 5.3.4+ (aurel32@ohm) (gcc version 9.2.1 20190821 (Debian 9.2.1-4)) #1 SMP Sun Oct 6 11:35:09 UTC 2019
> | [    0.000000] initrd not found or empty - disabling initrd
> | [    0.000000] Zone ranges:
> | [    0.000000]   DMA32    [mem 0x0000000080200000-0x00000000ffffffff]
> | [    0.000000]   Normal   [mem 0x0000000100000000-0x000000027fffffff]
> | [    0.000000] Movable zone start for each node
> | [    0.000000] Early memory node ranges
> | [    0.000000]   node   0: [mem 0x0000000080200000-0x000000027fffffff]
> | [    0.000000] Initmem setup node 0 [mem 0x0000000080200000-0x000000027fffffff]
> | [    0.000000] software IO TLB: mapped [mem 0xfbfff000-0xfffff000] (64MB)
> | [    0.000000] CPU with hartid=0 is not available
> | [    0.000000] CPU with hartid=0 is not available
> | [    0.000000] elf_hwcap is 0x112d
> | [    0.000000] percpu: Embedded 18 pages/cpu s36120 r8192 d29416 u73728
> | [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 2067975
> | [    0.000000] Kernel command line: root=/dev/mmcblk0p2 rw console=ttySIF0 rootwait
> | [    0.000000] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes, linear)
> | [    0.000000] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes, linear)
> | [    0.000000] Sorting __ex_table...
> | [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> | [    0.000000] Memory: 8184044K/8386560K available (6310K kernel code, 395K rwdata, 1985K rodata, 239K init, 317K bss, 202516K reserved, 0K cma-reserved)
> | [    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> | [    0.000000] rcu: Hierarchical RCU implementation.
> | [    0.000000] rcu:     RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=4.
> | [    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
> | [    0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
> | [    0.000000] NR_IRQS: 0, nr_irqs: 0, preallocated irqs: 0
> | [    0.000000] plic: mapped 53 interrupts with 4 handlers for 9 contexts.
> | [    0.000000] riscv_timer_init_dt: Registering clocksource cpuid [0] hartid [1]
> | [    0.000000] clocksource: riscv_clocksource: mask: 0xffffffffffffffff max_cycles: 0x1d854df40, max_idle_ns: 3526361616960 ns
> | [    0.000006] sched_clock: 64 bits at 1000kHz, resolution 1000ns, wraps every 2199023255500ns
> | [    0.000147] Console: colour dummy device 80x25
> | [    0.000184] Calibrating delay loop (skipped), value calculated using timer frequency.. 2.00 BogoMIPS (lpj=4000)
> | [    0.000198] pid_max: default: 32768 minimum: 301
> | [    0.000685] Mount-cache hash table entries: 16384 (order: 5, 131072 bytes, linear)
> | [    0.001026] Mountpoint-cache hash table entries: 16384 (order: 5, 131072 bytes, linear)
> | [    0.002814] rcu: Hierarchical SRCU implementation.
> | [    0.003280] smp: Bringing up secondary CPUs ...
> | [    5.090625] CPU1: failed to come online
> | [    5.091815] smp: Brought up 1 node, 3 CPUs
>
> --
> Aurelien Jarno                          GPG: 4096R/1DDD8C9B
> aurelien@aurel32.net                 http://www.aurel32.net


* Re: Fail to bring hart online on HiFive Unleashed
  2019-10-08  6:14           ` David Abdurachmanov
@ 2019-10-08  6:33             ` Aurelien Jarno
  2019-10-08  7:17               ` Anup Patel
                                 ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Aurelien Jarno @ 2019-10-08  6:33 UTC (permalink / raw)
  To: David Abdurachmanov; +Cc: Atish Patra, linux-riscv

On 8 October 2019 08:14:58 GMT+02:00, David Abdurachmanov <david.abdurachmanov@sifive.com> wrote:
>On Tue, Oct 8, 2019 at 7:30 AM Aurelien Jarno <aurelien@aurel32.net>
>wrote:
>>
>> On 2019-10-07 22:19, Atish Patra wrote:
>> > Thanks for the detailed analysis. Can you please keep me and david
>in
>> > cc when you report the issue to U-boot ?
>>
>> Yep. I have progressed a bit on that, and now I am not convinced it's
>an
>> U-boot issue, it can be a GCC issue.
>>
>> Here are the conditions to reproduce the bug:
>> - U-boot runs on hart 1, 2 or 3
>> - the autoboot process is not interrupted
>> - extlinux is used to boot the kernel
>> - arch/riscv/lib/bootm.c is compiled with GCC 9 (works fine with GCC
>8)
>>
>> When the problem happens, the missing hart actually ends its
>execution
>> in an illegal instruction trap trying to execute the FDT (I only
>noticed
>> that recently as the message was hidden by the use of earlycon=sbi):
>>
>> | SiFive FSBL:       2018-03-20
>> | HiFive-U serial #: 00000246
>> |
>> | OpenSBI v0.4-50-g30f09fb (Oct  6 2019 21:58:05)
>> |    ____                    _____ ____ _____
>> |   / __ \                  / ____|  _ \_   _|
>> |  | |  | |_ __   ___ _ __ | (___ | |_) || |
>> |  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
>> |  | |__| | |_) |  __/ | | |____) | |_) || |_
>> |   \____/| .__/ \___|_| |_|_____/|____/_____|
>> |         | |
>> |         |_|
>> |
>> | Platform Name          : SiFive Freedom U540
>> | Platform HART Features : RV64ACDFIMSU
>> | Platform Max HARTs     : 5
>> | Current Hart           : 2
>> | Firmware Base          : 0x80000000
>> | Firmware Size          : 104 KB
>> | Runtime SBI Version    : 0.2
>> |
>> | PMP0: 0x0000000080000000-0x000000008001ffff (A)
>> | PMP1: 0x0000000000000000-0x0000007fffffffff (A,R,W,X)
>> |
>> |
>> | U-Boot 2019.10-rc4-00037-gdac51e9aaf-dirty (Oct 06 2019 - 21:56:51
>+0000)
>> |
>> | CPU:   rv64imafdc
>> | Model: SiFive HiFive Unleashed A00
>> | DRAM:  8 GiB
>> |
>> | MMC:   spi@10050000:mmc@0: 0
>> | In:    serial@10010000
>> | Out:   serial@10010000
>> | Err:   serial@10010000
>> | Net:   eth0: ethernet@10090000
>> | Hit any key to stop autoboot:  0
>> | switch to partitions #0, OK
>> | mmc0 is current device
>> | Scanning mmc 0:2...
>> | Found /boot/extlinux/extlinux.conf
>> | Retrieving file: /boot/extlinux/extlinux.conf
>> | 510 bytes read in 5 ms (99.6 KiB/s)
>> | U-Boot menu
>> | 1:      kernel 5.3.4
>> | 2:      Debian GNU/Linux kernel 5.3.0-trunk-riscv64
>> | Enter choice: 1
>> | 1:      kernel 5.3.4
>> | Retrieving file: /boot/vmlinux-5.3.4
>> | 9486076 bytes read in 4813 ms (1.9 MiB/s)
>> | append: root=/dev/mmcblk0p2 rw console=ttySIF0 rootwait
>> | Retrieving file: /boot/hifive-unleashed-a00.dtb
>> | 6088 bytes read in 7 ms (848.6 KiB/s)
>> | ## Flattened Device Tree blob at 88000000
>> |    Booting using the fdt blob at 0x88000000
>> |    Using Device Tree in place at 0000000088000000, end
>00000000880047c7
>> |
>> | Starting kernel ...
>> |
>> | exception code: 2 , Illegal instruction , epc 88000004 , ra 88000000
>> | ### ERROR ### Please RESET the board ###
>
>I think, that's the same issue I had (or still have) a week ago.
>Just reminder that kernel 5.3 introduced a 64-byte header (thus no
>need to wrap kernel) at least for Image target. Thus it's booti that
>boots the kernel on U-Boot side.
>Thus the 1st instruction of that header is "j 0x40" (to the beginning
>of the actual kernel).  And 88000004 would definitely hold an illegal
>instruction.
>
>0000000000000000 <.data>:
>0:       81a0                    j       0x40
>2:       0000                    unimp
>4:       0000                    unimp
>6:       0100                    nop
>[..]

Hmm, that's the beginning of the kernel code. The address 88000004,
however, lies in the FDT. So the hart that ends up in the trap is
trying to execute the FDT instead of the kernel.
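
A minimal sketch of why a hart that jumps to the FDT would trap exactly
at 88000004, and why the kernel header itself is harmless. It assumes the
blob starts with the standard big-endian FDT header (magic 0xd00dfeed
followed by totalsize), that totalsize is 0x47c8 as implied by the
"end 00000000880047c7" line in the log, and that the disassembly quoted
above lists the header bytes in memory order. The helper names are
illustrative only.

```python
import struct

FDT_MAGIC = 0xD00DFEED   # FDT header fields are stored big-endian
FDT_TOTALSIZE = 0x47C8   # assumption: derived from "end 00000000880047c7"


def parcels(data):
    """Split bytes into the little-endian 16-bit parcels a RISC-V hart fetches."""
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def is_compressed(parcel):
    """A parcel whose two low bits are not 0b11 starts a 16-bit instruction."""
    return parcel & 0b11 != 0b11


def cj_offset(parcel):
    """Return the byte offset of a c.j instruction, or None if it is not c.j."""
    if parcel & 0b11 != 0b01 or (parcel >> 13) != 0b101:
        return None

    def bit(n):
        return (parcel >> n) & 1

    # inst[12|11|10:9|8|7|6|5:3|2] holds offset[11|4|9:8|10|6|7|3:1|5]
    off = ((bit(12) << 11) | (bit(11) << 4) | (bit(10) << 9) | (bit(9) << 8)
           | (bit(8) << 10) | (bit(7) << 6) | (bit(6) << 7) | (bit(5) << 3)
           | (bit(4) << 2) | (bit(3) << 1) | (bit(2) << 5))
    return off - 0x1000 if off & 0x800 else off


# First 8 bytes of the FDT as they sit in memory at 0x88000000.
fdt_parcels = parcels(struct.pack(">II", FDT_MAGIC, FDT_TOTALSIZE))

# First 8 bytes of the kernel Image header, taken from the quoted listing.
kernel_parcels = parcels(bytes.fromhex("81a0000000000100"))

for i, p in enumerate(fdt_parcels):
    print("fdt    +%d: 0x%04x compressed=%s" % (2 * i, p, is_compressed(p)))
for i, p in enumerate(kernel_parcels):
    print("kernel +%d: 0x%04x c.j offset=%s" % (2 * i, p, cj_offset(p)))
```

The two parcels of the FDT magic both have low bits other than 0b11, so
they are fetched as 16-bit instructions. The parcel at offset 4 is
0x0000 for any blob smaller than 64 KiB, and the all-zero 16-bit
encoding is defined to be illegal, which matches epc 88000004. The
kernel header, in contrast, starts with a c.j that skips its zero
padding.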

I haven't spotted any obvious differences between bootm.o compiled with
GCC 8 and GCC 9. I wonder if there is somehow a race condition because
some harts are already executing Linux while the last one is still
executing U-Boot.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Fail to bring hart online on HiFive Unleashed
  2019-10-08  4:30         ` Aurelien Jarno
  2019-10-08  6:14           ` David Abdurachmanov
@ 2019-10-08  7:06           ` Anup Patel
  1 sibling, 0 replies; 24+ messages in thread
From: Anup Patel @ 2019-10-08  7:06 UTC (permalink / raw)
  To: Aurelien Jarno
  Cc: david.abdurachmanov, Atish Patra, linux-riscv, Bin Meng, Lukas Auer

+Bin and Lukas for U-Boot insights.

On Tue, Oct 8, 2019 at 10:00 AM Aurelien Jarno <aurelien@aurel32.net> wrote:
>
> On 2019-10-07 22:19, Atish Patra wrote:
> > Thanks for the detailed analysis. Can you please keep me and david in
> > cc when you report the issue to U-boot ?
>
> Yep. I have progressed a bit on that, and now I am not convinced it's an
> U-boot issue, it can be a GCC issue.
>
> Here are the conditions to reproduce the bug:
> - U-boot runs on hart 1, 2 or 3
> - the autoboot process is not interrupted
> - extlinux is used to boot the kernel
> - arch/riscv/lib/bootm.c is compiled with GCC 9 (works fine with GCC 8)
>
> When the problem happens, the missing hart actually ends its execution
> in an illegal instruction trap trying to execute the FDT (I only noticed
> that recently as the message was hidden by the use of earlycon=sbi):
>
> | SiFive FSBL:       2018-03-20
> | HiFive-U serial #: 00000246
> |
> | OpenSBI v0.4-50-g30f09fb (Oct  6 2019 21:58:05)
> |    ____                    _____ ____ _____
> |   / __ \                  / ____|  _ \_   _|
> |  | |  | |_ __   ___ _ __ | (___ | |_) || |
> |  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
> |  | |__| | |_) |  __/ | | |____) | |_) || |_
> |   \____/| .__/ \___|_| |_|_____/|____/_____|
> |         | |
> |         |_|
> |
> | Platform Name          : SiFive Freedom U540
> | Platform HART Features : RV64ACDFIMSU
> | Platform Max HARTs     : 5
> | Current Hart           : 2
> | Firmware Base          : 0x80000000
> | Firmware Size          : 104 KB
> | Runtime SBI Version    : 0.2
> |
> | PMP0: 0x0000000080000000-0x000000008001ffff (A)
> | PMP1: 0x0000000000000000-0x0000007fffffffff (A,R,W,X)
> |
> |
> | U-Boot 2019.10-rc4-00037-gdac51e9aaf-dirty (Oct 06 2019 - 21:56:51 +0000)
> |
> | CPU:   rv64imafdc
> | Model: SiFive HiFive Unleashed A00
> | DRAM:  8 GiB
> |
> | MMC:   spi@10050000:mmc@0: 0
> | In:    serial@10010000
> | Out:   serial@10010000
> | Err:   serial@10010000
> | Net:   eth0: ethernet@10090000
> | Hit any key to stop autoboot:  0
> | switch to partitions #0, OK
> | mmc0 is current device
> | Scanning mmc 0:2...
> | Found /boot/extlinux/extlinux.conf
> | Retrieving file: /boot/extlinux/extlinux.conf
> | 510 bytes read in 5 ms (99.6 KiB/s)
> | U-Boot menu
> | 1:      kernel 5.3.4
> | 2:      Debian GNU/Linux kernel 5.3.0-trunk-riscv64
> | Enter choice: 1
> | 1:      kernel 5.3.4
> | Retrieving file: /boot/vmlinux-5.3.4
> | 9486076 bytes read in 4813 ms (1.9 MiB/s)
> | append: root=/dev/mmcblk0p2 rw console=ttySIF0 rootwait
> | Retrieving file: /boot/hifive-unleashed-a00.dtb
> | 6088 bytes read in 7 ms (848.6 KiB/s)
> | ## Flattened Device Tree blob at 88000000
> |    Booting using the fdt blob at 0x88000000
> |    Using Device Tree in place at 0000000088000000, end 00000000880047c7
> |
> | Starting kernel ...
> |
> | exception code: 2 , Illegal instruction , epc 88000004 , ra 88000000
> | ### ERROR ### Please RESET the board ###
> | [    0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000
> | [    0.000000] Linux version 5.3.4+ (aurel32@ohm) (gcc version 9.2.1 20190821 (Debian 9.2.1-4)) #1 SMP Sun Oct 6 11:35:09 UTC 2019
> | [    0.000000] initrd not found or empty - disabling initrd
> | [    0.000000] Zone ranges:
> | [    0.000000]   DMA32    [mem 0x0000000080200000-0x00000000ffffffff]
> | [    0.000000]   Normal   [mem 0x0000000100000000-0x000000027fffffff]
> | [    0.000000] Movable zone start for each node
> | [    0.000000] Early memory node ranges
> | [    0.000000]   node   0: [mem 0x0000000080200000-0x000000027fffffff]
> | [    0.000000] Initmem setup node 0 [mem 0x0000000080200000-0x000000027fffffff]
> | [    0.000000] software IO TLB: mapped [mem 0xfbfff000-0xfffff000] (64MB)
> | [    0.000000] CPU with hartid=0 is not available
> | [    0.000000] CPU with hartid=0 is not available
> | [    0.000000] elf_hwcap is 0x112d
> | [    0.000000] percpu: Embedded 18 pages/cpu s36120 r8192 d29416 u73728
> | [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 2067975
> | [    0.000000] Kernel command line: root=/dev/mmcblk0p2 rw console=ttySIF0 rootwait
> | [    0.000000] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes, linear)
> | [    0.000000] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes, linear)
> | [    0.000000] Sorting __ex_table...
> | [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> | [    0.000000] Memory: 8184044K/8386560K available (6310K kernel code, 395K rwdata, 1985K rodata, 239K init, 317K bss, 202516K reserved, 0K cma-reserved)
> | [    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> | [    0.000000] rcu: Hierarchical RCU implementation.
> | [    0.000000] rcu:     RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=4.
> | [    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
> | [    0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
> | [    0.000000] NR_IRQS: 0, nr_irqs: 0, preallocated irqs: 0
> | [    0.000000] plic: mapped 53 interrupts with 4 handlers for 9 contexts.
> | [    0.000000] riscv_timer_init_dt: Registering clocksource cpuid [0] hartid [1]
> | [    0.000000] clocksource: riscv_clocksource: mask: 0xffffffffffffffff max_cycles: 0x1d854df40, max_idle_ns: 3526361616960 ns
> | [    0.000006] sched_clock: 64 bits at 1000kHz, resolution 1000ns, wraps every 2199023255500ns
> | [    0.000147] Console: colour dummy device 80x25
> | [    0.000184] Calibrating delay loop (skipped), value calculated using timer frequency.. 2.00 BogoMIPS (lpj=4000)
> | [    0.000198] pid_max: default: 32768 minimum: 301
> | [    0.000685] Mount-cache hash table entries: 16384 (order: 5, 131072 bytes, linear)
> | [    0.001026] Mountpoint-cache hash table entries: 16384 (order: 5, 131072 bytes, linear)
> | [    0.002814] rcu: Hierarchical SRCU implementation.
> | [    0.003280] smp: Bringing up secondary CPUs ...
> | [    5.090625] CPU1: failed to come online
> | [    5.091815] smp: Brought up 1 node, 3 CPUs
>
> --
> Aurelien Jarno                          GPG: 4096R/1DDD8C9B
> aurelien@aurel32.net                 http://www.aurel32.net
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Fail to bring hart online on HiFive Unleashed
  2019-10-08  6:33             ` Aurelien Jarno
@ 2019-10-08  7:17               ` Anup Patel
  2019-10-08 22:21               ` Troy Benjegerdes
  2019-10-09  1:34               ` Atish Patra
  2 siblings, 0 replies; 24+ messages in thread
From: Anup Patel @ 2019-10-08  7:17 UTC (permalink / raw)
  To: Aurelien Jarno
  Cc: David Abdurachmanov, Atish Patra, linux-riscv, Bin Meng, Lukas Auer

On Tue, Oct 8, 2019 at 12:03 PM Aurelien Jarno <aurelien@aurel32.net> wrote:
>
> On 8 October 2019 08:14:58 GMT+02:00, David Abdurachmanov <david.abdurachmanov@sifive.com> wrote:
> >On Tue, Oct 8, 2019 at 7:30 AM Aurelien Jarno <aurelien@aurel32.net>
> >wrote:
> >>
> >> On 2019-10-07 22:19, Atish Patra wrote:
> >> > Thanks for the detailed analysis. Can you please keep me and david
> >in
> >> > cc when you report the issue to U-boot ?
> >>
> >> Yep. I have progressed a bit on that, and now I am not convinced it's
> >an
> >> U-boot issue, it can be a GCC issue.
> >>
> >> Here are the conditions to reproduce the bug:
> >> - U-boot runs on hart 1, 2 or 3
> >> - the autoboot process is not interrupted
> >> - extlinux is used to boot the kernel
> >> - arch/riscv/lib/bootm.c is compiled with GCC 9 (works fine with GCC
> >8)
> >>
> >> When the problem happens, the missing hart actually ends its
> >execution
> >> in an illegal instruction trap trying to execute the FDT (I only
> >noticed
> >> that recently as the message was hidden by the use of earlycon=sbi):
> >>
> >> | SiFive FSBL:       2018-03-20
> >> | HiFive-U serial #: 00000246
> >> |
> >> | OpenSBI v0.4-50-g30f09fb (Oct  6 2019 21:58:05)
> >> |    ____                    _____ ____ _____
> >> |   / __ \                  / ____|  _ \_   _|
> >> |  | |  | |_ __   ___ _ __ | (___ | |_) || |
> >> |  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
> >> |  | |__| | |_) |  __/ | | |____) | |_) || |_
> >> |   \____/| .__/ \___|_| |_|_____/|____/_____|
> >> |         | |
> >> |         |_|
> >> |
> >> | Platform Name          : SiFive Freedom U540
> >> | Platform HART Features : RV64ACDFIMSU
> >> | Platform Max HARTs     : 5
> >> | Current Hart           : 2
> >> | Firmware Base          : 0x80000000
> >> | Firmware Size          : 104 KB
> >> | Runtime SBI Version    : 0.2
> >> |
> >> | PMP0: 0x0000000080000000-0x000000008001ffff (A)
> >> | PMP1: 0x0000000000000000-0x0000007fffffffff (A,R,W,X)
> >> |
> >> |
> >> | U-Boot 2019.10-rc4-00037-gdac51e9aaf-dirty (Oct 06 2019 - 21:56:51
> >+0000)
> >> |
> >> | CPU:   rv64imafdc
> >> | Model: SiFive HiFive Unleashed A00
> >> | DRAM:  8 GiB
> >> |
> >> | MMC:   spi@10050000:mmc@0: 0
> >> | In:    serial@10010000
> >> | Out:   serial@10010000
> >> | Err:   serial@10010000
> >> | Net:   eth0: ethernet@10090000
> >> | Hit any key to stop autoboot:  0
> >> | switch to partitions #0, OK
> >> | mmc0 is current device
> >> | Scanning mmc 0:2...
> >> | Found /boot/extlinux/extlinux.conf
> >> | Retrieving file: /boot/extlinux/extlinux.conf
> >> | 510 bytes read in 5 ms (99.6 KiB/s)
> >> | U-Boot menu
> >> | 1:      kernel 5.3.4
> >> | 2:      Debian GNU/Linux kernel 5.3.0-trunk-riscv64
> >> | Enter choice: 1
> >> | 1:      kernel 5.3.4
> >> | Retrieving file: /boot/vmlinux-5.3.4
> >> | 9486076 bytes read in 4813 ms (1.9 MiB/s)
> >> | append: root=/dev/mmcblk0p2 rw console=ttySIF0 rootwait
> >> | Retrieving file: /boot/hifive-unleashed-a00.dtb
> >> | 6088 bytes read in 7 ms (848.6 KiB/s)
> >> | ## Flattened Device Tree blob at 88000000
> >> |    Booting using the fdt blob at 0x88000000
> >> |    Using Device Tree in place at 0000000088000000, end
> >00000000880047c7
> >> |
> >> | Starting kernel ...
> >> |
> >> | exception code: 2 , Illegal instruction , epc 88000004 , ra 88000000
> >> | ### ERROR ### Please RESET the board ###
> >
> >I think, that's the same issue I had (or still have) a week ago.
> >Just reminder that kernel 5.3 introduced a 64-byte header (thus no
> >need to wrap kernel) at least for Image target. Thus it's booti that
> >boots the kernel on U-Boot side.
> >Thus the 1st instruction of that header is "j 0x40" (to the beginning
> >of the actual kernel).  And 88000004 would definitely hold an illegal
> >instruction.
> >
> >0000000000000000 <.data>:
> >0:       81a0                    j       0x40
> >2:       0000                    unimp
> >4:       0000                    unimp
> >6:       0100                    nop
> >[..]
>
> Hmm that's the beginning of the kernel code. The address 88000004
> actually corresponds to the FDT. So the hart ending up in a trap
> actually tries to boot the FDT instead of the kernel.
>
> I haven't spotted any obvious differences between bootm.o compiled with
> GCC 8 and GCC 9. I wonder if there is somehow a race condition because
> some harts are already executing linux while the last one is still
> executing U-boot.

I suspect this is because of the fragile secondary HART spinning logic
everywhere (OpenSBI, U-Boot, and Linux). Once we have the SBI HSM
extension available in OpenSBI, the secondary HARTs will sleep in
WFI and will be selectively woken up using SBI calls from Linux.

Regards,
Anup

>
> --
> Aurelien Jarno                          GPG: 4096R/1DDD8C9B
> aurelien@aurel32.net                 http://www.aurel32.net
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Fail to bring hart online on HiFive Unleashed
  2019-10-08  6:33             ` Aurelien Jarno
  2019-10-08  7:17               ` Anup Patel
@ 2019-10-08 22:21               ` Troy Benjegerdes
  2019-10-10 19:59                 ` Aurelien Jarno
  2019-10-09  1:34               ` Atish Patra
  2 siblings, 1 reply; 24+ messages in thread
From: Troy Benjegerdes @ 2019-10-08 22:21 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: David Abdurachmanov, Atish Patra, linux-riscv



> On Oct 8, 2019, at 1:33 AM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> 
> On 8 October 2019 08:14:58 GMT+02:00, David Abdurachmanov <david.abdurachmanov@sifive.com> wrote:
>> On Tue, Oct 8, 2019 at 7:30 AM Aurelien Jarno <aurelien@aurel32.net>
>> wrote:
>>> 
>>> On 2019-10-07 22:19, Atish Patra wrote:
>>>> Thanks for the detailed analysis. Can you please keep me and david
>> in
>>>> cc when you report the issue to U-boot ?
>>> 
>>> Yep. I have progressed a bit on that, and now I am not convinced it's
>> an
>>> U-boot issue, it can be a GCC issue.
>>> 
>>> Here are the conditions to reproduce the bug:
>>> - U-boot runs on hart 1, 2 or 3
>>> - the autoboot process is not interrupted
>>> - extlinux is used to boot the kernel
>>> - arch/riscv/lib/bootm.c is compiled with GCC 9 (works fine with GCC
>> 8)
>>> 
>>> When the problem happens, the missing hart actually ends its
>> execution
>>> in an illegal instruction trap trying to execute the FDT (I only
>> noticed
>>> that recently as the message was hidden by the use of earlycon=sbi):
>>> 
>>> | SiFive FSBL:       2018-03-20
>>> | HiFive-U serial #: 00000246
>>> |
>>> | OpenSBI v0.4-50-g30f09fb (Oct  6 2019 21:58:05)
>>> |    ____                    _____ ____ _____
>>> |   / __ \                  / ____|  _ \_   _|
>>> |  | |  | |_ __   ___ _ __ | (___ | |_) || |
>>> |  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
>>> |  | |__| | |_) |  __/ | | |____) | |_) || |_
>>> |   \____/| .__/ \___|_| |_|_____/|____/_____|
>>> |         | |
>>> |         |_|
>>> |
>>> | Platform Name          : SiFive Freedom U540
>>> | Platform HART Features : RV64ACDFIMSU
>>> | Platform Max HARTs     : 5
>>> | Current Hart           : 2
>>> | Firmware Base          : 0x80000000
>>> | Firmware Size          : 104 KB
>>> | Runtime SBI Version    : 0.2
>>> |
>>> | PMP0: 0x0000000080000000-0x000000008001ffff (A)
>>> | PMP1: 0x0000000000000000-0x0000007fffffffff (A,R,W,X)
>>> |
>>> |
>>> | U-Boot 2019.10-rc4-00037-gdac51e9aaf-dirty (Oct 06 2019 - 21:56:51
>> +0000)
>>> |
>>> | CPU:   rv64imafdc
>>> | Model: SiFive HiFive Unleashed A00
>>> | DRAM:  8 GiB
>>> |
>>> | MMC:   spi@10050000:mmc@0: 0
>>> | In:    serial@10010000
>>> | Out:   serial@10010000
>>> | Err:   serial@10010000
>>> | Net:   eth0: ethernet@10090000
>>> | Hit any key to stop autoboot:  0
>>> | switch to partitions #0, OK
>>> | mmc0 is current device
>>> | Scanning mmc 0:2...
>>> | Found /boot/extlinux/extlinux.conf
>>> | Retrieving file: /boot/extlinux/extlinux.conf
>>> | 510 bytes read in 5 ms (99.6 KiB/s)
>>> | U-Boot menu
>>> | 1:      kernel 5.3.4
>>> | 2:      Debian GNU/Linux kernel 5.3.0-trunk-riscv64
>>> | Enter choice: 1
>>> | 1:      kernel 5.3.4
>>> | Retrieving file: /boot/vmlinux-5.3.4
>>> | 9486076 bytes read in 4813 ms (1.9 MiB/s)
>>> | append: root=/dev/mmcblk0p2 rw console=ttySIF0 rootwait
>>> | Retrieving file: /boot/hifive-unleashed-a00.dtb
>>> | 6088 bytes read in 7 ms (848.6 KiB/s)
>>> | ## Flattened Device Tree blob at 88000000
>>> |    Booting using the fdt blob at 0x88000000
>>> |    Using Device Tree in place at 0000000088000000, end
>> 00000000880047c7
>>> |
>>> | Starting kernel ...
>>> |
>>> | exception code: 2 , Illegal instruction , epc 88000004 , ra 88000000
>>> | ### ERROR ### Please RESET the board ###
>> 
>> I think, that's the same issue I had (or still have) a week ago.
>> Just reminder that kernel 5.3 introduced a 64-byte header (thus no
>> need to wrap kernel) at least for Image target. Thus it's booti that
>> boots the kernel on U-Boot side.
>> Thus the 1st instruction of that header is "j 0x40" (to the beginning
>> of the actual kernel).  And 88000004 would definitely hold an illegal
>> instruction.
>> 
>> 0000000000000000 <.data>:
>> 0:       81a0                    j       0x40
>> 2:       0000                    unimp
>> 4:       0000                    unimp
>> 6:       0100                    nop
>> [..]
> 
> Hmm that's the beginning of the kernel code. The address 88000004
> actually corresponds to the FDT. So the hart ending up in a trap
> actually tries to boot the FDT instead of the kernel.
> 
> I haven't spotted any obvious differences between bootm.o compiled with
> GCC 8 and GCC 9. I wonder if there is somehow a race condition because
> some harts are already executing linux while the last one is still
> executing U-boot.

This is from our GCC maintainer, Jim Wilson:

But we've fixed 3 combine optimization bugs on the gcc-9 branch
recently, and I've got a fourth one that I'm working on now, so there
is a good chance that this is a known and already fixed problem. 





^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Fail to bring hart online on HiFive Unleashed
  2019-10-08  6:33             ` Aurelien Jarno
  2019-10-08  7:17               ` Anup Patel
  2019-10-08 22:21               ` Troy Benjegerdes
@ 2019-10-09  1:34               ` Atish Patra
  2019-10-10 19:58                 ` Aurelien Jarno
  2 siblings, 1 reply; 24+ messages in thread
From: Atish Patra @ 2019-10-09  1:34 UTC (permalink / raw)
  To: aurelien, david.abdurachmanov; +Cc: linux-riscv

On Tue, 2019-10-08 at 08:33 +0200, Aurelien Jarno wrote:
> On 8 October 2019 08:14:58 GMT+02:00, David Abdurachmanov <
> david.abdurachmanov@sifive.com> wrote:
> > On Tue, Oct 8, 2019 at 7:30 AM Aurelien Jarno <aurelien@aurel32.net
> > >
> > wrote:
> > > On 2019-10-07 22:19, Atish Patra wrote:
> > > > Thanks for the detailed analysis. Can you please keep me and
> > > > david
> > in
> > > > cc when you report the issue to U-boot ?
> > > 
> > > Yep. I have progressed a bit on that, and now I am not convinced
> > > it's
> > an
> > > U-boot issue, it can be a GCC issue.
> > > 
> > > Here are the conditions to reproduce the bug:
> > > - U-boot runs on hart 1, 2 or 3
> > > - the autoboot process is not interrupted
> > > - extlinux is used to boot the kernel
> > > - arch/riscv/lib/bootm.c is compiled with GCC 9 (works fine with
> > > GCC
> > 8)
> > > When the problem happens, the missing hart actually ends its
> > execution
> > > in an illegal instruction trap trying to execute the FDT (I only
> > noticed
> > > that recently as the message was hidden by the use of
> > > earlycon=sbi):
> > > 
> > > > SiFive FSBL:       2018-03-20
> > > > HiFive-U serial #: 00000246
> > > > 
> > > > OpenSBI v0.4-50-g30f09fb (Oct  6 2019 21:58:05)
> > > >    ____                    _____ ____ _____
> > > >   / __ \                  / ____|  _ \_   _|
> > > >  | |  | |_ __   ___ _ __ | (___ | |_) || |
> > > >  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
> > > >  | |__| | |_) |  __/ | | |____) | |_) || |_
> > > >   \____/| .__/ \___|_| |_|_____/|____/_____|
> > > >         | |
> > > >         |_|
> > > > 
> > > > Platform Name          : SiFive Freedom U540
> > > > Platform HART Features : RV64ACDFIMSU
> > > > Platform Max HARTs     : 5
> > > > Current Hart           : 2
> > > > Firmware Base          : 0x80000000
> > > > Firmware Size          : 104 KB
> > > > Runtime SBI Version    : 0.2
> > > > 
> > > > PMP0: 0x0000000080000000-0x000000008001ffff (A)
> > > > PMP1: 0x0000000000000000-0x0000007fffffffff (A,R,W,X)
> > > > 
> > > > 
> > > > U-Boot 2019.10-rc4-00037-gdac51e9aaf-dirty (Oct 06 2019 -
> > > > 21:56:51
> > +0000)
> > > > CPU:   rv64imafdc
> > > > Model: SiFive HiFive Unleashed A00
> > > > DRAM:  8 GiB
> > > > 
> > > > MMC:   spi@10050000:mmc@0: 0
> > > > In:    serial@10010000
> > > > Out:   serial@10010000
> > > > Err:   serial@10010000
> > > > Net:   eth0: ethernet@10090000
> > > > Hit any key to stop autoboot:  0
> > > > switch to partitions #0, OK
> > > > mmc0 is current device
> > > > Scanning mmc 0:2...
> > > > Found /boot/extlinux/extlinux.conf
> > > > Retrieving file: /boot/extlinux/extlinux.conf
> > > > 510 bytes read in 5 ms (99.6 KiB/s)
> > > > U-Boot menu
> > > > 1:      kernel 5.3.4
> > > > 2:      Debian GNU/Linux kernel 5.3.0-trunk-riscv64
> > > > Enter choice: 1
> > > > 1:      kernel 5.3.4
> > > > Retrieving file: /boot/vmlinux-5.3.4
> > > > 9486076 bytes read in 4813 ms (1.9 MiB/s)
> > > > append: root=/dev/mmcblk0p2 rw console=ttySIF0 rootwait
> > > > Retrieving file: /boot/hifive-unleashed-a00.dtb
> > > > 6088 bytes read in 7 ms (848.6 KiB/s)
> > > > ## Flattened Device Tree blob at 88000000
> > > >    Booting using the fdt blob at 0x88000000
> > > >    Using Device Tree in place at 0000000088000000, end
> > 00000000880047c7
> > > > Starting kernel ...
> > > > 
> > > > > exception code: 2 , Illegal instruction , epc 88000004 , ra 88000000
> > > > ### ERROR ### Please RESET the board ###
> > 
> > I think, that's the same issue I had (or still have) a week ago.
> > Just reminder that kernel 5.3 introduced a 64-byte header (thus no
> > need to wrap kernel) at least for Image target. Thus it's booti
> > that
> > boots the kernel on U-Boot side.
> > Thus the 1st instruction of that header is "j 0x40" (to the
> > beginning
> > of the actual kernel).  And 88000004 would definitely hold an
> > illegal
> > instruction.
> > 
> > 0000000000000000 <.data>:
> > 0:       81a0                    j       0x40
> > 2:       0000                    unimp
> > 4:       0000                    unimp
> > 6:       0100                    nop
> > [..]
> 
> Hmm that's the beginning of the kernel code. The address 88000004
> actually corresponds to the FDT. So the hart ending up in a trap
> actually tries to boot the FDT instead of the kernel.
> 

Do you see the issue if you manually use bootm instead of extlinux?

=> bootm $kernel_addr_r - $fdt_addr_r

This is probably not related, as bootm is jumping to the wrong location
for some reason. However, this patch may be worth a shot, as it fixes an
FDT corruption:

http://lists.infradead.org/pipermail/linux-riscv/2019-September/006911.html

(It is already merged in 5.4-rc2)

> I haven't spotted any obvious differences between bootm.o compiled
> with
> GCC 8 and GCC 9. I wonder if there is somehow a race condition
> because
> some harts are already executing linux while the last one is still
> executing U-boot.
> 

-- 
Regards,
Atish

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Fail to bring hart online on HiFive Unleashed
  2019-10-09  1:34               ` Atish Patra
@ 2019-10-10 19:58                 ` Aurelien Jarno
  2019-10-15 21:38                   ` Auer, Lukas
  0 siblings, 1 reply; 24+ messages in thread
From: Aurelien Jarno @ 2019-10-10 19:58 UTC (permalink / raw)
  To: Atish Patra; +Cc: david.abdurachmanov, linux-riscv

On 2019-10-09 01:34, Atish Patra wrote:
> On Tue, 2019-10-08 at 08:33 +0200, Aurelien Jarno wrote:
> > On 8 October 2019 08:14:58 GMT+02:00, David Abdurachmanov <
> > david.abdurachmanov@sifive.com> wrote:
> > > On Tue, Oct 8, 2019 at 7:30 AM Aurelien Jarno <aurelien@aurel32.net
> > > >
> > > wrote:
> > > > On 2019-10-07 22:19, Atish Patra wrote:
> > > > > Thanks for the detailed analysis. Can you please keep me and
> > > > > david
> > > in
> > > > > cc when you report the issue to U-boot ?
> > > > 
> > > > Yep. I have progressed a bit on that, and now I am not convinced
> > > > it's
> > > an
> > > > U-boot issue, it can be a GCC issue.
> > > > 
> > > > Here are the conditions to reproduce the bug:
> > > > - U-boot runs on hart 1, 2 or 3
> > > > - the autoboot process is not interrupted
> > > > - extlinux is used to boot the kernel
> > > > - arch/riscv/lib/bootm.c is compiled with GCC 9 (works fine with
> > > > GCC
> > > 8)
> > > > When the problem happens, the missing hart actually ends its
> > > execution
> > > > in an illegal instruction trap trying to execute the FDT (I only
> > > noticed
> > > > that recently as the message was hidden by the use of
> > > > earlycon=sbi):
> > > > 
> > > > > SiFive FSBL:       2018-03-20
> > > > > HiFive-U serial #: 00000246
> > > > > 
> > > > > OpenSBI v0.4-50-g30f09fb (Oct  6 2019 21:58:05)
> > > > >    ____                    _____ ____ _____
> > > > >   / __ \                  / ____|  _ \_   _|
> > > > >  | |  | |_ __   ___ _ __ | (___ | |_) || |
> > > > >  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
> > > > >  | |__| | |_) |  __/ | | |____) | |_) || |_
> > > > >   \____/| .__/ \___|_| |_|_____/|____/_____|
> > > > >         | |
> > > > >         |_|
> > > > > 
> > > > > Platform Name          : SiFive Freedom U540
> > > > > Platform HART Features : RV64ACDFIMSU
> > > > > Platform Max HARTs     : 5
> > > > > Current Hart           : 2
> > > > > Firmware Base          : 0x80000000
> > > > > Firmware Size          : 104 KB
> > > > > Runtime SBI Version    : 0.2
> > > > > 
> > > > > PMP0: 0x0000000080000000-0x000000008001ffff (A)
> > > > > PMP1: 0x0000000000000000-0x0000007fffffffff (A,R,W,X)
> > > > > 
> > > > > 
> > > > > U-Boot 2019.10-rc4-00037-gdac51e9aaf-dirty (Oct 06 2019 -
> > > > > 21:56:51
> > > +0000)
> > > > > CPU:   rv64imafdc
> > > > > Model: SiFive HiFive Unleashed A00
> > > > > DRAM:  8 GiB
> > > > > 
> > > > > MMC:   spi@10050000:mmc@0: 0
> > > > > In:    serial@10010000
> > > > > Out:   serial@10010000
> > > > > Err:   serial@10010000
> > > > > Net:   eth0: ethernet@10090000
> > > > > Hit any key to stop autoboot:  0
> > > > > switch to partitions #0, OK
> > > > > mmc0 is current device
> > > > > Scanning mmc 0:2...
> > > > > Found /boot/extlinux/extlinux.conf
> > > > > Retrieving file: /boot/extlinux/extlinux.conf
> > > > > 510 bytes read in 5 ms (99.6 KiB/s)
> > > > > U-Boot menu
> > > > > 1:      kernel 5.3.4
> > > > > 2:      Debian GNU/Linux kernel 5.3.0-trunk-riscv64
> > > > > Enter choice: 1
> > > > > 1:      kernel 5.3.4
> > > > > Retrieving file: /boot/vmlinux-5.3.4
> > > > > 9486076 bytes read in 4813 ms (1.9 MiB/s)
> > > > > append: root=/dev/mmcblk0p2 rw console=ttySIF0 rootwait
> > > > > Retrieving file: /boot/hifive-unleashed-a00.dtb
> > > > > 6088 bytes read in 7 ms (848.6 KiB/s)
> > > > > ## Flattened Device Tree blob at 88000000
> > > > >    Booting using the fdt blob at 0x88000000
> > > > > >    Using Device Tree in place at 0000000088000000, end 00000000880047c7
> > > > > > Starting kernel ...
> > > > > > 
> > > > > > exception code: 2 , Illegal instruction , epc  , ra 88000004 88000000
> > > > > ### ERROR ### Please RESET the board ###
> > > 
> > > I think, that's the same issue I had (or still have) a week ago.
> > > Just reminder that kernel 5.3 introduced a 64-byte header (thus no
> > > need to wrap kernel) at least for Image target. Thus it's booti
> > > that boots the kernel on U-Boot side.
> > > Thus the 1st instruction of that header is "j 0x40" (to the
> > > beginning of the actual kernel).  And 88000004 would definitely
> > > hold an illegal instruction.
> > > 
> > > 0000000000000000 <.data>:
> > > 0:       81a0                    j       0x40
> > > 2:       0000                    unimp
> > > 4:       0000                    unimp
> > > 6:       0100                    nop
> > > [..]
> > 
> > Hmm that's the beginning of the kernel code. The address 88000004
> > actually corresponds to the FDT. So the hart ending up in a trap
> > actually tries to boot the FDT instead of the kernel.
> > 
> 
> Do you see the issue if you manually use bootm instead of extlinux?
> 
> => bootm $kernel_addr_r - $fdt_addr_r
> 
> This is probably not related, as bootm is jumping to the wrong location
> for some reason. However, it may be worth a shot as it fixes fdt
> corruption.

I have just tested, and it doesn't work. On the other hand, I have tried
to run that manually, and interrupting the boot process usually hides
the problem.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net


* Re: Fail to bring hart online on HiFive Unleashed
  2019-10-08 22:21               ` Troy Benjegerdes
@ 2019-10-10 19:59                 ` Aurelien Jarno
  2019-10-11 14:05                   ` David Abdurachmanov
  0 siblings, 1 reply; 24+ messages in thread
From: Aurelien Jarno @ 2019-10-10 19:59 UTC (permalink / raw)
  To: Troy Benjegerdes; +Cc: David Abdurachmanov, Atish Patra, linux-riscv

On 2019-10-08 17:21, Troy Benjegerdes wrote:
> 
> 
> > On Oct 8, 2019, at 1:33 AM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> > 
> > On 8 October 2019 08:14:58 GMT+02:00, David Abdurachmanov <david.abdurachmanov@sifive.com> wrote:
> >> On Tue, Oct 8, 2019 at 7:30 AM Aurelien Jarno <aurelien@aurel32.net>
> >> wrote:
> >>> 
> >>> On 2019-10-07 22:19, Atish Patra wrote:
> >>>> Thanks for the detailed analysis. Can you please keep me and david in
> >>>> cc when you report the issue to U-boot ?
> >>> 
> >>> Yep. I have progressed a bit on that, and now I am not convinced it's an
> >>> U-boot issue, it can be a GCC issue.
> >>> 
> >>> Here are the conditions to reproduce the bug:
> >>> - U-boot runs on hart 1, 2 or 3
> >>> - the autoboot process is not interrupted
> >>> - extlinux is used to boot the kernel
> >>> - arch/riscv/lib/bootm.c is compiled with GCC 9 (works fine with GCC 8)
> >>> 
> >>> When the problem happens, the missing hart actually ends its execution
> >>> in an illegal instruction trap trying to execute the FDT (I only noticed
> >>> that recently as the message was hidden by the use of earlycon=sbi):
> >>> 
> >>> | SiFive FSBL:       2018-03-20
> >>> | HiFive-U serial #: 00000246
> >>> |
> >>> | OpenSBI v0.4-50-g30f09fb (Oct  6 2019 21:58:05)
> >>> |    ____                    _____ ____ _____
> >>> |   / __ \                  / ____|  _ \_   _|
> >>> |  | |  | |_ __   ___ _ __ | (___ | |_) || |
> >>> |  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
> >>> |  | |__| | |_) |  __/ | | |____) | |_) || |_
> >>> |   \____/| .__/ \___|_| |_|_____/|____/_____|
> >>> |         | |
> >>> |         |_|
> >>> |
> >>> | Platform Name          : SiFive Freedom U540
> >>> | Platform HART Features : RV64ACDFIMSU
> >>> | Platform Max HARTs     : 5
> >>> | Current Hart           : 2
> >>> | Firmware Base          : 0x80000000
> >>> | Firmware Size          : 104 KB
> >>> | Runtime SBI Version    : 0.2
> >>> |
> >>> | PMP0: 0x0000000080000000-0x000000008001ffff (A)
> >>> | PMP1: 0x0000000000000000-0x0000007fffffffff (A,R,W,X)
> >>> |
> >>> |
> >>> | U-Boot 2019.10-rc4-00037-gdac51e9aaf-dirty (Oct 06 2019 - 21:56:51 +0000)
> >>> |
> >>> | CPU:   rv64imafdc
> >>> | Model: SiFive HiFive Unleashed A00
> >>> | DRAM:  8 GiB
> >>> |
> >>> | MMC:   spi@10050000:mmc@0: 0
> >>> | In:    serial@10010000
> >>> | Out:   serial@10010000
> >>> | Err:   serial@10010000
> >>> | Net:   eth0: ethernet@10090000
> >>> | Hit any key to stop autoboot:  0
> >>> | switch to partitions #0, OK
> >>> | mmc0 is current device
> >>> | Scanning mmc 0:2...
> >>> | Found /boot/extlinux/extlinux.conf
> >>> | Retrieving file: /boot/extlinux/extlinux.conf
> >>> | 510 bytes read in 5 ms (99.6 KiB/s)
> >>> | U-Boot menu
> >>> | 1:      kernel 5.3.4
> >>> | 2:      Debian GNU/Linux kernel 5.3.0-trunk-riscv64
> >>> | Enter choice: 1
> >>> | 1:      kernel 5.3.4
> >>> | Retrieving file: /boot/vmlinux-5.3.4
> >>> | 9486076 bytes read in 4813 ms (1.9 MiB/s)
> >>> | append: root=/dev/mmcblk0p2 rw console=ttySIF0 rootwait
> >>> | Retrieving file: /boot/hifive-unleashed-a00.dtb
> >>> | 6088 bytes read in 7 ms (848.6 KiB/s)
> >>> | ## Flattened Device Tree blob at 88000000
> >>> |    Booting using the fdt blob at 0x88000000
> >>> |    Using Device Tree in place at 0000000088000000, end 00000000880047c7
> >>> |
> >>> | Starting kernel ...
> >>> |
> >>> | exception code: 2 , Illegal instruction , epc  , ra 88000004 88000000
> >>> | ### ERROR ### Please RESET the board ###
> >> 
> >> I think, that's the same issue I had (or still have) a week ago.
> >> Just reminder that kernel 5.3 introduced a 64-byte header (thus no
> >> need to wrap kernel) at least for Image target. Thus it's booti that
> >> boots the kernel on U-Boot side.
> >> Thus the 1st instruction of that header is "j 0x40" (to the beginning
> >> of the actual kernel).  And 88000004 would definitely hold an illegal
> >> instruction.
> >> 
> >> 0000000000000000 <.data>:
> >> 0:       81a0                    j       0x40
> >> 2:       0000                    unimp
> >> 4:       0000                    unimp
> >> 6:       0100                    nop
> >> [..]
> > 
> > Hmm that's the beginning of the kernel code. The address 88000004
> > actually corresponds to the FDT. So the hart ending up in a trap
> > actually tries to boot the FDT instead of the kernel.
> > 
> > I haven't spotted any obvious differences between bootm.o compiled with
> > GCC 8 and GCC 9. I wonder if there is somehow a race condition because
> > some harts are already executing linux while the last one is still
> > executing U-boot.
> 
> This is from our GCC maintainer, Jim Wilson:
> 
> But we've fixed 3 combine optimization bugs on the gcc-9 branch
> recently, and I've got a fourth one that I'm working on now, so there
> is a good chance that this is a known and already fixed problem. 

I have just tried with the gcc-9 branch from 2 days ago (with PR 
target/91635 fixed) and the problem is still there.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net


* Re: Fail to bring hart online on HiFive Unleashed
  2019-10-10 19:59                 ` Aurelien Jarno
@ 2019-10-11 14:05                   ` David Abdurachmanov
  0 siblings, 0 replies; 24+ messages in thread
From: David Abdurachmanov @ 2019-10-11 14:05 UTC (permalink / raw)
  To: Aurelien Jarno
  Cc: Troy Benjegerdes, David Abdurachmanov, linux-riscv, Atish Patra

On Thu, Oct 10, 2019 at 11:00 PM Aurelien Jarno <aurelien@aurel32.net> wrote:
>
> On 2019-10-08 17:21, Troy Benjegerdes wrote:
> >
> >
> > > On Oct 8, 2019, at 1:33 AM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> > >
> > > On 8 October 2019 08:14:58 GMT+02:00, David Abdurachmanov <david.abdurachmanov@sifive.com> wrote:
> > >> On Tue, Oct 8, 2019 at 7:30 AM Aurelien Jarno <aurelien@aurel32.net>
> > >> wrote:
> > >>>
> > >>> On 2019-10-07 22:19, Atish Patra wrote:
> > >>>> Thanks for the detailed analysis. Can you please keep me and david in
> > >>>> cc when you report the issue to U-boot ?
> > >>>
> > >>> Yep. I have progressed a bit on that, and now I am not convinced it's an
> > >>> U-boot issue, it can be a GCC issue.
> > >>>
> > >>> Here are the conditions to reproduce the bug:
> > >>> - U-boot runs on hart 1, 2 or 3
> > >>> - the autoboot process is not interrupted
> > >>> - extlinux is used to boot the kernel
> > >>> - arch/riscv/lib/bootm.c is compiled with GCC 9 (works fine with GCC 8)
> > >>>
> > >>> When the problem happens, the missing hart actually ends its execution
> > >>> in an illegal instruction trap trying to execute the FDT (I only noticed
> > >>> that recently as the message was hidden by the use of earlycon=sbi):
> > >>>
> > >>> | SiFive FSBL:       2018-03-20
> > >>> | HiFive-U serial #: 00000246
> > >>> |
> > >>> | OpenSBI v0.4-50-g30f09fb (Oct  6 2019 21:58:05)
> > >>> |    ____                    _____ ____ _____
> > >>> |   / __ \                  / ____|  _ \_   _|
> > >>> |  | |  | |_ __   ___ _ __ | (___ | |_) || |
> > >>> |  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
> > >>> |  | |__| | |_) |  __/ | | |____) | |_) || |_
> > >>> |   \____/| .__/ \___|_| |_|_____/|____/_____|
> > >>> |         | |
> > >>> |         |_|
> > >>> |
> > >>> | Platform Name          : SiFive Freedom U540
> > >>> | Platform HART Features : RV64ACDFIMSU
> > >>> | Platform Max HARTs     : 5
> > >>> | Current Hart           : 2
> > >>> | Firmware Base          : 0x80000000
> > >>> | Firmware Size          : 104 KB
> > >>> | Runtime SBI Version    : 0.2
> > >>> |
> > >>> | PMP0: 0x0000000080000000-0x000000008001ffff (A)
> > >>> | PMP1: 0x0000000000000000-0x0000007fffffffff (A,R,W,X)
> > >>> |
> > >>> |
> > >>> | U-Boot 2019.10-rc4-00037-gdac51e9aaf-dirty (Oct 06 2019 - 21:56:51 +0000)
> > >>> |
> > >>> | CPU:   rv64imafdc
> > >>> | Model: SiFive HiFive Unleashed A00
> > >>> | DRAM:  8 GiB
> > >>> |
> > >>> | MMC:   spi@10050000:mmc@0: 0
> > >>> | In:    serial@10010000
> > >>> | Out:   serial@10010000
> > >>> | Err:   serial@10010000
> > >>> | Net:   eth0: ethernet@10090000
> > >>> | Hit any key to stop autoboot:  0
> > >>> | switch to partitions #0, OK
> > >>> | mmc0 is current device
> > >>> | Scanning mmc 0:2...
> > >>> | Found /boot/extlinux/extlinux.conf
> > >>> | Retrieving file: /boot/extlinux/extlinux.conf
> > >>> | 510 bytes read in 5 ms (99.6 KiB/s)
> > >>> | U-Boot menu
> > >>> | 1:      kernel 5.3.4
> > >>> | 2:      Debian GNU/Linux kernel 5.3.0-trunk-riscv64
> > >>> | Enter choice: 1
> > >>> | 1:      kernel 5.3.4
> > >>> | Retrieving file: /boot/vmlinux-5.3.4
> > >>> | 9486076 bytes read in 4813 ms (1.9 MiB/s)
> > >>> | append: root=/dev/mmcblk0p2 rw console=ttySIF0 rootwait
> > >>> | Retrieving file: /boot/hifive-unleashed-a00.dtb
> > >>> | 6088 bytes read in 7 ms (848.6 KiB/s)
> > >>> | ## Flattened Device Tree blob at 88000000
> > >>> |    Booting using the fdt blob at 0x88000000
> > >>> |    Using Device Tree in place at 0000000088000000, end 00000000880047c7
> > >>> |
> > >>> | Starting kernel ...
> > >>> |
> > >>> | exception code: 2 , Illegal instruction , epc  , ra 88000004 88000000
> > >>> | ### ERROR ### Please RESET the board ###
> > >>
> > >> I think, that's the same issue I had (or still have) a week ago.
> > >> Just reminder that kernel 5.3 introduced a 64-byte header (thus no
> > >> need to wrap kernel) at least for Image target. Thus it's booti that
> > >> boots the kernel on U-Boot side.
> > >> Thus the 1st instruction of that header is "j 0x40" (to the beginning
> > >> of the actual kernel).  And 88000004 would definitely hold an illegal
> > >> instruction.
> > >>
> > >> 0000000000000000 <.data>:
> > >> 0:       81a0                    j       0x40
> > >> 2:       0000                    unimp
> > >> 4:       0000                    unimp
> > >> 6:       0100                    nop
> > >> [..]
> > >
> > > Hmm that's the beginning of the kernel code. The address 88000004
> > > actually corresponds to the FDT. So the hart ending up in a trap
> > > actually tries to boot the FDT instead of the kernel.
> > >
> > > I haven't spotted any obvious differences between bootm.o compiled with
> > > GCC 8 and GCC 9. I wonder if there is somehow a race condition because
> > > some harts are already executing linux while the last one is still
> > > executing U-boot.
> >
> > This is from our GCC maintainer, Jim Wilson:
> >
> > But we've fixed 3 combine optimization bugs on the gcc-9 branch
> > recently, and I've got a fourth one that I'm working on now, so there
> > is a good chance that this is a known and already fixed problem.
>
> I have just tried with the gcc-9 branch from 2 days ago (with PR
> target/91635 fixed) and the problem is still there.

So I tried to boot again with EXTLINUX (instead of interrupting
extlinux and then booting manually) on my current Fedora 31 setup. It
booted properly (I didn't expect that). Hopefully the next one from
Koji also works.

david



* Re: Fail to bring hart online on HiFive Unleashed
  2019-10-03 20:07 Fail to bring hart online on HiFive Unleashed Aurelien Jarno
  2019-10-03 23:13 ` Atish Patra
@ 2019-10-14  9:23 ` Andreas Schwab
  1 sibling, 0 replies; 24+ messages in thread
From: Andreas Schwab @ 2019-10-14  9:23 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: linux-riscv

On Okt 03 2019, Aurelien Jarno <aurelien@aurel32.net> wrote:

> Any idea what could be the issue?

The last time it happened to me it was due to insufficient initialisation
in OpenSBI.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


* Re: Fail to bring hart online on HiFive Unleashed
  2019-10-10 19:58                 ` Aurelien Jarno
@ 2019-10-15 21:38                   ` Auer, Lukas
  2019-10-15 22:22                     ` Aurelien Jarno
  0 siblings, 1 reply; 24+ messages in thread
From: Auer, Lukas @ 2019-10-15 21:38 UTC (permalink / raw)
  To: aurelien, Atish.Patra; +Cc: david.abdurachmanov, linux-riscv

On Thu, 2019-10-10 at 21:58 +0200, Aurelien Jarno wrote:
> On 2019-10-09 01:34, Atish Patra wrote:
> > On Tue, 2019-10-08 at 08:33 +0200, Aurelien Jarno wrote:
> > > On 8 October 2019 08:14:58 GMT+02:00, David Abdurachmanov
> > > <david.abdurachmanov@sifive.com> wrote:
> > > > On Tue, Oct 8, 2019 at 7:30 AM Aurelien Jarno <aurelien@aurel32.net>
> > > > wrote:
> > > > > On 2019-10-07 22:19, Atish Patra wrote:
> > > > > > Thanks for the detailed analysis. Can you please keep me and
> > > > > > david in cc when you report the issue to U-boot ?
> > > > > 
> > > > > Yep. I have progressed a bit on that, and now I am not convinced
> > > > > it's an U-boot issue, it can be a GCC issue.
> > > > > 
> > > > > Here are the conditions to reproduce the bug:
> > > > > - U-boot runs on hart 1, 2 or 3
> > > > > - the autoboot process is not interrupted
> > > > > - extlinux is used to boot the kernel
> > > > > - arch/riscv/lib/bootm.c is compiled with GCC 9 (works fine with
> > > > >   GCC 8)
> > > > > 
> > > > > When the problem happens, the missing hart actually ends its
> > > > > execution in an illegal instruction trap trying to execute the
> > > > > FDT (I only noticed that recently as the message was hidden by
> > > > > the use of earlycon=sbi):
> > > > > 
> > > > > > SiFive FSBL:       2018-03-20
> > > > > > HiFive-U serial #: 00000246
> > > > > > 
> > > > > > OpenSBI v0.4-50-g30f09fb (Oct  6 2019 21:58:05)
> > > > > >    ____                    _____ ____ _____
> > > > > >   / __ \                  / ____|  _ \_   _|
> > > > > >  | |  | |_ __   ___ _ __ | (___ | |_) || |
> > > > > >  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
> > > > > >  | |__| | |_) |  __/ | | |____) | |_) || |_
> > > > > >   \____/| .__/ \___|_| |_|_____/|____/_____|
> > > > > >         | |
> > > > > >         |_|
> > > > > > 
> > > > > > Platform Name          : SiFive Freedom U540
> > > > > > Platform HART Features : RV64ACDFIMSU
> > > > > > Platform Max HARTs     : 5
> > > > > > Current Hart           : 2
> > > > > > Firmware Base          : 0x80000000
> > > > > > Firmware Size          : 104 KB
> > > > > > Runtime SBI Version    : 0.2
> > > > > > 
> > > > > > PMP0: 0x0000000080000000-0x000000008001ffff (A)
> > > > > > PMP1: 0x0000000000000000-0x0000007fffffffff (A,R,W,X)
> > > > > > 
> > > > > > 
> > > > > > U-Boot 2019.10-rc4-00037-gdac51e9aaf-dirty (Oct 06 2019 - 21:56:51 +0000)
> > > > > > CPU:   rv64imafdc
> > > > > > Model: SiFive HiFive Unleashed A00
> > > > > > DRAM:  8 GiB
> > > > > > 
> > > > > > MMC:   spi@10050000:mmc@0: 0
> > > > > > In:    serial@10010000
> > > > > > Out:   serial@10010000
> > > > > > Err:   serial@10010000
> > > > > > Net:   eth0: ethernet@10090000
> > > > > > Hit any key to stop autoboot:  0
> > > > > > switch to partitions #0, OK
> > > > > > mmc0 is current device
> > > > > > Scanning mmc 0:2...
> > > > > > Found /boot/extlinux/extlinux.conf
> > > > > > Retrieving file: /boot/extlinux/extlinux.conf
> > > > > > 510 bytes read in 5 ms (99.6 KiB/s)
> > > > > > U-Boot menu
> > > > > > 1:      kernel 5.3.4
> > > > > > 2:      Debian GNU/Linux kernel 5.3.0-trunk-riscv64
> > > > > > Enter choice: 1
> > > > > > 1:      kernel 5.3.4
> > > > > > Retrieving file: /boot/vmlinux-5.3.4
> > > > > > 9486076 bytes read in 4813 ms (1.9 MiB/s)
> > > > > > append: root=/dev/mmcblk0p2 rw console=ttySIF0 rootwait
> > > > > > Retrieving file: /boot/hifive-unleashed-a00.dtb
> > > > > > 6088 bytes read in 7 ms (848.6 KiB/s)
> > > > > > ## Flattened Device Tree blob at 88000000
> > > > > >    Booting using the fdt blob at 0x88000000
> > > > > >    Using Device Tree in place at 0000000088000000, end 00000000880047c7
> > > > > > Starting kernel ...
> > > > > > 
> > > > > > exception code: 2 , Illegal instruction , epc  , ra 88000004 88000000
> > > > > > ### ERROR ### Please RESET the board ###
> > > > 
> > > > I think, that's the same issue I had (or still have) a week ago.
> > > > Just reminder that kernel 5.3 introduced a 64-byte header (thus no
> > > > need to wrap kernel) at least for Image target. Thus it's booti
> > > > that boots the kernel on U-Boot side.
> > > > Thus the 1st instruction of that header is "j 0x40" (to the
> > > > beginning of the actual kernel).  And 88000004 would definitely
> > > > hold an illegal instruction.
> > > > 
> > > > 0000000000000000 <.data>:
> > > > 0:       81a0                    j       0x40
> > > > 2:       0000                    unimp
> > > > 4:       0000                    unimp
> > > > 6:       0100                    nop
> > > > [..]
> > > 
> > > Hmm that's the beginning of the kernel code. The address 88000004
> > > actually corresponds to the FDT. So the hart ending up in a trap
> > > actually tries to boot the FDT instead of the kernel.
> > > 
> > 
> > Do you see the issue if you manually use bootm instead of extlinux?
> > 
> > => bootm $kernel_addr_r - $fdt_addr_r
> > 
> > This is probably not related, as bootm is jumping to the wrong location
> > for some reason. However, it may be worth a shot as it fixes fdt
> > corruption.
> 
> I have just tested, and it doesn't work. On the other hand, I have tried
> to run that manually, and interrupting the boot process usually hides
> the problem.
> 

I tried to reproduce the issue today, but was not able to. If you can
upload the relevant files somewhere, I can retry it with them. I have
also added information on the boot flow in U-Boot below in hopes that
it is helpful for debugging.

U-Boot divides the harts in the system into the main hart (running
U-Boot) and the secondary harts (all others). The main hart is
responsible for notifying the secondary harts of where to jump to. To
communicate with them, it uses IPIs and the U-Boot global data
structure (register gp stores a pointer to it), located at the end of
RAM. Other variables in global data that could be helpful for debugging
are arch.boot_hart (the main hart running U-Boot) and
arch.available_harts (a bitmask of all harts that have entered U-Boot).
They are defined in
https://gitlab.denx.de/u-boot/u-boot/blob/master/arch/riscv/include/asm/global_data.h
.
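
When inspecting arch.available_harts during debugging, the mask can be
decoded by hand. A minimal sketch in C, assuming bit N of the mask
corresponds to hart N; the helper names and the example mask below are
hypothetical and are not taken from the U-Boot sources:

```c
#include <stdbool.h>

/* Assumption: bit N set means hart N has entered U-Boot. */
static bool hart_entered_uboot(unsigned long available_harts,
                               unsigned int hartid)
{
    /* Guard against shifting by more than the width of the mask. */
    if (hartid >= sizeof(available_harts) * 8)
        return false;
    return (available_harts >> hartid) & 1UL;
}

/* Count how many harts are recorded in the mask. */
static unsigned int count_harts(unsigned long available_harts)
{
    unsigned int n = 0;

    while (available_harts) {
        n += available_harts & 1UL;
        available_harts >>= 1;
    }
    return n;
}
```

Under that assumption, a mask of 0x1e would mean that harts 1 to 4 have
entered U-Boot and hart 0 has not.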

Booting Linux will usually use the bootm command / functions at some
point. Before jumping to the kernel, the main hart instructs the
secondary harts to jump to the kernel image. The relevant code for this
is at 
https://gitlab.denx.de/u-boot/u-boot/blob/master/arch/riscv/lib/bootm.c#L101
. This will send an IPI to all secondary harts. They are received in
arch/riscv/cpu/start.S and are eventually handled in handle_ipi() at 
https://gitlab.denx.de/u-boot/u-boot/blob/master/arch/riscv/lib/smp.c#L86
.
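
The hand-off described above can be pictured roughly as follows. This
is a simplified, single-threaded sketch in C, not the actual U-Boot
code: the structure, the function names and the "pending" flag (which
stands in for the hardware IPI) are all hypothetical, and the real
implementation lives in the files linked above.

```c
#include <stdbool.h>

#define SKETCH_MAX_HARTS 5 /* the log reports "Platform Max HARTs : 5" */

/* Hypothetical per-hart mailbox; not the real U-Boot structure. */
struct sketch_ipi_slot {
    unsigned long addr; /* where the secondary hart should jump */
    unsigned long arg0; /* first argument forwarded to the target */
    unsigned long arg1; /* second argument forwarded to the target */
    bool pending;       /* stands in for the hardware IPI */
};

static struct sketch_ipi_slot sketch_slots[SKETCH_MAX_HARTS];

/* Main hart: publish the jump target for every hart except itself. */
static void sketch_send_to_secondaries(unsigned int main_hart,
                                       unsigned long addr,
                                       unsigned long arg0,
                                       unsigned long arg1)
{
    for (unsigned int h = 0; h < SKETCH_MAX_HARTS; h++) {
        if (h == main_hart)
            continue;
        sketch_slots[h].addr = addr;
        sketch_slots[h].arg0 = arg0;
        sketch_slots[h].arg1 = arg1;
        sketch_slots[h].pending = true;
    }
}

/*
 * Secondary hart: consume a pending request. Returns true and copies
 * the request to *out if one was pending for this hart.
 */
static bool sketch_handle_ipi(unsigned int hart,
                              struct sketch_ipi_slot *out)
{
    if (hart >= SKETCH_MAX_HARTS || !sketch_slots[hart].pending)
        return false;
    *out = sketch_slots[hart];
    sketch_slots[hart].pending = false;
    return true;
}
```

In this picture, a secondary hart that consumes its slot jumps to addr
with the forwarded arguments, while the main hart never receives a
request and has to jump to the kernel on its own.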

What I find strange with the error you are seeing is that one of the
harts is jumping to the device tree binary. As you mentioned, it could
be that we have a race condition somewhere, for example causing
something to be overwritten in global data while some harts are still
running U-Boot. However, I would expect more or less random data and
not the address of the device tree binary in that case. For that reason
I would tend to rule out this scenario. Since only one hart is failing
to enter Linux, I assume that all secondary harts successfully boot
Linux and only the main hart is having problems. That would mean that
something is going wrong in arch/riscv/lib/bootm.c .
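
One detail that can be checked offline is why the trap message would
mention 88000004 if a hart really started executing the device tree at
88000000. A flattened device tree begins with the big-endian magic
0xd00dfeed followed by the big-endian total size. The sketch below
(plain C, helper names hypothetical, total size 0x47c8 inferred from
the range 88000000 to 880047c7 in the log) reads that header as 16-bit
little-endian RISC-V instruction parcels:

```c
#include <stddef.h>
#include <stdint.h>

/* Read the 16-bit little-endian instruction parcel at byte offset off. */
static uint16_t parcel_at(const uint8_t *mem, size_t off)
{
    return (uint16_t)(mem[off] | (mem[off + 1] << 8));
}

/* Low two bits other than 0b11 mark a 16-bit (compressed) encoding. */
static int is_compressed(uint16_t parcel)
{
    return (parcel & 0x3) != 0x3;
}

/* The all-zero 16-bit parcel is defined as an illegal instruction. */
static int is_defined_illegal(uint16_t parcel)
{
    return parcel == 0;
}

/*
 * First 8 bytes of the FDT header as they would sit in memory:
 * magic 0xd00dfeed, then totalsize (0x47c8 is inferred from the log).
 */
static const uint8_t fdt_header[8] = {
    0xd0, 0x0d, 0xfe, 0xed,
    0x00, 0x00, 0x47, 0xc8,
};
```

The parcels at offsets 0 and 2 (0x0dd0 and 0xedfe) have the low bits of
16-bit encodings, whereas the parcel at offset 4 is all zeros, which
the RISC-V specification defines as illegal. That is at least
consistent with a hart starting at 88000000 and trapping at 88000004;
it is only a consistency check, not a diagnosis.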

Andreas also brought up a good point. We did have a similar problem
before, which was caused by insufficient initialization. The workaround
to fix this was to use the power switch instead of the reset button to
reset the board. I haven't tested it, but I believe initialization in
OpenSBI should be better now, meaning that this might not be a problem
anymore. However, there might also be a similar problem in U-Boot.

Regards,
Lukas

* Re: Fail to bring hart online on HiFive Unleashed
  2019-10-15 21:38                   ` Auer, Lukas
@ 2019-10-15 22:22                     ` Aurelien Jarno
  2019-10-16 20:49                       ` Auer, Lukas
  0 siblings, 1 reply; 24+ messages in thread
From: Aurelien Jarno @ 2019-10-15 22:22 UTC (permalink / raw)
  To: Auer, Lukas; +Cc: david.abdurachmanov, Atish.Patra, linux-riscv

On 2019-10-15 21:38, Auer, Lukas wrote:
> On Thu, 2019-10-10 at 21:58 +0200, Aurelien Jarno wrote:
> > On 2019-10-09 01:34, Atish Patra wrote:
> > > On Tue, 2019-10-08 at 08:33 +0200, Aurelien Jarno wrote:
> > > > On 8 October 2019 08:14:58 GMT+02:00, David Abdurachmanov
> > > > <david.abdurachmanov@sifive.com> wrote:
> > > > > On Tue, Oct 8, 2019 at 7:30 AM Aurelien Jarno <aurelien@aurel32.net>
> > > > > wrote:
> > > > > > On 2019-10-07 22:19, Atish Patra wrote:
> > > > > > > Thanks for the detailed analysis. Can you please keep me and
> > > > > > > david in cc when you report the issue to U-boot ?
> > > > > > 
> > > > > > Yep. I have progressed a bit on that, and now I am not convinced
> > > > > > it's an U-boot issue, it can be a GCC issue.
> > > > > > 
> > > > > > Here are the conditions to reproduce the bug:
> > > > > > - U-boot runs on hart 1, 2 or 3
> > > > > > - the autoboot process is not interrupted
> > > > > > - extlinux is used to boot the kernel
> > > > > > - arch/riscv/lib/bootm.c is compiled with GCC 9 (works fine with
> > > > > >   GCC 8)
> > > > > > 
> > > > > > When the problem happens, the missing hart actually ends its
> > > > > > execution in an illegal instruction trap trying to execute the
> > > > > > FDT (I only noticed that recently as the message was hidden by
> > > > > > the use of earlycon=sbi):
> > > > > > 
> > > > > > > SiFive FSBL:       2018-03-20
> > > > > > > HiFive-U serial #: 00000246
> > > > > > > 
> > > > > > > OpenSBI v0.4-50-g30f09fb (Oct  6 2019 21:58:05)
> > > > > > >    ____                    _____ ____ _____
> > > > > > >   / __ \                  / ____|  _ \_   _|
> > > > > > >  | |  | |_ __   ___ _ __ | (___ | |_) || |
> > > > > > >  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
> > > > > > >  | |__| | |_) |  __/ | | |____) | |_) || |_
> > > > > > >   \____/| .__/ \___|_| |_|_____/|____/_____|
> > > > > > >         | |
> > > > > > >         |_|
> > > > > > > 
> > > > > > > Platform Name          : SiFive Freedom U540
> > > > > > > Platform HART Features : RV64ACDFIMSU
> > > > > > > Platform Max HARTs     : 5
> > > > > > > Current Hart           : 2
> > > > > > > Firmware Base          : 0x80000000
> > > > > > > Firmware Size          : 104 KB
> > > > > > > Runtime SBI Version    : 0.2
> > > > > > > 
> > > > > > > PMP0: 0x0000000080000000-0x000000008001ffff (A)
> > > > > > > PMP1: 0x0000000000000000-0x0000007fffffffff (A,R,W,X)
> > > > > > > 
> > > > > > > 
> > > > > > > U-Boot 2019.10-rc4-00037-gdac51e9aaf-dirty (Oct 06 2019 - 21:56:51 +0000)
> > > > > > > CPU:   rv64imafdc
> > > > > > > Model: SiFive HiFive Unleashed A00
> > > > > > > DRAM:  8 GiB
> > > > > > > 
> > > > > > > MMC:   spi@10050000:mmc@0: 0
> > > > > > > In:    serial@10010000
> > > > > > > Out:   serial@10010000
> > > > > > > Err:   serial@10010000
> > > > > > > Net:   eth0: ethernet@10090000
> > > > > > > Hit any key to stop autoboot:  0
> > > > > > > switch to partitions #0, OK
> > > > > > > mmc0 is current device
> > > > > > > Scanning mmc 0:2...
> > > > > > > Found /boot/extlinux/extlinux.conf
> > > > > > > Retrieving file: /boot/extlinux/extlinux.conf
> > > > > > > 510 bytes read in 5 ms (99.6 KiB/s)
> > > > > > > U-Boot menu
> > > > > > > 1:      kernel 5.3.4
> > > > > > > 2:      Debian GNU/Linux kernel 5.3.0-trunk-riscv64
> > > > > > > Enter choice: 1
> > > > > > > 1:      kernel 5.3.4
> > > > > > > Retrieving file: /boot/vmlinux-5.3.4
> > > > > > > 9486076 bytes read in 4813 ms (1.9 MiB/s)
> > > > > > > append: root=/dev/mmcblk0p2 rw console=ttySIF0 rootwait
> > > > > > > Retrieving file: /boot/hifive-unleashed-a00.dtb
> > > > > > > 6088 bytes read in 7 ms (848.6 KiB/s)
> > > > > > > ## Flattened Device Tree blob at 88000000
> > > > > > >    Booting using the fdt blob at 0x88000000
> > > > > > >    Using Device Tree in place at 0000000088000000, end 00000000880047c7
> > > > > > > Starting kernel ...
> > > > > > > 
> > > > > > > exception code: 2 , Illegal instruction , epc  , ra 88000004 88000000
> > > > > > > ### ERROR ### Please RESET the board ###
> > > > > 
> > > > > I think, that's the same issue I had (or still have) a week ago.
> > > > > Just reminder that kernel 5.3 introduced a 64-byte header (thus no
> > > > > need to wrap kernel) at least for Image target. Thus it's booti
> > > > > that boots the kernel on U-Boot side.
> > > > > Thus the 1st instruction of that header is "j 0x40" (to the
> > > > > beginning of the actual kernel).  And 88000004 would definitely
> > > > > hold an illegal instruction.
> > > > > 
> > > > > 0000000000000000 <.data>:
> > > > > 0:       81a0                    j       0x40
> > > > > 2:       0000                    unimp
> > > > > 4:       0000                    unimp
> > > > > 6:       0100                    nop
> > > > > [..]
> > > > 
> > > > Hmm that's the beginning of the kernel code. The address 88000004
> > > > actually corresponds to the FDT. So the hart ending up in a trap
> > > > actually tries to boot the FDT instead of the kernel.
> > > > 
> > > 
> > > Do you see the issue if you manually use bootm instead of extlinux?
> > > 
> > > => bootm $kernel_addr_r - $fdt_addr_r
> > > 
> > > This is probably not related, as bootm is jumping to the wrong location
> > > for some reason. However, it may be worth a shot as it fixes fdt
> > > corruption.
> > 
> > I have just tested, and it doesn't work. On the other hand, I have tried
> > to run that manually, and interrupting the boot process usually hides
> > the problem.
> > 
> 
> I tried to reproduce the issue today, but was not able to. If you can
> upload the relevant files somewhere, I can retry it with them. I have
> also added information on the boot flow in U-Boot below in hopes that
> it is helpful for debugging.

You can find the files there:
https://temp.aurel32.net/hifive-opensbi-uboot/

fw_payload.bin contains the OpenSBI + U-Boot payload to be copied to the
first partition of the SD card. The boot.tar.gz contains the /boot 
directory (kernel, fdt and extlinux.conf) and has to be put on the
second partition of the SD card. Note that this partition should have
the GPT boot flag enabled for extlinux to work.

I haven't looked further at the issue recently, now that I have found
that using GCC 8 is a fix/workaround. Therefore those files are from ~10
days ago. I will try to do more tests during the weekend.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net


* Re: Fail to bring hart online on HiFive Unleashed
  2019-10-15 22:22                     ` Aurelien Jarno
@ 2019-10-16 20:49                       ` Auer, Lukas
  2019-10-17 15:45                         ` David Abdurachmanov
  2019-10-17 20:42                         ` Aurelien Jarno
  0 siblings, 2 replies; 24+ messages in thread
From: Auer, Lukas @ 2019-10-16 20:49 UTC (permalink / raw)
  To: aurelien; +Cc: david.abdurachmanov, Atish.Patra, linux-riscv

On Wed, 2019-10-16 at 00:22 +0200, Aurelien Jarno wrote:
> On 2019-10-15 21:38, Auer, Lukas wrote:
> > On Thu, 2019-10-10 at 21:58 +0200, Aurelien Jarno wrote:
> > > On 2019-10-09 01:34, Atish Patra wrote:
> > > > On Tue, 2019-10-08 at 08:33 +0200, Aurelien Jarno wrote:
> > > > > On 8 October 2019 08:14:58 GMT+02:00, David Abdurachmanov
> > > > > <david.abdurachmanov@sifive.com> wrote:
> > > > > > On Tue, Oct 8, 2019 at 7:30 AM Aurelien Jarno <aurelien@aurel32.net>
> > > > > > wrote:
> > > > > > > On 2019-10-07 22:19, Atish Patra wrote:
> > > > > > > > Thanks for the detailed analysis. Can you please keep me and
> > > > > > > > david in cc when you report the issue to U-boot ?
> > > > > > > 
> > > > > > > Yep. I have progressed a bit on that, and now I am not convinced
> > > > > > > it's an U-boot issue, it can be a GCC issue.
> > > > > > > 
> > > > > > > Here are the conditions to reproduce the bug:
> > > > > > > - U-boot runs on hart 1, 2 or 3
> > > > > > > - the autoboot process is not interrupted
> > > > > > > - extlinux is used to boot the kernel
> > > > > > > - arch/riscv/lib/bootm.c is compiled with GCC 9 (works fine with
> > > > > > >   GCC 8)
> > > > > > > 
> > > > > > > When the problem happens, the missing hart actually ends its
> > > > > > > execution in an illegal instruction trap trying to execute the
> > > > > > > FDT (I only noticed that recently as the message was hidden by
> > > > > > > the use of earlycon=sbi):
> > > > > > > 
> > > > > > > > SiFive FSBL:       2018-03-20
> > > > > > > > HiFive-U serial #: 00000246
> > > > > > > > 
> > > > > > > > OpenSBI v0.4-50-g30f09fb (Oct  6 2019 21:58:05)
> > > > > > > >    ____                    _____ ____ _____
> > > > > > > >   / __ \                  / ____|  _ \_   _|
> > > > > > > >  | |  | |_ __   ___ _ __ | (___ | |_) || |
> > > > > > > >  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
> > > > > > > >  | |__| | |_) |  __/ | | |____) | |_) || |_
> > > > > > > >   \____/| .__/ \___|_| |_|_____/|____/_____|
> > > > > > > >         | |
> > > > > > > >         |_|
> > > > > > > > 
> > > > > > > > Platform Name          : SiFive Freedom U540
> > > > > > > > Platform HART Features : RV64ACDFIMSU
> > > > > > > > Platform Max HARTs     : 5
> > > > > > > > Current Hart           : 2
> > > > > > > > Firmware Base          : 0x80000000
> > > > > > > > Firmware Size          : 104 KB
> > > > > > > > Runtime SBI Version    : 0.2
> > > > > > > > 
> > > > > > > > PMP0: 0x0000000080000000-0x000000008001ffff (A)
> > > > > > > > PMP1: 0x0000000000000000-0x0000007fffffffff (A,R,W,X)
> > > > > > > > 
> > > > > > > > 
> > > > > > > > U-Boot 2019.10-rc4-00037-gdac51e9aaf-dirty (Oct 06 2019 - 21:56:51 +0000)
> > > > > > > > CPU:   rv64imafdc
> > > > > > > > Model: SiFive HiFive Unleashed A00
> > > > > > > > DRAM:  8 GiB
> > > > > > > > 
> > > > > > > > MMC:   spi@10050000:mmc@0: 0
> > > > > > > > In:    serial@10010000
> > > > > > > > Out:   serial@10010000
> > > > > > > > Err:   serial@10010000
> > > > > > > > Net:   eth0: ethernet@10090000
> > > > > > > > Hit any key to stop autoboot:  0
> > > > > > > > switch to partitions #0, OK
> > > > > > > > mmc0 is current device
> > > > > > > > Scanning mmc 0:2...
> > > > > > > > Found /boot/extlinux/extlinux.conf
> > > > > > > > Retrieving file: /boot/extlinux/extlinux.conf
> > > > > > > > 510 bytes read in 5 ms (99.6 KiB/s)
> > > > > > > > U-Boot menu
> > > > > > > > 1:      kernel 5.3.4
> > > > > > > > 2:      Debian GNU/Linux kernel 5.3.0-trunk-riscv64
> > > > > > > > Enter choice: 1
> > > > > > > > 1:      kernel 5.3.4
> > > > > > > > Retrieving file: /boot/vmlinux-5.3.4
> > > > > > > > 9486076 bytes read in 4813 ms (1.9 MiB/s)
> > > > > > > > append: root=/dev/mmcblk0p2 rw console=ttySIF0 rootwait
> > > > > > > > Retrieving file: /boot/hifive-unleashed-a00.dtb
> > > > > > > > 6088 bytes read in 7 ms (848.6 KiB/s)
> > > > > > > > ## Flattened Device Tree blob at 88000000
> > > > > > > >    Booting using the fdt blob at 0x88000000
> > > > > > > >    Using Device Tree in place at 0000000088000000, end 00000000880047c7
> > > > > > > > Starting kernel ...
> > > > > > > > 
> > > > > > > > exception code: 2 , Illegal instruction , epc 88000004 , ra 88000000
> > > > > > > > ### ERROR ### Please RESET the board ###
> > > > > > 
> > > > > > I think that's the same issue I had (or still have) a week ago.
> > > > > > Just a reminder that kernel 5.3 introduced a 64-byte header (thus no
> > > > > > need to wrap the kernel), at least for the Image target. Thus it's
> > > > > > booti that boots the kernel on the U-Boot side.
> > > > > > The first instruction of that header is "j 0x40" (a jump to the
> > > > > > beginning of the actual kernel), and 88000004 would definitely hold
> > > > > > an illegal instruction.
> > > > > > 
> > > > > > 0000000000000000 <.data>:
> > > > > > 0:       81a0                    j       0x40
> > > > > > 2:       0000                    unimp
> > > > > > 4:       0000                    unimp
> > > > > > 6:       0100                    nop
> > > > > > [..]
> > > > > 
> > > > > Hmm that's the beginning of the kernel code. The address 88000004
> > > > > actually corresponds to the FDT. So the hart ending up in a trap
> > > > > actually tries to boot the FDT instead of the kernel.
> > > > > 
> > > > 
> > > > Do you see the issue if you manually use bootm instead of extlinux?
> > > > 
> > > > => bootm $kernel_addr_r - $fdt_addr_r
> > > > 
> > > > This is probably not related, as bootm is jumping to the wrong
> > > > location for some reason. However, it may be worth a shot as it
> > > > fixes fdt corruption.
> > > 
> > > I have just tested, and it doesn't work. On the other hand, I have
> > > tried to run that manually, and interrupting the boot process usually
> > > hides the problem.
> > > 
> > 
> > I tried to reproduce the issue today, but was not able to. If you can
> > upload the relevant files somewhere, I can retry it with them. I have
> > also added information on the boot flow in U-Boot below in hopes that
> > it is helpful for debugging.
> 
> You can find the files there:
> https://temp.aurel32.net/hifive-opensbi-uboot/
> 
> fw_payload.bin contains the OpenSBI + U-Boot payload to be copied to the
> first partition of the SD card. The boot.tar.gz contains the /boot 
> directory (kernel, fdt and extlinux.conf) and has to be put on the
> second partition of the SD card. Note that this partition should have
> the GPT boot flag enabled for extlinux to work.
> 
> I haven't looked more at the issue recently now that I have found that
> using GCC 8 is a fix/workaround. Therefore those files are from ~10 days
> ago. I will try to do more tests during the week-end.
> 

Thanks for the files, I was able to reproduce the issue now. Seems like
it is caused by a stack overflow. When smp_call_function() is called
during bootm, the stack of the main hart overflows into the stack of
one of the other harts. The return address of the main hart now lies
within the stack of the other hart. Once that hart gets woken by the
IPI it overwrites the return address, in our case with 0x88000000. This
will cause the illegal instruction trap once the main hart returns.
This also explains why the problem does not occur when the main hart
is hart 4, since its stack is at the bottom and therefore can't
overflow into one of the other stacks.

Increasing the stack size (CONFIG_STACK_SIZE_SHIFT) to 14 fixes the
problem. I'll double check that there's nothing else causing an issue
and will then send a patch to increase the stack size.
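
To illustrate the layout, here is a minimal sketch (not the actual
U-Boot code). It assumes that the stack of hart N starts at a common
base minus (N << CONFIG_STACK_SIZE_SHIFT), so that higher hart IDs sit
lower in memory. STACK_BASE, the 9000-byte depth used below, the helper
names and the previous shift value of 13 are assumptions made for the
example only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values for illustration; not taken from the board config. */
#define STACK_BASE 0x80200000ULL /* assumed top of the whole stack area */
#define MAX_HART   4             /* harts 0..4 ("Platform Max HARTs : 5") */

/* Top of the stack of a given hart: each hart's stack sits below the
 * stack of the previous hart, (1 << shift) bytes apart. */
static uint64_t stack_top(unsigned hart, unsigned shift)
{
	return STACK_BASE - ((uint64_t)hart << shift);
}

/* Which hart's stack region holds the lowest byte in use once `hart`
 * has pushed `depth` bytes? Returns -1 if that byte lies below all
 * per-hart stacks, i.e. no other hart's stack is touched. */
static int stack_owner(unsigned hart, uint64_t depth, unsigned shift)
{
	assert(depth > 0);
	uint64_t lowest = stack_top(hart, shift) - depth;
	/* Region index, counted downwards from STACK_BASE. */
	uint64_t owner = (STACK_BASE - 1 - lowest) >> shift;
	return owner <= MAX_HART ? (int)owner : -1;
}
```

With these assumed numbers, stack_owner(2, 9000, 13) gives 3 (hart 2
has run into the stack of hart 3), stack_owner(2, 9000, 14) gives 2
(the larger stack contains it), and stack_owner(4, 9000, 13) gives -1
(hart 4 is at the bottom, so there is no other stack to corrupt).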

Regards,
Lukas

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Fail to bring hart online on HiFive Unleashed
  2019-10-16 20:49                       ` Auer, Lukas
@ 2019-10-17 15:45                         ` David Abdurachmanov
  2019-10-17 20:42                         ` Aurelien Jarno
  1 sibling, 0 replies; 24+ messages in thread
From: David Abdurachmanov @ 2019-10-17 15:45 UTC (permalink / raw)
  To: Auer, Lukas; +Cc: david.abdurachmanov, Atish.Patra, linux-riscv, aurelien

On Wed, Oct 16, 2019 at 11:49 PM Auer, Lukas
<lukas.auer@aisec.fraunhofer.de> wrote:
>
> On Wed, 2019-10-16 at 00:22 +0200, Aurelien Jarno wrote:
> > On 2019-10-15 21:38, Auer, Lukas wrote:
> > > On Thu, 2019-10-10 at 21:58 +0200, Aurelien Jarno wrote:
> > > > On 2019-10-09 01:34, Atish Patra wrote:
> > > > > On Tue, 2019-10-08 at 08:33 +0200, Aurelien Jarno wrote:
> > > > > > On 8 October 2019 08:14:58 GMT+02:00, David Abdurachmanov
> > > > > > <david.abdurachmanov@sifive.com> wrote:
> > > > > > > On Tue, Oct 8, 2019 at 7:30 AM Aurelien Jarno <aurelien@aurel32.net>
> > > > > > > wrote:
> > > > > > > > On 2019-10-07 22:19, Atish Patra wrote:
> > > > > > > > > Thanks for the detailed analysis. Can you please keep me and
> > > > > > > > > David in cc when you report the issue to U-Boot?
> > > > > > > >
> > > > > > > > Yep. I have progressed a bit on that, and now I am not convinced
> > > > > > > > it's a U-Boot issue; it could be a GCC issue.
> > > > > > > >
> > > > > > > > Here are the conditions to reproduce the bug:
> > > > > > > > - U-Boot runs on hart 1, 2 or 3
> > > > > > > > - the autoboot process is not interrupted
> > > > > > > > - extlinux is used to boot the kernel
> > > > > > > > - arch/riscv/lib/bootm.c is compiled with GCC 9 (works fine with
> > > > > > > >   GCC 8)
> > > > > > > >
> > > > > > > > When the problem happens, the missing hart actually ends its
> > > > > > > > execution in an illegal instruction trap trying to execute the FDT
> > > > > > > > (I only noticed that recently as the message was hidden by the use
> > > > > > > > of earlycon=sbi):
> > > > > > > >
> > > > > > > > > SiFive FSBL:       2018-03-20
> > > > > > > > > HiFive-U serial #: 00000246
> > > > > > > > >
> > > > > > > > > OpenSBI v0.4-50-g30f09fb (Oct  6 2019 21:58:05)
> > > > > > > > >    ____                    _____ ____ _____
> > > > > > > > >   / __ \                  / ____|  _ \_   _|
> > > > > > > > >  | |  | |_ __   ___ _ __ | (___ | |_) || |
> > > > > > > > >  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
> > > > > > > > >  | |__| | |_) |  __/ | | |____) | |_) || |_
> > > > > > > > >   \____/| .__/ \___|_| |_|_____/|____/_____|
> > > > > > > > >         | |
> > > > > > > > >         |_|
> > > > > > > > >
> > > > > > > > > Platform Name          : SiFive Freedom U540
> > > > > > > > > Platform HART Features : RV64ACDFIMSU
> > > > > > > > > Platform Max HARTs     : 5
> > > > > > > > > Current Hart           : 2
> > > > > > > > > Firmware Base          : 0x80000000
> > > > > > > > > Firmware Size          : 104 KB
> > > > > > > > > Runtime SBI Version    : 0.2
> > > > > > > > >
> > > > > > > > > PMP0: 0x0000000080000000-0x000000008001ffff (A)
> > > > > > > > > PMP1: 0x0000000000000000-0x0000007fffffffff (A,R,W,X)
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > U-Boot 2019.10-rc4-00037-gdac51e9aaf-dirty (Oct 06 2019 - 21:56:51 +0000)
> > > > > > > > > CPU:   rv64imafdc
> > > > > > > > > Model: SiFive HiFive Unleashed A00
> > > > > > > > > DRAM:  8 GiB
> > > > > > > > >
> > > > > > > > > MMC:   spi@10050000:mmc@0: 0
> > > > > > > > > In:    serial@10010000
> > > > > > > > > Out:   serial@10010000
> > > > > > > > > Err:   serial@10010000
> > > > > > > > > Net:   eth0: ethernet@10090000
> > > > > > > > > Hit any key to stop autoboot:  0
> > > > > > > > > switch to partitions #0, OK
> > > > > > > > > mmc0 is current device
> > > > > > > > > Scanning mmc 0:2...
> > > > > > > > > Found /boot/extlinux/extlinux.conf
> > > > > > > > > Retrieving file: /boot/extlinux/extlinux.conf
> > > > > > > > > 510 bytes read in 5 ms (99.6 KiB/s)
> > > > > > > > > U-Boot menu
> > > > > > > > > 1:      kernel 5.3.4
> > > > > > > > > 2:      Debian GNU/Linux kernel 5.3.0-trunk-riscv64
> > > > > > > > > Enter choice: 1
> > > > > > > > > 1:      kernel 5.3.4
> > > > > > > > > Retrieving file: /boot/vmlinux-5.3.4
> > > > > > > > > 9486076 bytes read in 4813 ms (1.9 MiB/s)
> > > > > > > > > append: root=/dev/mmcblk0p2 rw console=ttySIF0 rootwait
> > > > > > > > > Retrieving file: /boot/hifive-unleashed-a00.dtb
> > > > > > > > > 6088 bytes read in 7 ms (848.6 KiB/s)
> > > > > > > > > ## Flattened Device Tree blob at 88000000
> > > > > > > > >    Booting using the fdt blob at 0x88000000
> > > > > > > > >    Using Device Tree in place at 0000000088000000, end 00000000880047c7
> > > > > > > > > Starting kernel ...
> > > > > > > > >
> > > > > > > > > exception code: 2 , Illegal instruction , epc 88000004 , ra 88000000
> > > > > > > > > ### ERROR ### Please RESET the board ###
> > > > > > >
> > > > > > > I think that's the same issue I had (or still have) a week ago.
> > > > > > > Just a reminder that kernel 5.3 introduced a 64-byte header (thus no
> > > > > > > need to wrap the kernel), at least for the Image target. Thus it's
> > > > > > > booti that boots the kernel on the U-Boot side.
> > > > > > > The first instruction of that header is "j 0x40" (a jump to the
> > > > > > > beginning of the actual kernel), and 88000004 would definitely hold
> > > > > > > an illegal instruction.
> > > > > > >
> > > > > > > 0000000000000000 <.data>:
> > > > > > > 0:       81a0                    j       0x40
> > > > > > > 2:       0000                    unimp
> > > > > > > 4:       0000                    unimp
> > > > > > > 6:       0100                    nop
> > > > > > > [..]
> > > > > >
> > > > > > Hmm that's the beginning of the kernel code. The address 88000004
> > > > > > actually corresponds to the FDT. So the hart ending up in a trap
> > > > > > actually tries to boot the FDT instead of the kernel.
> > > > > >
> > > > >
> > > > > Do you see the issue if you manually use bootm instead of extlinux?
> > > > >
> > > > > => bootm $kernel_addr_r - $fdt_addr_r
> > > > >
> > > > > This is probably not related, as bootm is jumping to the wrong
> > > > > location for some reason. However, it may be worth a shot as it
> > > > > fixes fdt corruption.
> > > >
> > > > I have just tested, and it doesn't work. On the other hand, I have
> > > > tried to run that manually, and interrupting the boot process usually
> > > > hides the problem.
> > > >
> > >
> > > I tried to reproduce the issue today, but was not able to. If you can
> > > upload the relevant files somewhere, I can retry it with them. I have
> > > also added information on the boot flow in U-Boot below in hopes that
> > > it is helpful for debugging.
> >
> > You can find the files there:
> > https://temp.aurel32.net/hifive-opensbi-uboot/
> >
> > fw_payload.bin contains the OpenSBI + U-Boot payload to be copied to the
> > first partition of the SD card. The boot.tar.gz contains the /boot
> > directory (kernel, fdt and extlinux.conf) and has to be put on the
> > second partition of the SD card. Note that this partition should have
> > the GPT boot flag enabled for extlinux to work.
> >
> > I haven't looked more at the issue recently now that I have found that
> > using GCC 8 is a fix/workaround. Therefore those files are from ~10 days
> > ago. I will try to do more tests during the week-end.
> >
>
> Thanks for the files, I was able to reproduce the issue now. Seems like
> it is caused by a stack overflow. When smp_call_function() is called
> during bootm, the stack of the main hart overflows into the stack of
> one of the other harts. The return address of the main hart now lies
> within the stack of the other hart. Once that hart gets woken by the
> IPI it overwrites the return address, in our case with 0x88000000. This
> will cause the illegal instruction trap once the main hart returns.
> This also explains why the problem does not occur when the main hart
> is hart 4, since its stack is at the bottom and therefore can't
> overflow into one of the other stacks.
>
> Increasing the stack size (CONFIG_STACK_SIZE_SHIFT) to 14 fixes the
> problem. I'll double check that there's nothing else causing an issue
> and will then send a patch to increase the stack size.

My Fedora/RISCV build from yesterday also produced a lot of
illegal-instruction crashes at 0x88000000. I bumped
CONFIG_STACK_SIZE_SHIFT to 14 and booted the board ~20 times today
without hitting the issue. So you can add:

Tested-by: David Abdurachmanov <david.abdurachmanov@sifive.com>

>
> Regards,
> Lukas


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Fail to bring hart online on HiFive Unleashed
  2019-10-16 20:49                       ` Auer, Lukas
  2019-10-17 15:45                         ` David Abdurachmanov
@ 2019-10-17 20:42                         ` Aurelien Jarno
  2019-10-20 18:57                           ` Auer, Lukas
  1 sibling, 1 reply; 24+ messages in thread
From: Aurelien Jarno @ 2019-10-17 20:42 UTC (permalink / raw)
  To: Auer, Lukas; +Cc: david.abdurachmanov, Atish.Patra, linux-riscv

On 2019-10-16 20:49, Auer, Lukas wrote:
> On Wed, 2019-10-16 at 00:22 +0200, Aurelien Jarno wrote:
> > On 2019-10-15 21:38, Auer, Lukas wrote:
> > > On Thu, 2019-10-10 at 21:58 +0200, Aurelien Jarno wrote:
> > > > On 2019-10-09 01:34, Atish Patra wrote:
> > > > > On Tue, 2019-10-08 at 08:33 +0200, Aurelien Jarno wrote:
> > > > > > On 8 October 2019 08:14:58 GMT+02:00, David Abdurachmanov
> > > > > > <david.abdurachmanov@sifive.com> wrote:
> > > > > > > On Tue, Oct 8, 2019 at 7:30 AM Aurelien Jarno <aurelien@aurel32.net>
> > > > > > > wrote:
> > > > > > > > On 2019-10-07 22:19, Atish Patra wrote:
> > > > > > > > > Thanks for the detailed analysis. Can you please keep me and
> > > > > > > > > David in cc when you report the issue to U-Boot?
> > > > > > > > 
> > > > > > > > Yep. I have progressed a bit on that, and now I am not convinced
> > > > > > > > it's a U-Boot issue; it could be a GCC issue.
> > > > > > > > 
> > > > > > > > Here are the conditions to reproduce the bug:
> > > > > > > > - U-Boot runs on hart 1, 2 or 3
> > > > > > > > - the autoboot process is not interrupted
> > > > > > > > - extlinux is used to boot the kernel
> > > > > > > > - arch/riscv/lib/bootm.c is compiled with GCC 9 (works fine with
> > > > > > > >   GCC 8)
> > > > > > > > 
> > > > > > > > When the problem happens, the missing hart actually ends its
> > > > > > > > execution in an illegal instruction trap trying to execute the FDT
> > > > > > > > (I only noticed that recently as the message was hidden by the use
> > > > > > > > of earlycon=sbi):
> > > > > > > > 
> > > > > > > > > SiFive FSBL:       2018-03-20
> > > > > > > > > HiFive-U serial #: 00000246
> > > > > > > > > 
> > > > > > > > > OpenSBI v0.4-50-g30f09fb (Oct  6 2019 21:58:05)
> > > > > > > > >    ____                    _____ ____ _____
> > > > > > > > >   / __ \                  / ____|  _ \_   _|
> > > > > > > > >  | |  | |_ __   ___ _ __ | (___ | |_) || |
> > > > > > > > >  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
> > > > > > > > >  | |__| | |_) |  __/ | | |____) | |_) || |_
> > > > > > > > >   \____/| .__/ \___|_| |_|_____/|____/_____|
> > > > > > > > >         | |
> > > > > > > > >         |_|
> > > > > > > > > 
> > > > > > > > > Platform Name          : SiFive Freedom U540
> > > > > > > > > Platform HART Features : RV64ACDFIMSU
> > > > > > > > > Platform Max HARTs     : 5
> > > > > > > > > Current Hart           : 2
> > > > > > > > > Firmware Base          : 0x80000000
> > > > > > > > > Firmware Size          : 104 KB
> > > > > > > > > Runtime SBI Version    : 0.2
> > > > > > > > > 
> > > > > > > > > PMP0: 0x0000000080000000-0x000000008001ffff (A)
> > > > > > > > > PMP1: 0x0000000000000000-0x0000007fffffffff (A,R,W,X)
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > U-Boot 2019.10-rc4-00037-gdac51e9aaf-dirty (Oct 06 2019 - 21:56:51 +0000)
> > > > > > > > > CPU:   rv64imafdc
> > > > > > > > > Model: SiFive HiFive Unleashed A00
> > > > > > > > > DRAM:  8 GiB
> > > > > > > > > 
> > > > > > > > > MMC:   spi@10050000:mmc@0: 0
> > > > > > > > > In:    serial@10010000
> > > > > > > > > Out:   serial@10010000
> > > > > > > > > Err:   serial@10010000
> > > > > > > > > Net:   eth0: ethernet@10090000
> > > > > > > > > Hit any key to stop autoboot:  0
> > > > > > > > > switch to partitions #0, OK
> > > > > > > > > mmc0 is current device
> > > > > > > > > Scanning mmc 0:2...
> > > > > > > > > Found /boot/extlinux/extlinux.conf
> > > > > > > > > Retrieving file: /boot/extlinux/extlinux.conf
> > > > > > > > > 510 bytes read in 5 ms (99.6 KiB/s)
> > > > > > > > > U-Boot menu
> > > > > > > > > 1:      kernel 5.3.4
> > > > > > > > > 2:      Debian GNU/Linux kernel 5.3.0-trunk-riscv64
> > > > > > > > > Enter choice: 1
> > > > > > > > > 1:      kernel 5.3.4
> > > > > > > > > Retrieving file: /boot/vmlinux-5.3.4
> > > > > > > > > 9486076 bytes read in 4813 ms (1.9 MiB/s)
> > > > > > > > > append: root=/dev/mmcblk0p2 rw console=ttySIF0 rootwait
> > > > > > > > > Retrieving file: /boot/hifive-unleashed-a00.dtb
> > > > > > > > > 6088 bytes read in 7 ms (848.6 KiB/s)
> > > > > > > > > ## Flattened Device Tree blob at 88000000
> > > > > > > > >    Booting using the fdt blob at 0x88000000
> > > > > > > > >    Using Device Tree in place at 0000000088000000, end 00000000880047c7
> > > > > > > > > Starting kernel ...
> > > > > > > > > 
> > > > > > > > > exception code: 2 , Illegal instruction , epc 88000004 , ra 88000000
> > > > > > > > > ### ERROR ### Please RESET the board ###
> > > > > > > 
> > > > > > > I think that's the same issue I had (or still have) a week ago.
> > > > > > > Just a reminder that kernel 5.3 introduced a 64-byte header (thus no
> > > > > > > need to wrap the kernel), at least for the Image target. Thus it's
> > > > > > > booti that boots the kernel on the U-Boot side.
> > > > > > > The first instruction of that header is "j 0x40" (a jump to the
> > > > > > > beginning of the actual kernel), and 88000004 would definitely hold
> > > > > > > an illegal instruction.
> > > > > > > 
> > > > > > > 0000000000000000 <.data>:
> > > > > > > 0:       81a0                    j       0x40
> > > > > > > 2:       0000                    unimp
> > > > > > > 4:       0000                    unimp
> > > > > > > 6:       0100                    nop
> > > > > > > [..]
> > > > > > 
> > > > > > Hmm that's the beginning of the kernel code. The address 88000004
> > > > > > actually corresponds to the FDT. So the hart ending up in a trap
> > > > > > actually tries to boot the FDT instead of the kernel.
> > > > > > 
> > > > > 
> > > > > Do you see the issue if you manually use bootm instead of extlinux?
> > > > > 
> > > > > => bootm $kernel_addr_r - $fdt_addr_r
> > > > > 
> > > > > This is probably not related, as bootm is jumping to the wrong
> > > > > location for some reason. However, it may be worth a shot as it
> > > > > fixes fdt corruption.
> > > > 
> > > > I have just tested, and it doesn't work. On the other hand, I have
> > > > tried to run that manually, and interrupting the boot process usually
> > > > hides the problem.
> > > > 
> > > 
> > > I tried to reproduce the issue today, but was not able to. If you can
> > > upload the relevant files somewhere, I can retry it with them. I have
> > > also added information on the boot flow in U-Boot below in hopes that
> > > it is helpful for debugging.
> > 
> > You can find the files there:
> > https://temp.aurel32.net/hifive-opensbi-uboot/
> > 
> > fw_payload.bin contains the OpenSBI + U-Boot payload to be copied to the
> > first partition of the SD card. The boot.tar.gz contains the /boot 
> > directory (kernel, fdt and extlinux.conf) and has to be put on the
> > second partition of the SD card. Note that this partition should have
> > the GPT boot flag enabled for extlinux to work.
> > 
> > I haven't looked more at the issue recently now that I have found that
> > using GCC 8 is a fix/workaround. Therefore those files are from ~10 days
> > ago. I will try to do more tests during the week-end.
> > 
> 
> Thanks for the files, I was able to reproduce the issue now. Seems like
> it is caused by a stack overflow. When smp_call_function() is called
> during bootm, the stack of the main hart overflows into the stack of
> one of the other harts. The return address of the main hart now lies
> within the stack of the other hart. Once that hart gets woken by the
> IPI it overwrites the return address, in our case with 0x88000000. This
> will cause the illegal instruction trap once the main hart returns.
> This also explains why the problem does not occur when the main hart
> is hart 4, since its stack is at the bottom and therefore can't
> overflow into one of the other stacks.
> 
> Increasing the stack size (CONFIG_STACK_SIZE_SHIFT) to 14 fixes the
> problem. I'll double check that there's nothing else causing an issue
> and will then send a patch to increase the stack size.

Thanks a lot for debugging that. I have just tried it, and I confirm it
fixes the issue for me.

Tested-by: Aurelien Jarno <aurelien@aurel32.net>

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Fail to bring hart online on HiFive Unleashed
  2019-10-17 20:42                         ` Aurelien Jarno
@ 2019-10-20 18:57                           ` Auer, Lukas
  0 siblings, 0 replies; 24+ messages in thread
From: Auer, Lukas @ 2019-10-20 18:57 UTC (permalink / raw)
  To: aurelien; +Cc: david.abdurachmanov, Atish.Patra, linux-riscv

On Thu, 2019-10-17 at 22:42 +0200, Aurelien Jarno wrote:
> On 2019-10-16 20:49, Auer, Lukas wrote:
> > On Wed, 2019-10-16 at 00:22 +0200, Aurelien Jarno wrote:
> > > On 2019-10-15 21:38, Auer, Lukas wrote:
> > > > On Thu, 2019-10-10 at 21:58 +0200, Aurelien Jarno wrote:
> > > > > On 2019-10-09 01:34, Atish Patra wrote:
> > > > > > On Tue, 2019-10-08 at 08:33 +0200, Aurelien Jarno wrote:
> > > > > > > On 8 October 2019 08:14:58 GMT+02:00, David Abdurachmanov
> > > > > > > <david.abdurachmanov@sifive.com> wrote:
> > > > > > > > On Tue, Oct 8, 2019 at 7:30 AM Aurelien Jarno <aurelien@aurel32.net>
> > > > > > > > wrote:
> > > > > > > > > On 2019-10-07 22:19, Atish Patra wrote:
> > > > > > > > > > Thanks for the detailed analysis. Can you please keep me and
> > > > > > > > > > David in cc when you report the issue to U-Boot?
> > > > > > > > > 
> > > > > > > > > Yep. I have progressed a bit on that, and now I am not convinced
> > > > > > > > > it's a U-Boot issue; it could be a GCC issue.
> > > > > > > > > 
> > > > > > > > > Here are the conditions to reproduce the bug:
> > > > > > > > > - U-Boot runs on hart 1, 2 or 3
> > > > > > > > > - the autoboot process is not interrupted
> > > > > > > > > - extlinux is used to boot the kernel
> > > > > > > > > - arch/riscv/lib/bootm.c is compiled with GCC 9 (works fine with
> > > > > > > > >   GCC 8)
> > > > > > > > > 
> > > > > > > > > When the problem happens, the missing hart actually ends its
> > > > > > > > > execution in an illegal instruction trap trying to execute the FDT
> > > > > > > > > (I only noticed that recently as the message was hidden by the use
> > > > > > > > > of earlycon=sbi):
> > > > > > > > > 
> > > > > > > > > > SiFive FSBL:       2018-03-20
> > > > > > > > > > HiFive-U serial #: 00000246
> > > > > > > > > > 
> > > > > > > > > > OpenSBI v0.4-50-g30f09fb (Oct  6 2019 21:58:05)
> > > > > > > > > >    ____                    _____ ____ _____
> > > > > > > > > >   / __ \                  / ____|  _ \_   _|
> > > > > > > > > >  | |  | |_ __   ___ _ __ | (___ | |_) || |
> > > > > > > > > >  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
> > > > > > > > > >  | |__| | |_) |  __/ | | |____) | |_) || |_
> > > > > > > > > >   \____/| .__/ \___|_| |_|_____/|____/_____|
> > > > > > > > > >         | |
> > > > > > > > > >         |_|
> > > > > > > > > > 
> > > > > > > > > > Platform Name          : SiFive Freedom U540
> > > > > > > > > > Platform HART Features : RV64ACDFIMSU
> > > > > > > > > > Platform Max HARTs     : 5
> > > > > > > > > > Current Hart           : 2
> > > > > > > > > > Firmware Base          : 0x80000000
> > > > > > > > > > Firmware Size          : 104 KB
> > > > > > > > > > Runtime SBI Version    : 0.2
> > > > > > > > > > 
> > > > > > > > > > PMP0: 0x0000000080000000-0x000000008001ffff (A)
> > > > > > > > > > PMP1: 0x0000000000000000-0x0000007fffffffff (A,R,W,X)
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > U-Boot 2019.10-rc4-00037-gdac51e9aaf-dirty (Oct 06 2019 - 21:56:51 +0000)
> > > > > > > > > > CPU:   rv64imafdc
> > > > > > > > > > Model: SiFive HiFive Unleashed A00
> > > > > > > > > > DRAM:  8 GiB
> > > > > > > > > > 
> > > > > > > > > > MMC:   spi@10050000:mmc@0: 0
> > > > > > > > > > In:    serial@10010000
> > > > > > > > > > Out:   serial@10010000
> > > > > > > > > > Err:   serial@10010000
> > > > > > > > > > Net:   eth0: ethernet@10090000
> > > > > > > > > > Hit any key to stop autoboot:  0
> > > > > > > > > > switch to partitions #0, OK
> > > > > > > > > > mmc0 is current device
> > > > > > > > > > Scanning mmc 0:2...
> > > > > > > > > > Found /boot/extlinux/extlinux.conf
> > > > > > > > > > Retrieving file: /boot/extlinux/extlinux.conf
> > > > > > > > > > 510 bytes read in 5 ms (99.6 KiB/s)
> > > > > > > > > > U-Boot menu
> > > > > > > > > > 1:      kernel 5.3.4
> > > > > > > > > > 2:      Debian GNU/Linux kernel 5.3.0-trunk-riscv64
> > > > > > > > > > Enter choice: 1
> > > > > > > > > > 1:      kernel 5.3.4
> > > > > > > > > > Retrieving file: /boot/vmlinux-5.3.4
> > > > > > > > > > 9486076 bytes read in 4813 ms (1.9 MiB/s)
> > > > > > > > > > append: root=/dev/mmcblk0p2 rw console=ttySIF0 rootwait
> > > > > > > > > > Retrieving file: /boot/hifive-unleashed-a00.dtb
> > > > > > > > > > 6088 bytes read in 7 ms (848.6 KiB/s)
> > > > > > > > > > ## Flattened Device Tree blob at 88000000
> > > > > > > > > >    Booting using the fdt blob at 0x88000000
> > > > > > > > > >    Using Device Tree in place at 0000000088000000, end 00000000880047c7
> > > > > > > > > > Starting kernel ...
> > > > > > > > > > 
> > > > > > > > > > exception code: 2 , Illegal instruction , epc 88000004 , ra 88000000
> > > > > > > > > > ### ERROR ### Please RESET the board ###
> > > > > > > > 
> > > > > > > > I think that's the same issue I had (or still have) a week ago.
> > > > > > > > Just a reminder that kernel 5.3 introduced a 64-byte header (thus no
> > > > > > > > need to wrap the kernel), at least for the Image target. Thus it's
> > > > > > > > booti that boots the kernel on the U-Boot side.
> > > > > > > > The first instruction of that header is "j 0x40" (a jump to the
> > > > > > > > beginning of the actual kernel), and 88000004 would definitely hold
> > > > > > > > an illegal instruction.
> > > > > > > > 
> > > > > > > > 0000000000000000 <.data>:
> > > > > > > > 0:       81a0                    j       0x40
> > > > > > > > 2:       0000                    unimp
> > > > > > > > 4:       0000                    unimp
> > > > > > > > 6:       0100                    nop
> > > > > > > > [..]
> > > > > > > 
> > > > > > > Hmm that's the beginning of the kernel code. The address 88000004
> > > > > > > actually corresponds to the FDT. So the hart ending up in a trap
> > > > > > > actually tries to boot the FDT instead of the kernel.
> > > > > > > 
> > > > > > 
> > > > > > Do you see the issue if you manually use bootm instead of extlinux?
> > > > > > 
> > > > > > => bootm $kernel_addr_r - $fdt_addr_r
> > > > > > 
> > > > > > This is probably not related, as bootm is jumping to the wrong
> > > > > > location for some reason. However, it may be worth a shot as it
> > > > > > fixes fdt corruption.
> > > > > 
> > > > > I have just tested, and it doesn't work. On the other hand, I have
> > > > > tried to run that manually, and interrupting the boot process usually
> > > > > hides the problem.
> > > > > 
> > > > 
> > > > I tried to reproduce the issue today, but was not able to. If you can
> > > > upload the relevant files somewhere, I can retry it with them. I have
> > > > also added information on the boot flow in U-Boot below in hopes that
> > > > it is helpful for debugging.
> > > 
> > > You can find the files there:
> > > https://temp.aurel32.net/hifive-opensbi-uboot/
> > > 
> > > fw_payload.bin contains the OpenSBI + U-Boot payload to be copied to the
> > > first partition of the SD card. The boot.tar.gz contains the /boot 
> > > directory (kernel, fdt and extlinux.conf) and has to be put on the
> > > second partition of the SD card. Note that this partition should have
> > > the GPT boot flag enabled for extlinux to work.
> > > 
> > > I haven't looked more at the issue recently now that I have found that
> > > using GCC 8 is a fix/workaround. Therefore those files are from ~10 days
> > > ago. I will try to do more tests during the week-end.
> > > 
> > 
> > Thanks for the files, I was able to reproduce the issue now. Seems like
> > it is caused by a stack overflow. When smp_call_function() is called
> > during bootm, the stack of the main hart overflows into the stack of
> > one of the other harts. The return address of the main hart now lies
> > within the stack of the other hart. Once that hart gets woken by the
> > IPI it overwrites the return address, in our case with 0x88000000. This
> > will cause the illegal instruction trap once the main hart returns.
> > This also explains, why the problem does not occur when the main hart
> > is hart 4, since its stack is at the bottom and therefore can't
> > overflow into one of the other stacks.
> > 
> > Increasing the stack size (CONFIG_STACK_SIZE_SHIFT) to 14 fixes the
> > problem. I'll double check that there's nothing else causing an issue
> > and will then send a patch to increase the stack size.
> 
> Thanks a lot for debugging that. I have just tried, I confirm it fixes
> the issue for me.
> 
> Tested-by: Aurelien Jarno <aurelien@aurel32.net>
> 

Thanks for testing the fix, David and Aurelien! I have submitted a
patch to U-Boot with the fix.

https://patchwork.ozlabs.org/patch/1180057/

Regards,
Lukas
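
The overflow described above can be modelled with a few lines of
arithmetic. What follows is a minimal sketch, not U-Boot code: it assumes
that each hart gets a stack of (1 << CONFIG_STACK_SIZE_SHIFT) bytes carved
downward from a common top address in hart-id order, which matches the
remark that hart 4's stack "is at the bottom". STACK_TOP, the function
names, the shift value 13 and the 10000-byte stack depth are hypothetical
and chosen only for illustration.

```python
# Sketch only: models per-hart stacks carved downward from a common top.
# STACK_TOP and the layout formula are illustrative assumptions, not values
# taken from the U-Boot source.

STACK_TOP = 0x8020_0000  # hypothetical address of the top of the stack area


def stack_range(hart, shift):
    """Return (low, high) bounds of a hart's stack; high is exclusive."""
    size = 1 << shift
    high = STACK_TOP - hart * size
    return high - size, high


def owner_of(addr, shift, num_harts):
    """Return the hart whose stack contains addr, or None if no stack does."""
    for hart in range(num_harts):
        low, high = stack_range(hart, shift)
        if low <= addr < high:
            return hart
    return None


def overflow_victim(hart, depth, shift, num_harts):
    """Return the hart owning the stack pointer after `hart` uses `depth` bytes."""
    _, high = stack_range(hart, shift)
    return owner_of(high - depth, shift, num_harts)
```

In this model, with five harts (ids 0 to 4) and an illustrative shift of
13, overflow_victim(1, 10000, 13, 5) returns 2: hart 1's stack pointer has
moved into hart 2's stack, so hart 2 can overwrite hart 1's saved return
address when it wakes up. With a shift of 14 (16384 bytes per hart) the
same call returns 1, and for hart 4 an overflow lands in no other hart's
stack at all (the function returns None).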

end of thread, other threads:[~2019-10-20 18:57 UTC | newest]

Thread overview: 24+ messages
2019-10-03 20:07 Fail to bring hart online on HiFive Unleashed Aurelien Jarno
2019-10-03 23:13 ` Atish Patra
2019-10-03 23:16   ` Troy Benjegerdes
2019-10-05 10:25   ` Aurelien Jarno
2019-10-05 10:54     ` Aurelien Jarno
2019-10-06 12:28     ` Aurelien Jarno
2019-10-07 22:19       ` Atish Patra
2019-10-08  4:30         ` Aurelien Jarno
2019-10-08  6:14           ` David Abdurachmanov
2019-10-08  6:33             ` Aurelien Jarno
2019-10-08  7:17               ` Anup Patel
2019-10-08 22:21               ` Troy Benjegerdes
2019-10-10 19:59                 ` Aurelien Jarno
2019-10-11 14:05                   ` David Abdurachmanov
2019-10-09  1:34               ` Atish Patra
2019-10-10 19:58                 ` Aurelien Jarno
2019-10-15 21:38                   ` Auer, Lukas
2019-10-15 22:22                     ` Aurelien Jarno
2019-10-16 20:49                       ` Auer, Lukas
2019-10-17 15:45                         ` David Abdurachmanov
2019-10-17 20:42                         ` Aurelien Jarno
2019-10-20 18:57                           ` Auer, Lukas
2019-10-08  7:06           ` Anup Patel
2019-10-14  9:23 ` Andreas Schwab
