* RPi4 U-Boot freeze
@ 2020-08-23 17:06 Stefan Agner
  2020-09-03 11:23 ` Stefan Agner
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Agner @ 2020-08-23 17:06 UTC (permalink / raw)
  To: u-boot

Hi,

I noticed a quite common freeze when running 32-bit U-Boot 2020.01
(rpi_4_32b_defconfig) on a 2GB RPi4 model:

U-Boot 2020.01 (Aug 07 2020 - 13:00:23 +0000)

DRAM:  1.9 GiB
<freeze, no more output>

This happens fairly often, I would say 4 out of 5 boot tries. However,
if it boots, everything seems to run fine.

The issue seems to go away when using 2020.04 or any newer release.
However, when I tried to find the actual patch fixing the issue using git
bisect, I ended up with an MMC merge request which really seems unrelated
(36bdcf7f3b). It seems that the problem is quite elusive and disappears
if certain structures are aligned differently, etc.

Enabling initcall debugging showed that U-Boot crashes right after
relocation:

...
initcall: 00016f2c

RAM Configuration:
Bank #0: 0 948 MiB
Bank #1: 40000000 1 GiB
Bank #2: 0 0 Bytes
Bank #3: 0 0 Bytes

DRAM:  1.9 GiB
initcall: 00016bb8
New Stack Pointer is: 3af6d9e0
initcall: 00016da4
initcall: 00016ef0
initcall: 00016ef8
initcall: 00016d38
Relocation Offset is: 3b375000
Relocating to 3b37d000, new gd at 3af78ec0, sp at 3af6d9e0
initcall: 00016ec8 [clear_bss]
initcall: 0004465c [display_options?? only appears sometimes]
<freeze>
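
For readers unfamiliar with this kind of trace, here is a minimal sketch
of how an init list with call tracing can work. It is an illustration
only, not U-Boot's actual implementation: the names run_init_list,
step_ok and step_fail are hypothetical, and the trace prints a step
index rather than a function address. The point is that each message is
printed before the step is called, so the last "initcall:" line in a log
names the last step that was entered.

```c
#include <stddef.h>
#include <stdio.h>

/* An init step returns 0 on success and non-zero on failure. */
typedef int (*init_fnc_t)(void);

/* Hypothetical example steps, used only for illustration. */
int step_ok(void)   { return 0; }
int step_fail(void) { return 1; }

/*
 * Run a NULL-terminated list of init steps, logging each step before
 * it is called. Returns the number of completed steps, or -1 as soon
 * as one step reports failure.
 */
int run_init_list(const init_fnc_t *seq)
{
	int done = 0;

	for (; *seq != NULL; seq++) {
		/* Printed before the call, so a hang inside the step
		 * still leaves this line as the last trace output. */
		printf("initcall: step %d\n", done);
		fflush(stdout);
		if ((*seq)() != 0)
			return -1;
		done++;
	}
	return done;
}
```

Replacing one entry of such a table with a pointer to a different
function changes which step runs at that position without touching the
rest of the sequence.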

I realized that with CONFIG_SYS_(I|D)CACHE_OFF=y the problem seems to
disappear. But to be 100% certain that it is cache related, I used my
original configuration (which is known to "reliably" freeze) and
manually replaced 00016ec8 with 00008688 in the binary, essentially
swapping out function pointers in "init_sequence_f" [00008688 is
cleanup_before_linux, which flushes and disables I-cache/D-cache]. And
indeed, that hacked-up binary boots reliably every time:

...
initcall: 00016f2c

RAM Configuration:
Bank #0: 0 948 MiB
Bank #1: 40000000 1 GiB
Bank #2: 0 0 Bytes
Bank #3: 0 0 Bytes

DRAM:  1.9 GiB
initcall: 00016bb8
New Stack Pointer is: 3af6d9e0
initcall: 00016da4
initcall: 00016ef0
initcall: 00016ef8
initcall: 00016d38
Relocation Offset is: 3b375000
Relocating to 3b37d000, new gd at 3af78ec0, sp at 3af6d9e0
initcall: 00008688
initcall: 3b38c10c
initcall: 3b38c114
initcall: 000172e0 (relocated to 3b38c2e0)
initcall: 0001712c (relocated to 3b38c12c)
...
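
The addresses in the trace above are related by the printed relocation
offset. The following sketch shows only that arithmetic; the helper
names relocated_addr and link_addr_of are made up for illustration and
are not U-Boot functions.

```c
#include <stdint.h>

/* Runtime address of a link-time address after relocation. */
uint32_t relocated_addr(uint32_t link_addr, uint32_t reloc_off)
{
	return link_addr + reloc_off;
}

/* Inverse mapping: link-time address for a post-relocation address. */
uint32_t link_addr_of(uint32_t runtime_addr, uint32_t reloc_off)
{
	return runtime_addr - reloc_off;
}
```

With the offset 3b375000 from the log, 000172e0 maps to 3b38c2e0 and
0001712c maps to 3b38c12c, matching the "(relocated to ...)"
annotations. Going the other way, the two unannotated entries 3b38c10c
and 3b38c114 correspond to link-time addresses 0001710c and 00017114,
and the relocation target 3b37d000 corresponds to 00008000.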

From what I understand, caches are enabled on the RPi4 when entering
U-Boot. I was wondering whether the relocation code can really handle that?

--
Stefan


* RPi4 U-Boot freeze
  2020-08-23 17:06 RPi4 U-Boot freeze Stefan Agner
@ 2020-09-03 11:23 ` Stefan Agner
  2020-09-04  1:48   ` Bin Meng
  2020-09-07 14:36   ` Peter Robinson
  0 siblings, 2 replies; 10+ messages in thread
From: Stefan Agner @ 2020-09-03 11:23 UTC (permalink / raw)
  To: u-boot

Any thoughts on this issue?

Just to be sure, I did some memory testing on the 2GB module, but found
no issues.

I still somehow suspected that something else might be wrong with my
hardware, so I bought a new RPi4 (this time with 4GB of RAM), but I see
the very same behavior with that one:

U-Boot 2020.01 (Aug 23 2020 - 22:02:31 +0000)

DRAM:  3.9 GiB
<freeze>

I still think there is something wrong with caching. From what I
understand, caches are enabled by the RPi (4) firmware. Is it safe to run
32-bit ARM U-Boot with caches enabled?

--
Stefan



* RPi4 U-Boot freeze
  2020-09-03 11:23 ` Stefan Agner
@ 2020-09-04  1:48   ` Bin Meng
  2020-09-07 14:36   ` Peter Robinson
  1 sibling, 0 replies; 10+ messages in thread
From: Bin Meng @ 2020-09-04  1:48 UTC (permalink / raw)
  To: u-boot

On Thu, Sep 3, 2020 at 7:23 PM Stefan Agner <stefan@agner.ch> wrote:
>
> Any thoughts on this issue?
>
> Just to be sure, I did some memory testing on the 2GB module, but no
> issues found.
>
> I still somehow suspected that something else might be wrong with my
> hardware, so I bought a new RPi4 (this time with 4GB of RAM) but the
> very same with that:
>
> U-Boot 2020.01 (Aug 23 2020 - 22:02:31 +0000)
>
> DRAM:  3.9 GiB
> <freeze>
>
> I still think there is something wrong with caching. From what I
> understand caches are enabled by the RPi (4) firmware. Is it safe to run
> 32-bit ARM U-Boot with enabled caches?
>
> --
> Stefan
>
> On 2020-08-23 19:06, Stefan Agner wrote:
> > Hi,
> >
> > I noticed a quite common freeze when running 32-bit U-Boot 2020.01
> > (rpi_4_32b_defconfig) on a 2GB RPi4 model:
> >
> > U-Boot 2020.01 (Aug 07 2020 - 13:00:23 +0000)
> >
> > DRAM:  1.9 GiB
> > <freeze, no more output>
> >
> > This happens fairly often, I would say 4 out of 5 boot tries. However,
> > if it boots, everything seems to run fine.
> >
> > The issue seems to go away when using 2020.04 or any newer release,
> > however, when trying to find the actual patch fixing the issue using git
> > bisect I ended up with a MMC merge request which really seems unrelated
> > (36bdcf7f3b). It seems that the problem is quite evasive and disappears
> > if certain structure are aligned differently etc.
> >
> > Enabling initcall debugging showed that U-Boot crashes right after
> > relocation:

I suspect there are some timing issues in the DDR training code.

Regards,
Bin


* RPi4 U-Boot freeze
  2020-09-03 11:23 ` Stefan Agner
  2020-09-04  1:48   ` Bin Meng
@ 2020-09-07 14:36   ` Peter Robinson
  2020-09-10 21:12     ` Stefan Agner
  1 sibling, 1 reply; 10+ messages in thread
From: Peter Robinson @ 2020-09-07 14:36 UTC (permalink / raw)
  To: u-boot

> Any thoughts on this issue?

Any reason why you're using 2020.01 and not at least 2020.07, or at
least checking whether it's reproducible on 2020.10-rc3? I suspect the
RPi4 support has changed quite a bit since then.



* RPi4 U-Boot freeze
  2020-09-07 14:36   ` Peter Robinson
@ 2020-09-10 21:12     ` Stefan Agner
  2020-09-14  8:15       ` Matthias Brugger
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Agner @ 2020-09-10 21:12 UTC (permalink / raw)
  To: u-boot

On 2020-09-07 16:36, Peter Robinson wrote:
>> Any thoughts on this issue?
> 
> Any reason why you're using 2020.01 and not at least 2020.07, or at
> least seeing if it's reproducible on 2020.10-rc3? The RPi4 support has
> changed quite a bit since then I suspect.
> 

Hi Peter,

It's a stable release and we support a couple of devices with the same
U-Boot version. I'd prefer to stay with 2020.01 for RPi4 as well.

We are on 2020.07 on the development branch, and it works fine there. So
I thought it can't be that hard, just bisect and backport whatever fixes
it... Unfortunately, it seems that there is no particular commit which
fixes it (the bisect ended up at a random unrelated commit, and it seems
that the issue appears/disappears depending on alignment/size...).

I also applied pretty much every RPi4-related commit made after 2020.01,
up to master, on top of 2020.01, with no success either.

I fear that the problem is in fact still in master, but only appears if
certain things align a certain way... That is why I thought I'd bring it
up, to see if anybody else has noticed something along these lines. We do
have a rather trimmed-down configuration, which might make the problem
appear more often (fits in the D-cache...).

I will probably just disable caches around relocation for 2020.01 and
see if it resurfaces on the development branch.

--
Stefan




* RPi4 U-Boot freeze
  2020-09-10 21:12     ` Stefan Agner
@ 2020-09-14  8:15       ` Matthias Brugger
  2020-09-17 20:56         ` Stefan Agner
  2020-09-19 11:55         ` Stefan Agner
  0 siblings, 2 replies; 10+ messages in thread
From: Matthias Brugger @ 2020-09-14  8:15 UTC (permalink / raw)
  To: u-boot



On 10/09/2020 23:12, Stefan Agner wrote:
> On 2020-09-07 16:36, Peter Robinson wrote:
>>> Any thoughts on this issue?
>>
>> Any reason why you're using 2020.01 and not at least 2020.07, or at
>> least seeing if it's reproducible on 2020.10-rc3? The RPi4 support has
>> changed quite a bit since then I suspect.
>>
> 
> Hi Peter,
> 
> It's a stable release and we support a couple of devices with the same
> U-Boot version. I'd rather prefer to stay with 2020.01 for RPi4 as well.
> 
> We are on 2020.07 on development branch, and it does work fine there. So
> I thought it can't be that hard, just bisect and backport whatever fixes
> it... Unfortunately, it seems that there is no particular commit which
> fixes it (the bisect ended up in a random unrelated commit, and it seems
> that the issue appears/disappears depending on alignment/size...).
> 
> I also did applied pretty much every RPi4 related commit made after
> 2020.01 up until master back to 2020.01, no success either.
> 

Which version of the Raspberry Pi firmware did you use?
Unfortunately, changes in the FW break stuff in U-Boot from time to time.

Regards,
Matthias



* RPi4 U-Boot freeze
  2020-09-14  8:15       ` Matthias Brugger
@ 2020-09-17 20:56         ` Stefan Agner
  2020-09-19 11:55         ` Stefan Agner
  1 sibling, 0 replies; 10+ messages in thread
From: Stefan Agner @ 2020-09-17 20:56 UTC (permalink / raw)
  To: u-boot

On 2020-09-14 10:15, Matthias Brugger wrote:
> On 10/09/2020 23:12, Stefan Agner wrote:
>> On 2020-09-07 16:36, Peter Robinson wrote:
>>>> Any thoughts on this issue?
>>>
>>> Any reason why you're using 2020.01 and not at least 2020.07, or at
>>> least seeing if it's reproducible on 2020.10-rc3? The RPi4 support has
>>> changed quite a bit since then I suspect.
>>>
>>
>> Hi Peter,
>>
>> It's a stable release and we support a couple of devices with the same
>> U-Boot version. I'd rather prefer to stay with 2020.01 for RPi4 as well.
>>
>> We are on 2020.07 on development branch, and it does work fine there. So
>> I thought it can't be that hard, just bisect and backport whatever fixes
>> it... Unfortunately, it seems that there is no particular commit which
>> fixes it (the bisect ended up in a random unrelated commit, and it seems
>> that the issue appears/disappears depending on alignment/size...).
>>
>> I also did applied pretty much every RPi4 related commit made after
>> 2020.01 up until master back to 2020.01, no success either.
>>
> 
> Which version of the Raspberry Pi firmware did you use?
> Unfortunately changes in the FW breaks stuff on U-Boot from time to time.

I have left the 4GB board untouched so far; it came with the following
setup:

pi@raspberrypi:~$ sudo rpi-eeprom-update
BCM2711 detected
Dedicated VL805 EEPROM detected
*** UPDATE AVAILABLE ***
BOOTLOADER: update available
CURRENT: Mon 15 Jul 12:59:55 UTC 2019 (1563195595)
 LATEST: Thu 16 Apr 17:11:26 UTC 2020 (1587057086)
 FW DIR: /lib/firmware/raspberrypi/bootloader/critical
VL805: update available
CURRENT: 00013701
 LATEST: 000137ad

On the 2GB board I have already done some firmware updates; I currently
run it with the following settings:

pi@raspberrypi:~$ sudo rpi-eeprom-update
BCM2711 detected
Dedicated VL805 EEPROM detected
BOOTLOADER: up-to-date
CURRENT: Thu 16 Apr 17:11:26 UTC 2020 (1587057086)
 LATEST: Thu 16 Apr 17:11:26 UTC 2020 (1587057086)
 FW DIR: /lib/firmware/raspberrypi/bootloader/critical
VL805: up-to-date
CURRENT: 000138a1
 LATEST: 000137ad

I was able to reproduce the issue with U-Boot 2020.07, but I still have
two non-upstream patches on top (I really can't see how they can affect
relocation, but they seem to cause a state which makes the issue
appear). I will try to find a configuration which shows it without
non-upstream code.

--
Stefan




* RPi4 U-Boot freeze
  2020-09-14  8:15       ` Matthias Brugger
  2020-09-17 20:56         ` Stefan Agner
@ 2020-09-19 11:55         ` Stefan Agner
  2020-09-19 21:20           ` Sean Anderson
  1 sibling, 1 reply; 10+ messages in thread
From: Stefan Agner @ 2020-09-19 11:55 UTC (permalink / raw)
  To: u-boot

On 2020-09-14 10:15, Matthias Brugger wrote:
> On 10/09/2020 23:12, Stefan Agner wrote:
>> On 2020-09-07 16:36, Peter Robinson wrote:
>>>> Any thoughts on this issue?
>>>
>>> Any reason why you're using 2020.01 and not at least 2020.07, or at
>>> least seeing if it's reproducible on 2020.10-rc3? The RPi4 support has
>>> changed quite a bit since then I suspect.
>>>
>>
>> Hi Peter,
>>
>> It's a stable release and we support a couple of devices with the same
>> U-Boot version. I'd rather prefer to stay with 2020.01 for RPi4 as well.
>>
>> We are on 2020.07 on development branch, and it does work fine there. So
>> I thought it can't be that hard, just bisect and backport whatever fixes
>> it... Unfortunately, it seems that there is no particular commit which
>> fixes it (the bisect ended up in a random unrelated commit, and it seems
>> that the issue appears/disappears depending on alignment/size...).
>>
>> I also did applied pretty much every RPi4 related commit made after
>> 2020.01 up until master back to 2020.01, no success either.
>>
> 
> Which version of the Raspberry Pi firmware did you use?
> Unfortunately changes in the FW breaks stuff on U-Boot from time to time.
> 

Ok, I am now able to reproduce the issue on master as well as 2020.07
with the standard rpi_4_32b_defconfig, but I still need to keep in parts
of a change which seems to trigger the issue. From what I can tell, the
change *really* should not lead to a freeze. The change is just
accessing global variables from the data section... (see below).

To me it still seems as if relocation somehow did not work correctly in
one way or another.

Are there maybe restrictions in U-Boot on when the data section can be
accessed? E.g. is it not legal to access the data section from the
serial driver?


diff --git a/drivers/serial/serial_bcm283x_mu.c b/drivers/serial/serial_bcm283x_mu.c
index 8a4af87eb6..74de6801ab 100644
--- a/drivers/serial/serial_bcm283x_mu.c
+++ b/drivers/serial/serial_bcm283x_mu.c
@@ -50,7 +50,8 @@ struct bcm283x_mu_regs {
 struct bcm283x_mu_priv {
        struct bcm283x_mu_regs *regs;
 };
-
+static char *fs_argv[15];
+static uint32_t putc_retry = 0;
 static int bcm283x_mu_serial_getc(struct udevice *dev);
 
 static int bcm283x_mu_serial_setbrg(struct udevice *dev, int baudrate)
@@ -95,6 +96,8 @@ static int bcm283x_mu_serial_putc(struct udevice *dev, const char data)
        struct bcm283x_mu_priv *priv = dev_get_priv(dev);
        struct bcm283x_mu_regs *regs = priv->regs;
 
+       putc_retry++;
+
        /* Wait until there is space in the FIFO */
        if (!(readl(&regs->lsr) & BCM283X_MU_LSR_TX_EMPTY))
                return -EAGAIN;
@@ -162,6 +165,10 @@ static int bcm283x_mu_serial_probe(struct udevice *dev)
        struct bcm283x_mu_priv *priv = dev_get_priv(dev);
        fdt_addr_t addr;
 
+       /* Make sure compiler does not optimize out this fs_argv instance */
+       if (fs_argv[0])
+               fs_argv[0] = "test";
+
        /* Don't spawn the device if it's not muxed */
        if (!bcm283x_is_serial_muxed())
                return -ENODEV;

Most curious of all, it seems that the name (!!!) of the variable
fs_argv matters! I am not sure if that changes the order of variables in
the data section or something. I can also reproduce the issue with two
compilers (GCC 8.3 and GCC 9.2), so a compiler error seems somewhat
unlikely...

Any ideas? I am a bit out of ideas on how to debug this (I guess JTAG/gdb
might help, but I don't have such a setup).

FWIW, I plan to just drop the change which seems to at least partially
cause the issue
(https://github.com/home-assistant/operating-system/blob/dev/buildroot-external/board/raspberrypi/patches/uboot/0002-avoid-block-uart-write.patch).
Still I think there is something wrong which will show itself someday in
a certain configuration.

--
Stefan


> Regards,
> Mathias
> 
>> I fear that the problem in fact is still in master, but only appears if
>> certain things align a certain way... That is why I thought I bring it
>> up, to see if anybody else has noticed something along this lines. We do
>> have a rather trimmed down configuration, which might make the problem
>> appear more (fit in a D cache...).
>>
>> I probably will just disable cached around relocation for 2020.01 and
>> see if it resurfaces on development branch.
>>
>> --
>> Stefan
>>
>>
>>>> Just to be sure, I did some memory testing on the 2GB module, but no
>>>> issues found.
>>>>
>>>> I still somehow suspected that something else might be wrong with my
>>>> hardware, so I bought a new RPi4 (this time with 4GB of RAM) but the
>>>> very same with that:
>>>>
>>>> U-Boot 2020.01 (Aug 23 2020 - 22:02:31 +0000)
>>>>
>>>> DRAM:  3.9 GiB
>>>> <freeze>
>>>>
>>>> I still think there is something wrong with caching. From what I
>>>> understand caches are enabled by the RPi (4) firmware. Is it safe to run
>>>> 32-bit ARM U-Boot with enabled caches?
>>>>
>>>> --
>>>> Stefan
>>>>
>>>> On 2020-08-23 19:06, Stefan Agner wrote:
>>>>> Hi,
>>>>>
>>>>> I noticed a quite common freeze when running 32-bit U-Boot 2020.01
>>>>> (rpi_4_32b_defconfig) on a 2GB RPi4 model:
>>>>>
>>>>> U-Boot 2020.01 (Aug 07 2020 - 13:00:23 +0000)
>>>>>
>>>>> DRAM:  1.9 GiB
>>>>> <freeze, no more output>
>>>>>
>>>>> This happens fairly often, I would say 4 out of 5 boot tries. However,
>>>>> if it boots, everything seems to run fine.
>>>>>
>>>>> The issue seems to go away when using 2020.04 or any newer release,
>>>>> however, when trying to find the actual patch fixing the issue using git
>>>>> bisect I ended up with a MMC merge request which really seems unrelated
>>>>> (36bdcf7f3b). It seems that the problem is quite evasive and disappears
>>>>> if certain structure are aligned differently etc.
>>>>>
>>>>> Enabling initcall debugging showed that U-Boot crashes right after
>>>>> relocation:
>>>>>
>>>>> ...
>>>>> initcall: 00016f2c
>>>>>
>>>>> RAM Configuration:
>>>>> Bank #0: 0 948 MiB
>>>>> Bank #1: 40000000 1 GiB
>>>>> Bank #2: 0 0 Bytes
>>>>> Bank #3: 0 0 Bytes
>>>>>
>>>>> DRAM:  1.9 GiB
>>>>> initcall: 00016bb8
>>>>> New Stack Pointer is: 3af6d9e0
>>>>> initcall: 00016da4
>>>>> initcall: 00016ef0
>>>>> initcall: 00016ef8
>>>>> initcall: 00016d38
>>>>> Relocation Offset is: 3b375000
>>>>> Relocating to 3b37d000, new gd at 3af78ec0, sp at 3af6d9e0
>>>>> initcall: 00016ec8 [clear_bss]
>>>>> initcall: 0004465c [display_options?? only appears sometimes]
>>>>> <freeze>
>>>>>
>>>>> I realized when using CONFIG_SYS_(I|D)CACHE_OFF=y the problem seems to
>>>>> disappear. But to be 100% certain that it is cache related, I used my
>>>>> original configuration (which is known to "reliably" freeze), and
>>>>> replaced 00016ec8 with 00008688 manually in the binary, essentially
>>>>> swapping out function pointers in "init_sequence_f" [00008688 is
>>>>> cleanup_before_linux, which flushes and disables I-cache/D-cache]. And
>>>>> indeed, that hacked up binary does boot reliably every time:
>>>>>
>>>>> ...
>>>>> initcall: 00016f2c
>>>>>
>>>>> RAM Configuration:
>>>>> Bank #0: 0 948 MiB
>>>>> Bank #1: 40000000 1 GiB
>>>>> Bank #2: 0 0 Bytes
>>>>> Bank #3: 0 0 Bytes
>>>>>
>>>>> DRAM:  1.9 GiB
>>>>> initcall: 00016bb8
>>>>> New Stack Pointer is: 3af6d9e0
>>>>> initcall: 00016da4
>>>>> initcall: 00016ef0
>>>>> initcall: 00016ef8
>>>>> initcall: 00016d38
>>>>> Relocation Offset is: 3b375000
>>>>> Relocating to 3b37d000, new gd at 3af78ec0, sp at 3af6d9e0
>>>>> initcall: 00008688
>>>>> initcall: 3b38c10c
>>>>> initcall: 3b38c114
>>>>> initcall: 000172e0 (relocated to 3b38c2e0)
>>>>> initcall: 0001712c (relocated to 3b38c12c)
>>>>> ...
>>>>>
>>>>>  From what I understand on RPi4 caches are enabled when entering U-Boot.
>>>>> I was wondering if the relocation code really can handle that?
>>>>>
>>>>> --
>>>>> Stefan
>>


* RPi4 U-Boot freeze
  2020-09-19 11:55         ` Stefan Agner
@ 2020-09-19 21:20           ` Sean Anderson
  2020-09-20  9:09             ` Stefan Agner
  0 siblings, 1 reply; 10+ messages in thread
From: Sean Anderson @ 2020-09-19 21:20 UTC (permalink / raw)
  To: u-boot

On 9/19/20 7:55 AM, Stefan Agner wrote:
> On 2020-09-14 10:15, Matthias Brugger wrote:
>> On 10/09/2020 23:12, Stefan Agner wrote:
>>> On 2020-09-07 16:36, Peter Robinson wrote:
>>>>> Any thoughts on this issue?
>>>>
>>>> Any reason why you're using 2020.01 and not at least 2020.07, or at
>>>> least seeing if it's reproducible on 2020.10-rc3? The RPi4 support has
>>>> changed quite a bit since then I suspect.
>>>>
>>>
>>> Hi Peter,
>>>
>>> It's a stable release and we support a couple of devices with the same
>>> U-Boot version. I'd rather prefer to stay with 2020.01 for RPi4 as well.
>>>
>>> We are on 2020.07 on development branch, and it does work fine there. So
>>> I thought it can't be that hard, just bisect and backport whatever fixes
>>> it... Unfortunately, it seems that there is no particular commit which
>>> fixes it (the bisect ended up in a random unrelated commit, and it seems
>>> that the issue appears/disappears depending on alignment/size...).
>>>
>>> I also did applied pretty much every RPi4 related commit made after
>>> 2020.01 up until master back to 2020.01, no success either.
>>>
>>
>> Which version of the Raspberry Pi firmware did you use?
>> Unfortunately changes in the FW breaks stuff on U-Boot from time to time.
>>
> 
> Ok, I am now able to reproduce the issue on master as well as 2020.07
> with standard rpi_4_32b_defconfig, but I still need to have parts of a
> change which seems to trigger the issue in. From what I can tell, the
> change *really* should not lead to a freeze. The change is just
> accessing global variables from the data section... (see below).
> 
> To me it still seems as if relocation somehow did not work correctly in
> one way or another.
> 
> Are there maybe restrictions in U-Boot when the data section can be
> accessed? E.g. is it not legal to access the data section from the
> serial driver?

One thing that bit me recently might be relevant here. Because
putc_retry is initialized to zero, it is located in bss and not data.
In U-Boot, bss is not accessible before relocation. The serial driver is
one of the devices which U-Boot needs before relocating, so setting
putc_retry may overwrite data in the device tree. To get around this,
you could try adding __attribute__((section(".data"))) to that variable.
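
A minimal sketch of that suggestion, assuming a GCC-compatible toolchain
and an ELF target. The helper name count_putc_attempt() is hypothetical
and only stands in for the counter bump in the putc path; it is not part
of the driver:

```c
#include <stdint.h>

/*
 * A zero-initialized static would normally be placed in .bss by GCC.
 * The section attribute forces it into .data instead, so the variable
 * has storage in the loaded image and can be written before relocation.
 */
static uint32_t putc_retry __attribute__((section(".data"))) = 0;

/* Hypothetical stand-in for the increment done in the putc path. */
uint32_t count_putc_attempt(void)
{
	return ++putc_retry;
}
```

Where the variable actually ends up can be checked in the linker map or
the symbol table of the built image rather than assumed.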

--Sean

> 
> 
> diff --git a/drivers/serial/serial_bcm283x_mu.c
> b/drivers/serial/serial_bcm283x_mu.c
> index 8a4af87eb6..74de6801ab 100644
> --- a/drivers/serial/serial_bcm283x_mu.c
> +++ b/drivers/serial/serial_bcm283x_mu.c
> @@ -50,7 +50,8 @@ struct bcm283x_mu_regs {
>  struct bcm283x_mu_priv {
>         struct bcm283x_mu_regs *regs;
>  };
> -
> +static char *fs_argv[15];
> +static uint32_t putc_retry = 0;
>  static int bcm283x_mu_serial_getc(struct udevice *dev);
>  
>  static int bcm283x_mu_serial_setbrg(struct udevice *dev, int baudrate)
> @@ -95,6 +96,8 @@ static int bcm283x_mu_serial_putc(struct udevice *dev,
> const char data)
>         struct bcm283x_mu_priv *priv = dev_get_priv(dev);
>         struct bcm283x_mu_regs *regs = priv->regs;
>  
> +       putc_retry++;
> +
>         /* Wait until there is space in the FIFO */
>         if (!(readl(&regs->lsr) & BCM283X_MU_LSR_TX_EMPTY))
>                 return -EAGAIN;
> @@ -162,6 +165,10 @@ static int bcm283x_mu_serial_probe(struct udevice
> *dev)
>         struct bcm283x_mu_priv *priv = dev_get_priv(dev);
>         fdt_addr_t addr;
>  
> +       /* Make sure compiler does not optimize out this fs_argv
> instance */
> +       if (fs_argv[0])
> +               fs_argv[0] = "test";
> +
>         /* Don't spawn the device if it's not muxed */
>         if (!bcm283x_is_serial_muxed())
>                 return -ENODEV;
> 
> Most curious of all, it seems that the name (!!!) of the variable
> fs_argv matters! I am not sure if that changes order of variables in
> data section or something. I can also reproduce the issue with two
> compilers (GCC 8.3 and GCC 9.2), so a compiler error seems somewhat
> unlikely...
> 
> Any ideas? I am a bit out of ideas on how to debug this (I guess JTAG/gdb
> might help, but I don't have such a setup).
> 
> FWIW, I plan to just drop the change which seems to at least partially
> cause the issue
> (https://github.com/home-assistant/operating-system/blob/dev/buildroot-external/board/raspberrypi/patches/uboot/0002-avoid-block-uart-write.patch).
> Still I think there is something wrong which will show itself someday in
> a certain configuration.
> 
> --
> Stefan
> 
> 
>> Regards,
>> Mathias
>>
>>> I fear that the problem in fact is still in master, but only appears if
>>> certain things align a certain way... That is why I thought I bring it
>>> up, to see if anybody else has noticed something along this lines. We do
>>> have a rather trimmed down configuration, which might make the problem
>>> appear more (fit in a D cache...).
>>>
>>> I probably will just disable cached around relocation for 2020.01 and
>>> see if it resurfaces on development branch.
>>>
>>> --
>>> Stefan
>>>
>>>
>>>>> Just to be sure, I did some memory testing on the 2GB module, but no
>>>>> issues found.
>>>>>
>>>>> I still somehow suspected that something else might be wrong with my
>>>>> hardware, so I bought a new RPi4 (this time with 4GB of RAM) but the
>>>>> very same with that:
>>>>>
>>>>> U-Boot 2020.01 (Aug 23 2020 - 22:02:31 +0000)
>>>>>
>>>>> DRAM:  3.9 GiB
>>>>> <freeze>
>>>>>
>>>>> I still think there is something wrong with caching. From what I
>>>>> understand caches are enabled by the RPi (4) firmware. Is it safe to run
>>>>> 32-bit ARM U-Boot with enabled caches?
>>>>>
>>>>> --
>>>>> Stefan
>>>>>
>>>>> On 2020-08-23 19:06, Stefan Agner wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I noticed a quite common freeze when running 32-bit U-Boot 2020.01
>>>>>> (rpi_4_32b_defconfig) on a 2GB RPi4 model:
>>>>>>
>>>>>> U-Boot 2020.01 (Aug 07 2020 - 13:00:23 +0000)
>>>>>>
>>>>>> DRAM:  1.9 GiB
>>>>>> <freeze, no more output>
>>>>>>
>>>>>> This happens fairly often, I would say 4 out of 5 boot tries. However,
>>>>>> if it boots, everything seems to run fine.
>>>>>>
>>>>>> The issue seems to go away when using 2020.04 or any newer release,
>>>>>> however, when trying to find the actual patch fixing the issue using git
>>>>>> bisect I ended up with a MMC merge request which really seems unrelated
>>>>>> (36bdcf7f3b). It seems that the problem is quite evasive and disappears
>>>>>> if certain structure are aligned differently etc.
>>>>>>
>>>>>> Enabling initcall debugging showed that U-Boot crashes right after
>>>>>> relocation:
>>>>>>
>>>>>> ...
>>>>>> initcall: 00016f2c
>>>>>>
>>>>>> RAM Configuration:
>>>>>> Bank #0: 0 948 MiB
>>>>>> Bank #1: 40000000 1 GiB
>>>>>> Bank #2: 0 0 Bytes
>>>>>> Bank #3: 0 0 Bytes
>>>>>>
>>>>>> DRAM:  1.9 GiB
>>>>>> initcall: 00016bb8
>>>>>> New Stack Pointer is: 3af6d9e0
>>>>>> initcall: 00016da4
>>>>>> initcall: 00016ef0
>>>>>> initcall: 00016ef8
>>>>>> initcall: 00016d38
>>>>>> Relocation Offset is: 3b375000
>>>>>> Relocating to 3b37d000, new gd at 3af78ec0, sp at 3af6d9e0
>>>>>> initcall: 00016ec8 [clear_bss]
>>>>>> initcall: 0004465c [display_options?? only appears sometimes]
>>>>>> <freeze>
>>>>>>
>>>>>> I realized when using CONFIG_SYS_(I|D)CACHE_OFF=y the problem seems to
>>>>>> disappear. But to be 100% certain that it is cache related, I used my
>>>>>> original configuration (which is known to "reliably" freeze), and
>>>>>> replaced 00016ec8 with 00008688 manually in the binary, essentially
>>>>>> swapping out function pointers in "init_sequence_f" [00008688 is
>>>>>> cleanup_before_linux, which flushes and disables I-cache/D-cache]. And
>>>>>> indeed, that hacked up binary does boot reliably every time:
>>>>>>
>>>>>> ...
>>>>>> initcall: 00016f2c
>>>>>>
>>>>>> RAM Configuration:
>>>>>> Bank #0: 0 948 MiB
>>>>>> Bank #1: 40000000 1 GiB
>>>>>> Bank #2: 0 0 Bytes
>>>>>> Bank #3: 0 0 Bytes
>>>>>>
>>>>>> DRAM:  1.9 GiB
>>>>>> initcall: 00016bb8
>>>>>> New Stack Pointer is: 3af6d9e0
>>>>>> initcall: 00016da4
>>>>>> initcall: 00016ef0
>>>>>> initcall: 00016ef8
>>>>>> initcall: 00016d38
>>>>>> Relocation Offset is: 3b375000
>>>>>> Relocating to 3b37d000, new gd at 3af78ec0, sp at 3af6d9e0
>>>>>> initcall: 00008688
>>>>>> initcall: 3b38c10c
>>>>>> initcall: 3b38c114
>>>>>> initcall: 000172e0 (relocated to 3b38c2e0)
>>>>>> initcall: 0001712c (relocated to 3b38c12c)
>>>>>> ...
>>>>>>
>>>>>>  From what I understand on RPi4 caches are enabled when entering U-Boot.
>>>>>> I was wondering if the relocation code really can handle that?
>>>>>>
>>>>>> --
>>>>>> Stefan
>>>


* RPi4 U-Boot freeze
  2020-09-19 21:20           ` Sean Anderson
@ 2020-09-20  9:09             ` Stefan Agner
  0 siblings, 0 replies; 10+ messages in thread
From: Stefan Agner @ 2020-09-20  9:09 UTC (permalink / raw)
  To: u-boot

On 2020-09-19 23:20, Sean Anderson wrote:
> On 9/19/20 7:55 AM, Stefan Agner wrote:
>> On 2020-09-14 10:15, Matthias Brugger wrote:
>>> On 10/09/2020 23:12, Stefan Agner wrote:
>>>> On 2020-09-07 16:36, Peter Robinson wrote:
>>>>>> Any thoughts on this issue?
>>>>>
>>>>> Any reason why you're using 2020.01 and not at least 2020.07, or at
>>>>> least seeing if it's reproducible on 2020.10-rc3? The RPi4 support has
>>>>> changed quite a bit since then I suspect.
>>>>>
>>>>
>>>> Hi Peter,
>>>>
>>>> It's a stable release and we support a couple of devices with the same
>>>> U-Boot version. I'd rather prefer to stay with 2020.01 for RPi4 as well.
>>>>
>>>> We are on 2020.07 on development branch, and it does work fine there. So
>>>> I thought it can't be that hard, just bisect and backport whatever fixes
>>>> it... Unfortunately, it seems that there is no particular commit which
>>>> fixes it (the bisect ended up in a random unrelated commit, and it seems
>>>> that the issue appears/disappears depending on alignment/size...).
>>>>
>>>> I also did applied pretty much every RPi4 related commit made after
>>>> 2020.01 up until master back to 2020.01, no success either.
>>>>
>>>
>>> Which version of the Raspberry Pi firmware did you use?
>>> Unfortunately changes in the FW breaks stuff on U-Boot from time to time.
>>>
>>
>> Ok, I am now able to reproduce the issue on master as well as 2020.07
>> with standard rpi_4_32b_defconfig, but I still need to have parts of a
>> change which seems to trigger the issue in. From what I can tell, the
>> change *really* should not lead to a freeze. The change is just
>> accessing global variables from the data section... (see below).
>>
>> To me it still seems as if relocation somehow did not work correctly in
>> one way or another.
>>
>> Are there maybe restrictions in U-Boot when the data section can be
>> accessed? E.g. is it not legal to access the data section from the
>> serial driver?
> 
> One thing bit me recently, and might be relevant here. Because
> putc_retry is initialized to zero, it is located in bss and not data.
> In U-Boot, bss is not accessible before relocation. The serial driver is
> one of the devices which U-Boot needs before relocating, so setting
> putc_retry may overwrite data in the device tree. To get around this,
> you could try adding __attribute__((section(".data"))) to that variable.

Thanks a lot for that hint, I did not know that bss is not accessible
before relocation.

That indeed seems to be the problem! Actually just removing the
initialization is fine here, as global variables are initialized to zero
anyway (apparently also in U-Boot's implementation).

I wonder if some compile time warning/check could be introduced, but I
guess there is not really a way to know what code is used before
relocation.

Anyways, thanks a lot, sorry for the noise. We can consider this solved.

--
Stefan



> 
> --Sean
> 
>>
>>
>> diff --git a/drivers/serial/serial_bcm283x_mu.c
>> b/drivers/serial/serial_bcm283x_mu.c
>> index 8a4af87eb6..74de6801ab 100644
>> --- a/drivers/serial/serial_bcm283x_mu.c
>> +++ b/drivers/serial/serial_bcm283x_mu.c
>> @@ -50,7 +50,8 @@ struct bcm283x_mu_regs {
>>  struct bcm283x_mu_priv {
>>         struct bcm283x_mu_regs *regs;
>>  };
>> -
>> +static char *fs_argv[15];
>> +static uint32_t putc_retry = 0;
>>  static int bcm283x_mu_serial_getc(struct udevice *dev);
>>
>>  static int bcm283x_mu_serial_setbrg(struct udevice *dev, int baudrate)
>> @@ -95,6 +96,8 @@ static int bcm283x_mu_serial_putc(struct udevice *dev,
>> const char data)
>>         struct bcm283x_mu_priv *priv = dev_get_priv(dev);
>>         struct bcm283x_mu_regs *regs = priv->regs;
>>
>> +       putc_retry++;
>> +
>>         /* Wait until there is space in the FIFO */
>>         if (!(readl(&regs->lsr) & BCM283X_MU_LSR_TX_EMPTY))
>>                 return -EAGAIN;
>> @@ -162,6 +165,10 @@ static int bcm283x_mu_serial_probe(struct udevice
>> *dev)
>>         struct bcm283x_mu_priv *priv = dev_get_priv(dev);
>>         fdt_addr_t addr;
>>
>> +       /* Make sure compiler does not optimize out this fs_argv
>> instance */
>> +       if (fs_argv[0])
>> +               fs_argv[0] = "test";
>> +
>>         /* Don't spawn the device if it's not muxed */
>>         if (!bcm283x_is_serial_muxed())
>>                 return -ENODEV;
>>
>> Most curious of all, it seems that the name (!!!) of the variable
>> fs_argv matters! I am not sure if that changes order of variables in
>> data section or something. I can also reproduce the issue with two
>> compilers (GCC 8.3 and GCC 9.2), so a compiler error seems somewhat
>> unlikely...
>>
>> Any ideas? I am a bit out of ideas on how to debug this (I guess JTAG/gdb
>> might help, but I don't have such a setup).
>>
>> FWIW, I plan to just drop the change which seems to at least partially
>> cause the issue
>> (https://github.com/home-assistant/operating-system/blob/dev/buildroot-external/board/raspberrypi/patches/uboot/0002-avoid-block-uart-write.patch).
>> Still I think there is something wrong which will show itself someday in
>> a certain configuration.
>>
>> --
>> Stefan
>>
>>
>>> Regards,
>>> Mathias
>>>
>>>> I fear that the problem in fact is still in master, but only appears if
>>>> certain things align a certain way... That is why I thought I bring it
>>>> up, to see if anybody else has noticed something along this lines. We do
>>>> have a rather trimmed down configuration, which might make the problem
>>>> appear more (fit in a D cache...).
>>>>
>>>> I probably will just disable cached around relocation for 2020.01 and
>>>> see if it resurfaces on development branch.
>>>>
>>>> --
>>>> Stefan
>>>>
>>>>
>>>>>> Just to be sure, I did some memory testing on the 2GB module, but no
>>>>>> issues found.
>>>>>>
>>>>>> I still somehow suspected that something else might be wrong with my
>>>>>> hardware, so I bought a new RPi4 (this time with 4GB of RAM) but the
>>>>>> very same with that:
>>>>>>
>>>>>> U-Boot 2020.01 (Aug 23 2020 - 22:02:31 +0000)
>>>>>>
>>>>>> DRAM:  3.9 GiB
>>>>>> <freeze>
>>>>>>
>>>>>> I still think there is something wrong with caching. From what I
>>>>>> understand caches are enabled by the RPi (4) firmware. Is it safe to run
>>>>>> 32-bit ARM U-Boot with enabled caches?
>>>>>>
>>>>>> --
>>>>>> Stefan
>>>>>>
>>>>>> On 2020-08-23 19:06, Stefan Agner wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I noticed a quite common freeze when running 32-bit U-Boot 2020.01
>>>>>>> (rpi_4_32b_defconfig) on a 2GB RPi4 model:
>>>>>>>
>>>>>>> U-Boot 2020.01 (Aug 07 2020 - 13:00:23 +0000)
>>>>>>>
>>>>>>> DRAM:  1.9 GiB
>>>>>>> <freeze, no more output>
>>>>>>>
>>>>>>> This happens fairly often, I would say 4 out of 5 boot tries. However,
>>>>>>> if it boots, everything seems to run fine.
>>>>>>>
>>>>>>> The issue seems to go away when using 2020.04 or any newer release,
>>>>>>> however, when trying to find the actual patch fixing the issue using git
>>>>>>> bisect I ended up with a MMC merge request which really seems unrelated
>>>>>>> (36bdcf7f3b). It seems that the problem is quite evasive and disappears
>>>>>>> if certain structure are aligned differently etc.
>>>>>>>
>>>>>>> Enabling initcall debugging showed that U-Boot crashes right after
>>>>>>> relocation:
>>>>>>>
>>>>>>> ...
>>>>>>> initcall: 00016f2c
>>>>>>>
>>>>>>> RAM Configuration:
>>>>>>> Bank #0: 0 948 MiB
>>>>>>> Bank #1: 40000000 1 GiB
>>>>>>> Bank #2: 0 0 Bytes
>>>>>>> Bank #3: 0 0 Bytes
>>>>>>>
>>>>>>> DRAM:  1.9 GiB
>>>>>>> initcall: 00016bb8
>>>>>>> New Stack Pointer is: 3af6d9e0
>>>>>>> initcall: 00016da4
>>>>>>> initcall: 00016ef0
>>>>>>> initcall: 00016ef8
>>>>>>> initcall: 00016d38
>>>>>>> Relocation Offset is: 3b375000
>>>>>>> Relocating to 3b37d000, new gd at 3af78ec0, sp at 3af6d9e0
>>>>>>> initcall: 00016ec8 [clear_bss]
>>>>>>> initcall: 0004465c [display_options?? only appears sometimes]
>>>>>>> <freeze>
>>>>>>>
>>>>>>> I realized when using CONFIG_SYS_(I|D)CACHE_OFF=y the problem seems to
>>>>>>> disappear. But to be 100% certain that it is cache related, I used my
>>>>>>> original configuration (which is known to "reliably" freeze), and
>>>>>>> replaced 00016ec8 with 00008688 manually in the binary, essentially
>>>>>>> swapping out function pointers in "init_sequence_f" [00008688 is
>>>>>>> cleanup_before_linux, which flushes and disables I-cache/D-cache]. And
>>>>>>> indeed, that hacked up binary does boot reliably every time:
>>>>>>>
>>>>>>> ...
>>>>>>> initcall: 00016f2c
>>>>>>>
>>>>>>> RAM Configuration:
>>>>>>> Bank #0: 0 948 MiB
>>>>>>> Bank #1: 40000000 1 GiB
>>>>>>> Bank #2: 0 0 Bytes
>>>>>>> Bank #3: 0 0 Bytes
>>>>>>>
>>>>>>> DRAM:  1.9 GiB
>>>>>>> initcall: 00016bb8
>>>>>>> New Stack Pointer is: 3af6d9e0
>>>>>>> initcall: 00016da4
>>>>>>> initcall: 00016ef0
>>>>>>> initcall: 00016ef8
>>>>>>> initcall: 00016d38
>>>>>>> Relocation Offset is: 3b375000
>>>>>>> Relocating to 3b37d000, new gd at 3af78ec0, sp at 3af6d9e0
>>>>>>> initcall: 00008688
>>>>>>> initcall: 3b38c10c
>>>>>>> initcall: 3b38c114
>>>>>>> initcall: 000172e0 (relocated to 3b38c2e0)
>>>>>>> initcall: 0001712c (relocated to 3b38c12c)
>>>>>>> ...
>>>>>>>
>>>>>>>  From what I understand on RPi4 caches are enabled when entering U-Boot.
>>>>>>> I was wondering if the relocation code really can handle that?
>>>>>>>
>>>>>>> --
>>>>>>> Stefan
>>>>


end of thread

Thread overview: 10+ messages
2020-08-23 17:06 RPi4 U-Boot freeze Stefan Agner
2020-09-03 11:23 ` Stefan Agner
2020-09-04  1:48   ` Bin Meng
2020-09-07 14:36   ` Peter Robinson
2020-09-10 21:12     ` Stefan Agner
2020-09-14  8:15       ` Matthias Brugger
2020-09-17 20:56         ` Stefan Agner
2020-09-19 11:55         ` Stefan Agner
2020-09-19 21:20           ` Sean Anderson
2020-09-20  9:09             ` Stefan Agner
