* [U-Boot] MIPS (mt7688): EBase change in U-Boot breaks Linux
@ 2018-12-12  8:18 Stefan Roese
  2018-12-12 11:21 ` Horatiu Vultur
  2018-12-13  1:00 ` Daniel Schwierzeck
  0 siblings, 2 replies; 22+ messages in thread
From: Stefan Roese @ 2018-12-12  8:18 UTC (permalink / raw)
  To: u-boot

Hi!

I've been hunting for a problem for quite some time, where Linux
hangs / crashes in userspace at some point on my MT7688 based
systems. I found that this problem can be avoided (worked around)
by not giving Linux the full memory (by using DT memory node fixup
or mem= kernel cmdline). When reducing this memory by the memory
used by U-Boot (stack pointer minus some KiB value as this is the
"lowest" memory used by U-Boot), then Linux runs just fine.

My first idea was that this issue is cache related (most likely
I-cache). But none of the tests and debugging in this area fixed
the issue (even running with caches disabled).

Finally I found that this line in U-Boot makes Linux break:

arch/mips/lib/traps.c:

void trap_init(ulong reloc_addr)
	unsigned long ebase = gd->irq_sp;
	...
	write_c0_ebase(ebase);

This sets EBase to something like 0x87e9b000 on my system (128MiB).
Linux then re-uses this value and copies the exception handlers
to this address, overwriting random code and leading to an unstable
system.

So my question now is, how should this be handled on the MT7688
platform instead? One way would be to set EBase back to the
original value (0x80000000) before booting into Linux. Another
solution would be to add some Linux code like board_ebase_setup()
to the MT7688 Linux port.
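
The first option can be sketched roughly as follows. This is only a
host-side model under stated assumptions: the CP0 EBase register is
replaced by a plain variable so the snippet compiles anywhere, and all
model_* names (including the restore hook) are hypothetical and do not
exist in U-Boot. On real hardware the existing write_c0_ebase() accessor
would be used instead.

```c
#include <assert.h>
#include <stdint.h>

/* Reset value of EBase on this SoC, as stated above. */
#define EBASE_RESET_VALUE 0x80000000u

/* Host-side stand-in for the CP0 EBase register (not real hardware access). */
static uint32_t model_c0_ebase = EBASE_RESET_VALUE;

static void model_write_c0_ebase(uint32_t val) { model_c0_ebase = val; }
static uint32_t model_read_c0_ebase(void) { return model_c0_ebase; }

/* Models what trap_init() effectively does: point EBase into U-Boot RAM. */
static void model_trap_init(uint32_t irq_sp)
{
	/* The exception base field starts at bit 12, so expect 4 KiB alignment. */
	assert((irq_sp & 0xfffu) == 0);
	model_write_c0_ebase(irq_sp);
}

/* Hypothetical hook: undo the change right before jumping to the OS. */
static void model_restore_ebase_before_boot(void)
{
	model_write_c0_ebase(EBASE_RESET_VALUE);
}
```

Calling model_trap_init(0x87e9b000) followed by
model_restore_ebase_before_boot() leaves the modelled register at
0x80000000 again, which is the state Linux would then see.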

Since I'm no real MIPS expert yet, I would really like to get
some advice here on how best to solve this issue. Maybe I missed
something. Comments?

Thanks,
Stefan


* [U-Boot] MIPS (mt7688): EBase change in U-Boot breaks Linux
  2018-12-12  8:18 [U-Boot] MIPS (mt7688): EBase change in U-Boot breaks Linux Stefan Roese
@ 2018-12-12 11:21 ` Horatiu Vultur
  2018-12-12 11:41   ` Stefan Roese
  2018-12-13  1:00 ` Daniel Schwierzeck
  1 sibling, 1 reply; 22+ messages in thread
From: Horatiu Vultur @ 2018-12-12 11:21 UTC (permalink / raw)
  To: u-boot

Hi Stefan,

Is your Linux kernel compiled with CONFIG_CPU_MIPSR2_IRQ_VI? We had a
similar issue with two of our boards (Ocelot and Luton).

In our case the problem was that the Linux kernel didn't reserve memory
for the addresses pointed to by the ebase register. The kernel later
used this memory and overwrote the interrupt vector, which led to random
crashes.


-- 
/Horatiu


* [U-Boot] MIPS (mt7688): EBase change in U-Boot breaks Linux
  2018-12-12 11:21 ` Horatiu Vultur
@ 2018-12-12 11:41   ` Stefan Roese
  2018-12-13  9:42     ` Horatiu Vultur
  0 siblings, 1 reply; 22+ messages in thread
From: Stefan Roese @ 2018-12-12 11:41 UTC (permalink / raw)
  To: u-boot

Hi Horatiu Vultur,

On 12.12.18 12:21, Horatiu Vultur wrote:
> Is your Linux Kernel compiled with CONFIG_CPU_MIPSR2_IRQ_VI? Because we
> had a similar issue with two of our boards (Ocelot and Luton).

No, it's not configured for this MT7688 / RAMIPS SoC. Enabling this
option does fix the issue. Many thanks for the suggestion.

BTW: Should CPU_MIPSR2_IRQ_EI probably also be set?
  
> In our case the problem was that the Linux Kernel didn't reserve memory
> for the addresses pointed to by the ebase register and then later the kernel
> used this address, overwriting the interrupt vector, which led to random
> crashes.

Exactly what we've observed here. It took quite some debugging to
finally find the root cause for this.

Thanks,
Stefan
  

Best regards,
Stefan

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-51 Fax: (+49)-8142-66989-80 Email: sr at denx.de


* [U-Boot] MIPS (mt7688): EBase change in U-Boot breaks Linux
  2018-12-12  8:18 [U-Boot] MIPS (mt7688): EBase change in U-Boot breaks Linux Stefan Roese
  2018-12-12 11:21 ` Horatiu Vultur
@ 2018-12-13  1:00 ` Daniel Schwierzeck
  2018-12-13 10:09   ` Stefan Roese
  1 sibling, 1 reply; 22+ messages in thread
From: Daniel Schwierzeck @ 2018-12-13  1:00 UTC (permalink / raw)
  To: u-boot

Hi Stefan,


The relevant code is in arch/mips/kernel/traps.c:trap_init():

Within the branch if (cpu_has_veic || cpu_has_vint) the kernel will
allocate memory for the exception vectors and reset ebase to that memory.

In the else branch ebase is statically assigned to CAC_BASE, which should
resolve to 0x80000000 on the Ralink platform. The ebase is only read from
CP0 for MIPS r6 CPUs.

So the ebase set by U-Boot shouldn't be relevant for the Ralink platform.
More likely some code at 0x80000000 is overwritten when installing the
exception handlers because all Ralink SoCs except MT7621 have
0xffffffff80000000 defined as load address. So adding something like
0x1000 should fix your problem too.

AFAIK the CPU probing should detect and set cpu_has_veic accordingly.
Maybe it's a bug by Ralink to not set this bit. I guess that's why a
platform could provide a cpu-feature-overrides.h. Or you could configure
CPU_MIPSR2_IRQ_VI as Horatiu stated in his response.

@Paul regarding MIPS r6, is there some expectation of the bootloader to
set ebase to a reasonable value or to not change the value at all? Maybe
we need to fix U-Boot?

-- 
- Daniel


* [U-Boot] MIPS (mt7688): EBase change in U-Boot breaks Linux
  2018-12-12 11:41   ` Stefan Roese
@ 2018-12-13  9:42     ` Horatiu Vultur
  0 siblings, 0 replies; 22+ messages in thread
From: Horatiu Vultur @ 2018-12-13  9:42 UTC (permalink / raw)
  To: u-boot

Hi Stefan,

The 12/12/2018 12:41, Stefan Roese wrote:
> Hi Horatiu Vultur,
> 
> On 12.12.18 12:21, Horatiu Vultur wrote:
> > Is your Linux Kernel compiled with CONFIG_CPU_MIPSR2_IRQ_VI? Because we
> > had a similar issue with two of our boards (Ocelot and Luton).
> 
> No, it's not configured for this MT7688 / RAMIPS SoC. Enabling this
> option does fix this issue. Many thanks for the suggestion.

I am glad that it worked.

> 
> BTW: Should CPU_MIPSR2_IRQ_EI probably also be set?

Well, it depends on the interrupt controller that you have. In our case
we didn't set it. But I am not a MIPS expert, so I may be wrong.

> > In our case the problem was that the Linux Kernel didn't reserve memory
> > for the addresses pointed to by the ebase register and then later the kernel
> > used this address, overwriting the interrupt vector, which led to random
> > crashes.
> 
> Exactly what we've observed here. It took quite some debugging to
> finally find the root cause for this.

-- 
/Horatiu


* [U-Boot] MIPS (mt7688): EBase change in U-Boot breaks Linux
  2018-12-13  1:00 ` Daniel Schwierzeck
@ 2018-12-13 10:09   ` Stefan Roese
  2018-12-13 13:27     ` Daniel Schwierzeck
  0 siblings, 1 reply; 22+ messages in thread
From: Stefan Roese @ 2018-12-13 10:09 UTC (permalink / raw)
  To: u-boot

Hi Daniel,

On 13.12.18 02:00, Daniel Schwierzeck wrote:
> the relevant code is in arch/mips/kernel/traps.c:trap_init():
> 
> Within the branch if (cpu_has_veic || cpu_has_vint) the kernel will
> allocate memory for the exception vectors and resets ebase to that memory.

This branch currently is not taken on this SoC (Mediatek / Ralink).
  
> In the else branch ebase is statically assigned to CAC_BASE which should
> resolve to 0x80000000 on Ralink platform. The ebase is only read from
> CP0 for MIPS r6 CPUs.

Without CPU_MIPSR2_IRQ_VI being set (as is currently the case), this
is how this function is run:

	if (cpu_has_veic || cpu_has_vint) {
		...
	} else {
		*** this is true for Ralink / Mediatek
		...
		if (cpu_has_mips_r2_r6) {
			if (cpu_has_ebase_wg) {
				...
			} else {
				*** this is true for Ralink / Mediatek
				...

So in summary, ebase is not allocated but assigned this value:

	ebase = CAC_BASE + (read_c0_ebase() & 0x3ffff000);

which of course leads to the issues we observed.
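
For illustration, here is a small host-side sketch of the ebase
computation in that else branch, using the values from this thread.
CAC_BASE is assumed to be 0x80000000 (as on a 32-bit kernel), and the
helper names are made up for this example. Note that in C `+` binds
tighter than `&`, so the mask has to be parenthesized to get the
intended result.

```c
#include <assert.h>
#include <stdint.h>

#define CAC_BASE   0x80000000u	/* assumed KSEG0 base on a 32-bit kernel */
#define EBASE_MASK 0x3ffff000u	/* mask applied to the CP0 EBase value */

/* Intended computation: mask the register value first, then add CAC_BASE. */
static uint32_t ebase_from_c0(uint32_t c0_ebase)
{
	return CAC_BASE + (c0_ebase & EBASE_MASK);
}

/*
 * How C parses "CAC_BASE + c0_ebase & EBASE_MASK" when the parentheses
 * are left out: the addition happens first and the KSEG0 bits are lost.
 */
static uint32_t ebase_without_parens(uint32_t c0_ebase)
{
	return (CAC_BASE + c0_ebase) & EBASE_MASK;
}

/* Usage example with the two EBase values discussed in this thread. */
static void ebase_example(void)
{
	/* Value left behind by U-Boot: the kernel ends up at the same address. */
	assert(ebase_from_c0(0x87e9b000u) == 0x87e9b000u);
	/* Reset value of EBase: the kernel uses the start of KSEG0. */
	assert(ebase_from_c0(0x80000000u) == 0x80000000u);
	/* Without parentheses the result is no longer a KSEG0 address. */
	assert(ebase_without_parens(0x87e9b000u) == 0x07e9b000u);
}
```

So with the EBase value left behind by U-Boot the kernel computes
0x87e9b000 for its handlers, whereas the reset value would give
0x80000000.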
  
> So the ebase set by U-Boot shouldn't be relevant for Ralink platform.

Why so?

> More likely some code at 0x80000000 is overwritten when installing the
> exception handlers because all Ralink SoCs except MT7621 have
> 0xffffffff80000000 defined as load address. So adding something like
> 0x1000 should fix your problem too.

Hmmm, not sure that I fully understand this. Could you please explain
again?
  
> AFAIK the CPU probing should detect and set cpu_has_veic accordingly.

Yes, I agree.

> Maybe it's a bug by Ralink to not set this bit. I guess that's why a
> platform could provide a cpu-feature-overrides.h. Or you could configure
> CPU_MIPSR2_IRQ_VI as Horatiu stated in his response.

I just checked in decode_config3() and MIPS_CPU_VEIC is not set on
this SoC (config3=00002420 MIPS_CONF3_VEIC=00000040).
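
A minimal sketch of how that config3 value decodes, for reference.
MIPS_CONF3_VEIC is the value quoted above; MIPS_CONF3_VINT (bit 5) is
taken from the MIPS32 Config3 register layout and is not quoted in this
thread. The model_cpu_has_vint() helper and its irq_vi_enabled parameter
are made up to mirror CONFIG_CPU_MIPSR2_IRQ_VI, which as far as I can
tell gates cpu_has_vint at compile time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MIPS_CONF3_VINT (UINT32_C(1) << 5)	/* vectored interrupts implemented */
#define MIPS_CONF3_VEIC (UINT32_C(1) << 6)	/* external interrupt controller mode */

static bool config3_has_vint(uint32_t config3)
{
	return (config3 & MIPS_CONF3_VINT) != 0;
}

static bool config3_has_veic(uint32_t config3)
{
	return (config3 & MIPS_CONF3_VEIC) != 0;
}

/*
 * Illustrative model only: the VInt feature is reported when the
 * hardware bit is set AND the kernel was built with the option enabled.
 */
static bool model_cpu_has_vint(uint32_t config3, bool irq_vi_enabled)
{
	return irq_vi_enabled && config3_has_vint(config3);
}

/* Usage example with the config3 value read on this SoC. */
static void config3_example(void)
{
	const uint32_t config3 = 0x00002420u;

	assert(!config3_has_veic(config3));	/* 0x2420 & 0x40 == 0 */
	assert(config3_has_vint(config3));	/* 0x2420 & 0x20 != 0 */
	assert(!model_cpu_has_vint(config3, false));
	assert(model_cpu_has_vint(config3, true));
}
```

With config3=0x00002420 the VEIC bit is clear while the VInt bit is set,
so in this model the vectored-interrupt path only becomes available once
the config option is enabled.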
  
> @Paul regarding MIPS r6, is there some expectation of the bootloader to
> set ebase to a reasonable value or to not change the value at all? Maybe
> we need to fix U-Boot?

Yes, some advice on how to fix this would be very welcome. I can easily
add CPU_MIPSR2_IRQ_VI and send a patch for this as well.

Thanks,
Stefan


* [U-Boot] MIPS (mt7688): EBase change in U-Boot breaks Linux
  2018-12-13 10:09   ` Stefan Roese
@ 2018-12-13 13:27     ` Daniel Schwierzeck
  2018-12-13 13:35       ` Stefan Roese
  0 siblings, 1 reply; 22+ messages in thread
From: Daniel Schwierzeck @ 2018-12-13 13:27 UTC (permalink / raw)
  To: u-boot

On Thu, 13 Dec 2018 at 11:09, Stefan Roese <sr@denx.de> wrote:
>
> Hi Daniel,
>
> On 13.12.18 02:00, Daniel Schwierzeck wrote:
> > the relevant code is in arch/mips/kernel/traps.c:trap_init():
> >
> > Within the branch if (cpu_has_veic || cpu_has_vint) the kernel will
> > allocate memory for the exception vectors and resets ebase to that memory.
>
> This branch currently is not taken on this SoC (Mediatek / Ralink).
>
> > In the else branch ebase is statically assigned to CAC_BASE which should
> > resolve to 0x80000000 on Ralink platform. The ebase is only read from
> > CP0 for MIPS r6 CPUs.
>
> Without CPU_MIPSR2_IRQ_VI being set (as is currently the case), this
> is how this function is run:
>
>         if (cpu_has_veic || cpu_has_vint) {
>                 ...
>         } else {
>                 *** this is true for Ralink / Mediatek
>                 ...
>                 if (cpu_has_mips_r2_r6) {
>                         if (cpu_has_ebase_wg) {
>                                 ...
>                         } else {
>                                 *** this is true for Ralink / Mediatek
>                                 ...
>
> So in summary, ebase is not allocated but assigned this value:
>
>         ebase = CAC_BASE + (read_c0_ebase() & 0x3ffff000);
>
> which of course leads to the issues we observed.
>
> > So the ebase set by U-Boot shouldn't be relevant for Ralink platform.
>
> Why so?
>
> > More likely some code at 0x80000000 is overwritten when installing the
> > exception handlers because all Ralink SoCs except MT7621 have
> > 0xffffffff80000000 defined as load address. So adding something like
> > 0x1000 should fix your problem too.
>
> Hmmm, not sure that I fully understand this. Could you please explain
> again?

Oh sorry, I misread cpu_has_mips_r2_r6 as only catching MIPS r6 CPUs, but
obviously it applies to MIPS r2 too.

>
> > AFAIK the CPU probing should detect and set cpu_has_veic accordingly.
>
> Yes, I agree.
>
> > Maybe it's a bug by Ralink to not set this bit. I guess that's why a
> > platform could provide a cpu-feature-overrides.h. Or you could configure
> > CPU_MIPSR2_IRQ_VI as Horatiu stated in his response.
>
> I just checked in decode_config3() and MIPS_CPU_VEIC is not set on
> this SoC (config3=00002420 MIPS_CONF3_VEIC=00000040).

If vectored interrupt handlers are working on the Ralink platform, then maybe
this should be enabled via cpu-feature-overrides.h like the Lantiq platform is
doing. AFAIU this should increase interrupt performance.

>
> > @Paul regarding MIPS r6, is there some expectation of the bootloader to
> > set ebase to a reasonable value or to not change the value at all? Maybe
> > we need to fix U-Boot?
>
> Yes, some advice on how to fix this would be very welcome. I can easily
> add CPU_MIPSR2_IRQ_VI and send a patch for this as well.
>

I could also prepare a U-Boot patch to restore the original ebase value before
handing the control over to the OS.

-- 
- Daniel


* [U-Boot] MIPS (mt7688): EBase change in U-Boot breaks Linux
  2018-12-13 13:27     ` Daniel Schwierzeck
@ 2018-12-13 13:35       ` Stefan Roese
  2018-12-13 14:23         ` Daniel Schwierzeck
  0 siblings, 1 reply; 22+ messages in thread
From: Stefan Roese @ 2018-12-13 13:35 UTC (permalink / raw)
  To: u-boot

On 13.12.18 14:27, Daniel Schwierzeck wrote:
> On Thu, 13 Dec 2018 at 11:09, Stefan Roese <sr@denx.de> wrote:
>>
>> Hi Daniel,
>>
>> On 13.12.18 02:00, Daniel Schwierzeck wrote:
>>> the relevant code is in arch/mips/kernel/traps.c:trap_init():
>>>
>>> Within the branch if (cpu_has_veic || cpu_has_vint) the kernel will
>>> allocate memory for the exception vectors and resets ebase to that memory.
>>
>> This branch currently is not taken on this SoC (Mediatek / Ralink).
>>
>>> In the else branch ebase is statically assigned to CAC_BASE which should
>>> resolve to 0x80000000 on Ralink platform. The ebase is only read from
>>> CP0 for MIPS r6 CPUs.
>>
>> Without CPU_MIPSR2_IRQ_VI being set (as is currently the case), this
>> is how this function is run:
>>
>>          if (cpu_has_veic || cpu_has_vint) {
>>                  ...
>>          } else {
>>                  *** this is true for Ralink / Mediatek
>>                  ...
>>                  if (cpu_has_mips_r2_r6) {
>>                          if (cpu_has_ebase_wg) {
>>                                  ...
>>                          } else {
>>                                  *** this is true for Ralink / Mediatek
>>                                  ...
>>
>> So in summary, ebase is not allocated but assigned this value:
>>
>>          ebase = CAC_BASE + (read_c0_ebase() & 0x3ffff000);
>>
>> which of course leads to the issues we observed.
>>
>>> So the ebase set by U-Boot shouldn't be relevant for Ralink platform.
>>
>> Why so?
>>
>>> More likely some code at 0x80000000 is overwritten when installing the
>>> exception handlers because all Ralink SoCs except MT7621 have
>>> 0xffffffff80000000 defined as load address. So adding something like
>>> 0x1000 should fix your problem too.
>>
>> Hmmm, not sure that I fully understand this. Could you please explain
>> again?
> 
> Oh sorry, I misread cpu_has_mips_r2_r6 as only catching MIPS r6 CPUs, but
> obviously it applies to MIPS r2 too.
> 
>>
>>> AFAIK the CPU probing should detect and set cpu_has_veic accordingly.
>>
>> Yes, I agree.
>>
>>> Maybe it's a bug by Ralink to not set this bit. I guess that's why a
>>> platform could provide a cpu-feature-overrides.h. Or you could configure
>>> CPU_MIPSR2_IRQ_VI as Horatiu stated in his response.
>>
>> I just checked in decode_config3() and MIPS_CPU_VEIC is not set on
>> this SoC (config3=00002420 MIPS_CONF3_VEIC=00000040).
> 
> If vectored interrupt handlers are working on the Ralink platform, then maybe this
> should be enabled via cpu-feature-overrides.h like the Lantiq platform is doing.
> AFAIU this should increase interrupt performance.

Sure. If that's the preferred way to do it (compared to setting
CONFIG_CPU_MIPSR2_IRQ_VI), then I'll gladly submit a patch for it.
  
>>
>>> @Paul regarding MIPS r6, is there some expectation of the bootloader to
>>> set ebase to a reasonable value or to not change the value at all? Maybe
>>> we need to fix U-Boot?
>>
>> Yes, some advice on how to fix this would be very welcome. I can easily
>> add CPU_MIPSR2_IRQ_VI and send a patch for this as well.
>>
> 
> I could also prepare a U-Boot patch to restore the original ebase value before
> handing the control over to the OS.

I'm not so sure whether overwriting 0x80000000 (default value of EBase on
this SoC) with the exception handler is allowed. Is this address "zero"
handled specially in MIPS Linux? AFAICT, the complete DDR
area on my platform (0x8000.0000 - 0x87ff.ffff) is available for Linux.
So allocating some memory for this exception handler seems to me the
right way to go.

Thanks,
Stefan


* [U-Boot] MIPS (mt7688): EBase change in U-Boot breaks Linux
  2018-12-13 13:35       ` Stefan Roese
@ 2018-12-13 14:23         ` Daniel Schwierzeck
  2018-12-13 19:47             ` [U-Boot] " Paul Burton
  2018-12-15  4:40           ` Maciej W. Rozycki
  0 siblings, 2 replies; 22+ messages in thread
From: Daniel Schwierzeck @ 2018-12-13 14:23 UTC (permalink / raw)
  To: u-boot

On Thu, 13 Dec 2018 at 14:35, Stefan Roese <sr@denx.de> wrote:
>
> On 13.12.18 14:27, Daniel Schwierzeck wrote:
> > On Thu, 13 Dec 2018 at 11:09, Stefan Roese <sr@denx.de> wrote:
> >>
> >> Hi Daniel,
> >>
> >> On 13.12.18 02:00, Daniel Schwierzeck wrote:
> >>> the relevant code is in arch/mips/kernel/traps.c:trap_init():
> >>>
> >>> Within the branch if (cpu_has_veic || cpu_has_vint) the kernel will
> >>> allocate memory for the exception vectors and resets ebase to that memory.
> >>
> >> This branch currently is not taken on this SoC (Mediatek / Ralink).
> >>
> >>> In the else branch ebase is statically assigned to CAC_BASE which should
> >>> resolve to 0x80000000 on Ralink platform. The ebase is only read from
> >>> CP0 for MIPS r6 CPUs.
> >>
> >> Without CPU_MIPSR2_IRQ_VI being set (as is currently the case), this
> >> is how this function is run:
> >>
> >>          if (cpu_has_veic || cpu_has_vint) {
> >>                  ...
> >>          } else {
> >>                  *** this is true for Ralink / Mediatek
> >>                  ...
> >>                  if (cpu_has_mips_r2_r6) {
> >>                          if (cpu_has_ebase_wg) {
> >>                                  ...
> >>                          } else {
> >>                                  *** this is true for Ralink / Mediatek
> >>                                  ...
> >>
> >> So in summary, ebase is not allocated but assigned to this value:
> >>
> >>          ebase = CAC_BASE + (read_c0_ebase() & 0x3ffff000);
> >>
> >> which of course leads to the issues we observed.
> >>
> >>> So the ebase set by U-Boot shouldn't be relevant for Ralink platform.
> >>
> >> Why so?
> >>
> >>> More likely some code at 0x80000000 is overwritten when installing the
> >>> exception handlers because all Ralink SoCs except MT7621 have
> >>> 0xffffffff80000000 defined as load address. So adding something like
> >>> 0x1000 should fix your problem too.
> >>
> >> Hmmm, not sure that I fully understand this. Could you please explain
> >> again?
> >
> > > oh sorry, I misread cpu_has_mips_r2_r6 to only catch MIPS r6 CPUs, but
> > > obviously it applies to MIPS r2 too.
> >
> >>
> >>> AFAIK the CPU probing should detect and set cpu_has_veic accordingly.
> >>
> >> Yes, I agree.
> >>
> >>> Maybe it's a bug by Ralink to not set this bit. I guess that's why a
> >>> platform could provide a cpu-feature-overrides.h. Or you could configure
> >>> CPU_MIPSR2_IRQ_VI as Horatiu stated in his response.
> >>
> >> I just checked in decode_config3() and MIPS_CPU_VEIC is not set on
> >> this SoC (config3=00002420 MIPS_CONF3_VEIC=00000040).
> >
> > > If vectored interrupt handlers are working on Ralink platform, then maybe this
> > should be enabled via cpu-feature-overrides.h like the Lantiq platform is doing.
> > AFAIU this should increase interrupt performance.
>
> Sure. If that's the preferred way to do it (compared to setting
> CONFIG_CPU_MIPSR2_IRQ_VI), then I'll gladly submit a patch for it.
>
> >>
> >>> @Paul regarding MIPS r6, is there some expectation of the bootloader to
> >>> set ebase to a reasonable value or to not change the value at all? Maybe
> >>> we need to fix U-Boot?
> >>
> >> Yes, some advice on how to fix this would be very welcome. I can easily
> >> add CPU_MIPSR2_IRQ_VI and send a patch for this as well.
> >>
> >
> > I could also prepare a U-Boot patch to restore the original ebase value before
> > handing the control over to the OS.
>
> I'm not so sure, if overwriting 0x80000000 (default value of EBase on
> this SoC) with the exception handler is allowed. Is this address "zero"
> handled somewhat specific in MIPS Linux? AFAICT, the complete DDR
> area on my platform (0x8000.0000 - 0x87ff.ffff) is available for Linux.
> So allocating some memory for this exception handler seems the right
> way to go to me.
>

maybe that's why some platforms define a load address of 0x80002000 or similar
to protect this area somehow.
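
A minimal sketch of the ebase arithmetic discussed above, assuming a 32-bit
kernel where CAC_BASE is 0x80000000 (as stated for Ralink) and using the
EBase value reported for the 128MiB board. The function names are
hypothetical and only illustrate the expression; this is not the kernel's
code. It also shows why the parentheses matter in C, since '+' binds tighter
than '&':

```c
#include <stdint.h>

/* Assumption: kseg0 base on a 32-bit MIPS kernel (CAC_BASE on Ralink). */
#define CAC_BASE 0x80000000u

/* Mask the CP0 EBase value first, then add the segment base. */
uint32_t ebase_masked_then_added(uint32_t cp0_ebase)
{
	return CAC_BASE + (cp0_ebase & 0x3ffff000u);
}

/*
 * Same operands without the inner parentheses: C evaluates the sum
 * first (wrapping modulo 2^32) and only then applies the mask.
 */
uint32_t ebase_without_parens(uint32_t cp0_ebase)
{
	return CAC_BASE + cp0_ebase & 0x3ffff000u;
}
```

With the value U-Boot leaves behind (0x87e9b000), the first form yields
0x87e9b000 again, i.e. Linux would place its handlers exactly where U-Boot
pointed EBase. The second form yields 0x07e9b000, which is not a kseg0
address at all.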

-- 
- Daniel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: MIPS (mt7688): EBase change in U-Boot breaks Linux
  2018-12-13 14:23         ` Daniel Schwierzeck
@ 2018-12-13 19:47             ` Paul Burton
  2018-12-15  4:40           ` Maciej W. Rozycki
  1 sibling, 0 replies; 22+ messages in thread
From: Paul Burton @ 2018-12-13 19:47 UTC (permalink / raw)
  To: Daniel Schwierzeck, Stefan Roese
  Cc: U-Boot Mailing List, Horatiu Vultur, Linux-MIPS, linux-mips

Hello,

On Thu, Dec 13, 2018 at 03:23:39PM +0100, Daniel Schwierzeck wrote:
> > >>>> Finally I found that this line in U-Boot makes Linux break:
> > >>>>
> > >>>> arch/mips/lib/traps.c:
> > >>>>
> > >>>> void trap_init(ulong reloc_addr)
> > >>>>       unsigned long ebase = gd->irq_sp;
> > >>>>       ...
> > >>>>       write_c0_ebase(ebase);
> > >>>>
> > >>>> This sets EBase to something like 0x87e9b000 on my system (128MiB).
> > > >>>> And Linux then re-uses this value and copies the exception handlers
> > > >>>> to this address, overwriting random code and leading to an unstable
> > > >>>> system.
> > > >>>>
> > > >>>> So my question now is, how should this be handled on the MT7688
> > >>>> platform instead? One way would be to set EBase back to the
> > >>>> original value (0x80000000) before booting into Linux. Another
> > >>>> solution would be to add some Linux code like board_ebase_setup()
> > >>>> to the MT7688 Linux port.
>%
> > > I could also prepare a U-Boot patch to restore the original ebase value before
> > > handing the control over to the OS.
> >
> > I'm not so sure, if overwriting 0x80000000 (default value of EBase on
> > this SoC) with the exception handler is allowed. Is this address "zero"
> > handled somewhat specific in MIPS Linux? AFAICT, the complete DDR
> > area on my platform (0x8000.0000 - 0x87ff.ffff) is available for Linux.
> > So allocating some memory for this exception handler seems the right
> > way to go to me.
> 
> maybe that's why some platforms define a load address of 0x80002000 or similar
> to protect this area somehow.

Does this Linux patch help by any chance?

https://git.linux-mips.org/cgit/linux-mti.git/commit/?h=eng-v4.20&id=39e4d339a4540b66e9d9a8ea0da9ee41a21473b4

I'm not sure I remember why I didn't get that upstreamed yet, I probably
wanted to research what other systems were doing... Speaking for Malta,
the kernel's board support has reserved the start of kseg0 for longer
than I've been involved.

An alternative would be for Linux to allocate a page for use with the
exception vectors using memblock, and ignore the EBase value U-Boot left
us with. But just marking the area U-Boot used as reserved ought to do
the trick, and has the advantage of ensuring U-Boot's vectors don't get
overwritten before Linux sets up its own which sometimes allows U-Boot
to provide some useful output.

Thanks,
    Paul

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: MIPS (mt7688): EBase change in U-Boot breaks Linux
  2018-12-13 19:47             ` [U-Boot] " Paul Burton
@ 2018-12-14  6:56               ` Stefan Roese
  -1 siblings, 0 replies; 22+ messages in thread
From: Stefan Roese @ 2018-12-14  6:56 UTC (permalink / raw)
  To: Paul Burton, Daniel Schwierzeck
  Cc: U-Boot Mailing List, Horatiu Vultur, Linux-MIPS, linux-mips

Hi Paul,

On 13.12.18 20:47, Paul Burton wrote:
> On Thu, Dec 13, 2018 at 03:23:39PM +0100, Daniel Schwierzeck wrote:
>>>>>>> Finally I found that this line in U-Boot makes Linux break:
>>>>>>>
>>>>>>> arch/mips/lib/traps.c:
>>>>>>>
>>>>>>> void trap_init(ulong reloc_addr)
>>>>>>>        unsigned long ebase = gd->irq_sp;
>>>>>>>        ...
>>>>>>>        write_c0_ebase(ebase);
>>>>>>>
>>>>>>> This sets EBase to something like 0x87e9b000 on my system (128MiB).
>>>>>>> And Linux then re-uses this value and copies the exception handlers
>>>>>>> to this address, overwriting random code and leading to an unstable
>>>>>>> system.
>>>>>>>
>>>>>>> So my question now is, how should this be handled on the MT7688
>>>>>>> platform instead? One way would be to set EBase back to the
>>>>>>> original value (0x80000000) before booting into Linux. Another
>>>>>>> solution would be to add some Linux code like board_ebase_setup()
>>>>>>> to the MT7688 Linux port.
>> %
>>>> I could also prepare a U-Boot patch to restore the original ebase value before
>>>> handing the control over to the OS.
>>>
>>> I'm not so sure, if overwriting 0x80000000 (default value of EBase on
>>> this SoC) with the exception handler is allowed. Is this address "zero"
>>> handled somewhat specific in MIPS Linux? AFAICT, the complete DDR
>>> area on my platform (0x8000.0000 - 0x87ff.ffff) is available for Linux.
>>> So allocating some memory for this exception handler seems the right
>>> way to go to me.
>>
>> maybe that's why some platforms define a load address of 0x80002000 or similar
>> to protect this area somehow.
> 
> Does this Linux patch help by any chance?
> 
> https://git.linux-mips.org/cgit/linux-mti.git/commit/?h=eng-v4.20&id=39e4d339a4540b66e9d9a8ea0da9ee41a21473b4
> 
> I'm not sure I remember why I didn't get that upstreamed yet, I probably
> wanted to research what other systems were doing... Speaking for Malta,
> the kernel's board support has reserved the start of kseg0 for longer
> than I've been involved.

No, this patch does not solve this issue (bootup still hangs or crashes
while mounting the rootfs). I can only assume that it's too late to try
to reserve this memory region as the memblock_reserve() call returns 0
(no error).
  
> An alternative would be for Linux to allocate a page for use with the
> exception vectors using memblock, and ignore the EBase value U-Boot left
> us with. But just marking the area U-Boot used as reserved ought to do
> the trick, and has the advantage of ensuring U-Boot's vectors don't get
> overwritten before Linux sets up its own which sometimes allows U-Boot
> to provide some useful output.

I agree that re-using the U-Boot value would be optimal for boot-time
error printing. But this does not seem to work on our platform AFAICT.
So how to proceed? Should I enable CONFIG_CPU_MIPSR2_IRQ_VI or #define
"cpu_has_veic" to 1 as Lantiq does?

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: MIPS (mt7688): EBase change in U-Boot breaks Linux
  2018-12-14  6:56               ` [U-Boot] " Stefan Roese
@ 2018-12-14 21:28                 ` Paul Burton
  -1 siblings, 0 replies; 22+ messages in thread
From: Paul Burton @ 2018-12-14 21:28 UTC (permalink / raw)
  To: Stefan Roese
  Cc: Daniel Schwierzeck, U-Boot Mailing List, Horatiu Vultur,
	Linux-MIPS, linux-mips

Hi Stefan,

On Fri, Dec 14, 2018 at 07:56:59AM +0100, Stefan Roese wrote:
> > Does this Linux patch help by any chance?
> > 
> > https://git.linux-mips.org/cgit/linux-mti.git/commit/?h=eng-v4.20&id=39e4d339a4540b66e9d9a8ea0da9ee41a21473b4
> > 
> > I'm not sure I remember why I didn't get that upstreamed yet, I probably
> > wanted to research what other systems were doing... Speaking for Malta,
> > the kernel's board support has reserved the start of kseg0 for longer
> > than I've been involved.
> 
> No, this patch does not solve this issue (bootup still hangs or crashes
> while mounting the rootfs). I can only assume that it's too late to try
> to reserve this memory region as the memblock_reserve() call returns 0
> (no error).

Hmm, OK. Do you know what is getting overwritten? Is it part of the
kernel binary itself?

> > An alternative would be for Linux to allocate a page for use with the
> > exception vectors using memblock, and ignore the EBase value U-Boot left
> > us with. But just marking the area U-Boot used as reserved ought to do
> > the trick, and has the advantage of ensuring U-Boot's vectors don't get
> > overwritten before Linux sets up its own which sometimes allows U-Boot
> > to provide some useful output.
> 
> I agree that re-using the U-Boot value would be optimal for boot-time
> error printing. But this does not seem to work on our platform AFAICT.
> So how to proceed? Should I enable CONFIG_CPU_MIPSR2_IRQ_VI or #define
> "cpu_has_veic" to 1 as Lantiq does?

I think the answer to the question above will be helpful - if it's the
kernel binary itself getting overwritten then we have 2 options:

  1) Move the kernel, i.e. change load-y in arch/mips/ralink/Platform.

  2) Have Linux recognize that the address in EBase is unsuitable &
     allocate a new page.

Or perhaps even both - having Linux recognize & avoid the problem seems
good for robustness, but if the kernel binary is overwriting the
exception vectors it might be useful to move the kernel anyway so that
we don't prevent U-Boot's vectors from working in between loading the
kernel & booting it.

If it's not the kernel binary overwriting the vectors & then being
overwritten, then I'd be interested in knowing what is in that memory.
We shouldn't have allocated much of anything this early, but a possible
fix might be to reserve the page EBase resides in from bootmem_init().
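
As a rough illustration of that last idea: assuming 4 KiB pages and an EBase
value that lies in kseg0 (both are assumptions here, not facts taken from
this board's kernel config), the page to reserve could be worked out as in
the sketch below. The helper names are hypothetical; in the kernel, the
resulting base and size are what would be handed to memblock_reserve().

```c
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define PAGE_SIZE_ASSUMED 0x1000u

/* Physical address behind a kseg0/kseg1 virtual address (32-bit MIPS). */
uint32_t kseg_to_phys(uint32_t vaddr)
{
	return vaddr & 0x1fffffffu;
}

/* Start of the page that contains the given physical address. */
uint32_t page_start(uint32_t paddr)
{
	return paddr & ~(PAGE_SIZE_ASSUMED - 1u);
}
```

For the EBase value reported earlier in the thread (0x87e9b000) this gives a
physical page starting at 0x07e9b000 and ending at 0x07e9bfff.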

Thanks,
    Paul

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: MIPS (mt7688): EBase change in U-Boot breaks Linux
  2018-12-14  6:56               ` [U-Boot] " Stefan Roese
@ 2018-12-14 21:31                 ` Paul Burton
  -1 siblings, 0 replies; 22+ messages in thread
From: Paul Burton @ 2018-12-14 21:31 UTC (permalink / raw)
  To: Stefan Roese
  Cc: Daniel Schwierzeck, U-Boot Mailing List, Horatiu Vultur,
	Linux-MIPS, linux-mips

Hi Stefan,

On Fri, Dec 14, 2018 at 07:56:59AM +0100, Stefan Roese wrote:
> So how to proceed? Should I enable CONFIG_CPU_MIPSR2_IRQ_VI or #define
> "cpu_has_veic" to 1 as Lantiq does?

...and on that point in particular, it really depends on your hardware.

You shouldn't need to do either of those things just to avoid this bug,
but if your hardware actually supports VI or EIC then it may be
beneficial to enable them.

Thanks,
    Paul

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [U-Boot] MIPS (mt7688): EBase change in U-Boot breaks Linux
  2018-12-13 14:23         ` Daniel Schwierzeck
  2018-12-13 19:47             ` [U-Boot] " Paul Burton
@ 2018-12-15  4:40           ` Maciej W. Rozycki
  1 sibling, 0 replies; 22+ messages in thread
From: Maciej W. Rozycki @ 2018-12-15  4:40 UTC (permalink / raw)
  To: u-boot

On Thu, 13 Dec 2018, Daniel Schwierzeck wrote:

> > I'm not so sure, if overwriting 0x80000000 (default value of EBase on
> > this SoC) with the exception handler is allowed. Is this address "zero"
> > handled somewhat specific in MIPS Linux? AFAICT, the complete DDR
> > area on my platform (0x8000.0000 - 0x87ff.ffff) is available for Linux.
> > So allocating some memory for this exception handler seems the right
> > way to go to me.
> >
> 
> maybe that's why some platforms define a load address of 0x80002000 or similar
> to protect this area somehow.

 It is.  MIPS processors before r2 (i.e. r1 and all the legacy ones) did 
not have the CP0.EBase register and the (non-CP0.Status.BEV) exception 
vector base was hardwired to 0x80000000 or 0xffffffff80000000 for 32-bit 
and 64-bit implementations respectively.  Then bootstrap/console monitor 
firmware typically used some RAM right above the exception handler area 
for its own purposes.  Consequently the load address of any executable to 
be run by such firmware had to be set such as to avoid clobbering these 
areas.
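
 A small sketch of that constraint, assuming a 32-bit system with the 
vector base at 0x80000000.  The size of the low area that firmware keeps 
for itself varies, so reserved_bytes below is a hypothetical parameter, 
and the function name is made up for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hardwired (non-BEV) exception vector base on 32-bit implementations. */
#define LEGACY_VECTOR_BASE 0x80000000u

/*
 * True if an executable loaded at load_addr stays clear of the vector
 * area plus whatever the firmware uses right above it.
 * reserved_bytes is an assumption supplied by the caller.
 */
bool load_addr_clears_low_area(uint32_t load_addr, uint32_t reserved_bytes)
{
	return load_addr >= LEGACY_VECTOR_BASE + reserved_bytes;
}
```

 With 0x1000 bytes reserved, a load address of 0x80000000 fails this 
check while 0x80002000 passes it.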

 HTH,

  Maciej

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: MIPS (mt7688): EBase change in U-Boot breaks Linux
  2018-12-14 21:28                 ` [U-Boot] " Paul Burton
@ 2018-12-17  8:55                   ` Stefan Roese
  -1 siblings, 0 replies; 22+ messages in thread
From: Stefan Roese @ 2018-12-17  8:55 UTC (permalink / raw)
  To: Paul Burton
  Cc: Daniel Schwierzeck, U-Boot Mailing List, Horatiu Vultur,
	Linux-MIPS, linux-mips

Hi Paul.

On 14.12.18 22:28, Paul Burton wrote:
> On Fri, Dec 14, 2018 at 07:56:59AM +0100, Stefan Roese wrote:
>>> Does this Linux patch help by any chance?
>>>
>>> https://git.linux-mips.org/cgit/linux-mti.git/commit/?h=eng-v4.20&id=39e4d339a4540b66e9d9a8ea0da9ee41a21473b4
>>>
>>> I'm not sure I remember why I didn't get that upstreamed yet, I probably
>>> wanted to research what other systems were doing... Speaking for Malta,
>>> the kernel's board support has reserved the start of kseg0 for longer
>>> than I've been involved.
>>
>> No, this patch does not solve this issue (bootup still hangs or crashes
>> while mounting the rootfs). I can only assume that it's too late to try
>> to reserve this memory region as the memblock_reserve() call returns 0
>> (no error).
> 
> Hmm, OK. Do you know what is getting overwritten? Is it part of the
> kernel binary itself?

Okay, I did a bit more research and debugging here. MIPS sets
CONFIG_ARCH_DISCARD_MEMBLOCK in general, which results in freeing
the reserved memory region(s) via memblock_discard() at a later boot
stage.

I also changed arch/mips/Kconfig so that ARCH_DISCARD_MEMBLOCK is
not defined, but this did not solve the issue either. I'm not sure
why ARCH_DISCARD_MEMBLOCK is defined for MIPS. Here is a log from
the system running with ARCH_DISCARD_MEMBLOCK disabled and EBase
set to some area where booting does work (for test purposes only):

root@mt7688:~# cat /sys/kernel/debug/memblock/
memory    reserved
root@mt7688:~# cat /sys/kernel/debug/memblock/memory
    0: 0x00000000..0x07ffffff
root@mt7688:~# cat /sys/kernel/debug/memblock/reserved
    0: 0x02220000..0x02220fff

memblock really only seems to be suitable for early memory handling.
Reserving memory for the complete OS lifetime does not work (AFAICT).
Perhaps moving to CMA would help here.

So back to your question: It's not kernel memory that is overwritten
at e.g. 0x06f5f000 (128MiB memory) but it's some userspace memory allocated
dynamically when starting into the mount process of the rootfs. This
memory region is *not* reserved at that stage any more. memblock does not
seem to be the correct way to reserve areas here.
  
>>> An alternative would be for Linux to allocate a page for use with the
>>> exception vectors using memblock, and ignore the EBase value U-Boot left
>>> us with. But just marking the area U-Boot used as reserved ought to do
>>> the trick, and has the advantage of ensuring U-Boot's vectors don't get
>>> overwritten before Linux sets up its own which sometimes allows U-Boot
>>> to provide some useful output.
>>
>> I agree that re-using the U-Boot value would be optimal for boot-time
>> error printing. But this does not seem to work on our platform AFAICT.
>> So how to proceed? Should I enable CONFIG_CPU_MIPSR2_IRQ_VI or #define
>> "cpu_has_veic" to 1 as Lantiq does?
> 
> I think the answer to the question above will be helpful - if it's the
> kernel binary itself getting overwritten then we have 2 options:
> 
>    1) Move the kernel, i.e. change load-y in arch/mips/ralink/Platform.
> 
>    2) Have Linux recognize that the address in EBase is unsuitable &
>       allocate a new page.
> 
> Or perhaps even both - having Linux recognize & avoid the problem seems
> good for robustness, but if the kernel binary is overwriting the
> exception vectors it might be useful to move the kernel anyway so that
> we don't prevent U-Boot's vectors from working in between loading the
> kernel & booting it.
> 
> If it's not the kernel binary overwriting the vectors & then being
> overwritten, then I'd be interested in knowing what is in that memory.
> We shouldn't have allocated much of anything this early, but a possible
> fix might be to reserve the page EBase resides in from bootmem_init().

That does not help (see comments about memblock usage above). I could
add a check whether EBase resides in system memory and, if that is the
case, allocate a page and move EBase to this new location.
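
A rough sketch of such a check, assuming a kseg0 EBase value and a single
RAM range given as inclusive physical addresses (as in the memblock "memory"
output above). The function name and parameters are hypothetical, and the
allocation and EBase update themselves are not shown:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if the physical address behind a kseg0/kseg1 EBase value falls
 * inside the RAM range [ram_start, ram_end] (inclusive, physical).
 */
bool ebase_in_ram(uint32_t ebase_vaddr, uint32_t ram_start, uint32_t ram_end)
{
	uint32_t paddr = ebase_vaddr & 0x1fffffffu;

	return paddr >= ram_start && paddr <= ram_end;
}
```

With RAM at 0x00000000..0x07ffffff, the U-Boot value 0x87e9b000 is reported
as being in RAM, while an address in the boot ROM window such as 0x9fc00000
is not. Note that the default 0x80000000 also maps to RAM (physical 0), so
this test alone would not tell the two cases apart.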

What do you think? Did I misinterpret this memblock usage on MIPS? Do
you have other ideas on how to solve this issue?

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [U-Boot] MIPS (mt7688): EBase change in U-Boot breaks Linux
@ 2018-12-17  8:55                   ` Stefan Roese
  0 siblings, 0 replies; 22+ messages in thread
From: Stefan Roese @ 2018-12-17  8:55 UTC (permalink / raw)
  To: u-boot

Hi Paul.

On 14.12.18 22:28, Paul Burton wrote:
> On Fri, Dec 14, 2018 at 07:56:59AM +0100, Stefan Roese wrote:
>>> Does this Linux patch help by any chance?
>>>
>>> https://git.linux-mips.org/cgit/linux-mti.git/commit/?h=eng-v4.20&id=39e4d339a4540b66e9d9a8ea0da9ee41a21473b4
>>>
>>> I'm not sure I remember why I didn't get that upstreamed yet, I probably
>>> wanted to research what other systems were doing... Speaking for Malta,
>>> the kernel's board support has reserved the start of kseg0 for longer
>>> than I've been involved.
>>
>> No, this patch does not solve this issue (bootup still hangs or crashes
>> while mounting the rootfs). I can only assume that its too late to try
>> to reserve this memory region as the memblock_reserve() call returns 0
>> (no error).
> 
> Hmm, OK. Do you know what is getting overwritten? Is it part of the
> kernel binary itself?

Okay, I did a bit more research and debugging here. MIPS sets
CONFIG_ARCH_DISCARD_MEMBLOCK in general, which results in free'ing
the reserved memory region(s) via memblock_discard() at a later boot
stage.

I also changed arch/mips/Kconfig so that ARCH_DISCARD_MEMBLOCK is
not defined, but this did not solve the issue either. I'm not sure
why ARCH_DISCARD_MEMBLOCK is defined for MIPS. Here a log from
the system running with ARCH_DISCARD_MEMBLOCK disabled and EBase
set to some area where booting does work (for test purpose only):

root at mt7688:~# cat /sys/kernel/debug/memblock/
memory    reserved
root at mt7688:~# cat /sys/kernel/debug/memblock/memory
    0: 0x00000000..0x07ffffff
root at mt7688:~# cat /sys/kernel/debug/memblock/reserved
    0: 0x02220000..0x02220fff

memblock really only seems to be suitable for early memory handling.
Reserving memory for the complete OS lifetime does not work (AFAICT).
Perhaps moving to CMA would help here.

So back to your question: It's not kernel memory that is overwritten
at e.g. 0x06f5f000 (128MiB memory) but its some userspace memory allocated
dynamically when starting into the mount process of the rootfs. This
memory region is *not* revered at that stage any more. memblock does not
seem to be the correct way to reserve areas here.
  
>>> An alternative would be for Linux to allocate a page for use with the
>>> exception vectors using memblock, and ignore the EBase value U-Boot left
>>> us with. But just marking the area U-Boot used as reserved ought to do
>>> the trick, and has the advantage of ensuring U-Boot's vectors don't get
>>> overwritten before Linux sets up its own which sometimes allows U-Boot
>>> to provide some useful output.
>>
>> I agree that re-using the U-Boot value would be optimal for boot-time
>> error printing. But this does not seem to work on our platform AFAICT.
>> So how to proceed? Should I enable CONFIG_CPU_MIPSR2_IRQ_VI or #define
>> "cpu_has_veic" to 1 as Lantiq does?
> 
> I think the answer to the question above will be helpful - if it's the
> kernel binary itself getting overwritten then we have 2 options:
> 
>    1) Move the kernel, ie. change load-y in arch/mips/ralink/Platform.
> 
>    2) Have Linux recognize that the address in EBase is unsuitable &
>       allocate a new page.
> 
> Or perhaps even both - having Linux recognize & avoid the problem seems
> good for robustness, but if the kernel binary is overwriting the
> exception vectors it might be useful to move the kernel anyway so that
> we don't prevent U-Boot's vectors from working in between loading the
> kernel & booting it.
> 
> If it's not the kernel binary overwriting the vectors & then being
> overwritten, then I'd be interested in knowing what is in that memory.
> We shouldn't have allocated much of anything this early, but a possible
> fix might be to reserve the page EBase resides in from bootmem_init().

That does not help (see comments about memblock usage above). I could
add a check whether EBase resides in system memory and, if that is the
case, allocate a page and move EBase to this new location.

What do you think? Did I misinterpret this memblock usage on MIPS? Do
you have other ideas on how to solve this issue?

Thanks,
Stefan


* Re: MIPS (mt7688): EBase change in U-Boot breaks Linux
  2018-12-14 21:31                 ` [U-Boot] " Paul Burton
@ 2018-12-17  9:13                   ` Stefan Roese
  -1 siblings, 0 replies; 22+ messages in thread
From: Stefan Roese @ 2018-12-17  9:13 UTC (permalink / raw)
  To: Paul Burton
  Cc: Daniel Schwierzeck, U-Boot Mailing List, Horatiu Vultur,
	Linux-MIPS, linux-mips

Hi Paul,

On 14.12.18 22:31, Paul Burton wrote:
> On Fri, Dec 14, 2018 at 07:56:59AM +0100, Stefan Roese wrote:
>> So how to proceed? Should I enable CONFIG_CPU_MIPSR2_IRQ_VI or #define
>> "cpu_has_veic" to 1 as Lantiq does?
> 
> ...and on that point in particular, it really depends on your hardware.
> 
> You shouldn't need to do either of those things just to avoid this bug,
> but if your hardware actually supports VI or EIC then it may be
> beneficial to enable them.

Checking again, the MT7688 supports VI. config3=00002420, so VInt (Bit 5)
is set. But without CONFIG_CPU_MIPSR2_IRQ_VI being set, cpu_has_vint will
stay set to zero. So it seems that I need to set CONFIG_CPU_MIPSR2_IRQ_VI
at least for this SoC (CONFIG_SOC_MT7620), if not for all Ralink-based
SoCs.

If nobody objects, I'll submit a patch enabling CONFIG_CPU_MIPSR2_IRQ_VI
for CONFIG_SOC_MT7620.

Thanks,
Stefan



end of thread, other threads:[~2018-12-17  9:13 UTC | newest]

Thread overview: 22+ messages
2018-12-12  8:18 [U-Boot] MIPS (mt7688): EBase change in U-Boot breaks Linux Stefan Roese
2018-12-12 11:21 ` Horatiu Vultur
2018-12-12 11:41   ` Stefan Roese
2018-12-13  9:42     ` Horatiu Vultur
2018-12-13  1:00 ` Daniel Schwierzeck
2018-12-13 10:09   ` Stefan Roese
2018-12-13 13:27     ` Daniel Schwierzeck
2018-12-13 13:35       ` Stefan Roese
2018-12-13 14:23         ` Daniel Schwierzeck
2018-12-13 19:47           ` Paul Burton
2018-12-13 19:47             ` [U-Boot] " Paul Burton
2018-12-14  6:56             ` Stefan Roese
2018-12-14  6:56               ` [U-Boot] " Stefan Roese
2018-12-14 21:28               ` Paul Burton
2018-12-14 21:28                 ` [U-Boot] " Paul Burton
2018-12-17  8:55                 ` Stefan Roese
2018-12-17  8:55                   ` [U-Boot] " Stefan Roese
2018-12-14 21:31               ` Paul Burton
2018-12-14 21:31                 ` [U-Boot] " Paul Burton
2018-12-17  9:13                 ` Stefan Roese
2018-12-17  9:13                   ` [U-Boot] " Stefan Roese
2018-12-15  4:40           ` Maciej W. Rozycki
