* arm64: kdump broken on a large CPU system
@ 2018-12-10 22:30 Qian Cai
  2018-12-11 10:09 ` Marc Zyngier
  0 siblings, 1 reply; 41+ messages in thread
From: Qian Cai @ 2018-12-10 22:30 UTC
  To: Marc Zyngier, Ard Biesheuvel, Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel

On this HPE Apollo 70 arm64 server with 256 CPUs, triggering a crash dump just
hangs (on 4.20-rc6 as well as 4.18). I confirmed that execution gets as far as
entering __cpu_soft_restart():

__crash_kexec
  machine_kexec
    cpu_soft_restart
      restart
        __cpu_soft_restart

earlycon was enabled but produced no output from the 2nd kernel, so it is most
likely stuck in the assembly code in arch/arm64/kernel/head.S or in the early
part of start_kernel(), before earlycon is initialized.

It turns out this is related to nr_cpus in the 1st kernel, even though the 2nd
kernel always boots with nr_cpus=1 [1]. I tested with both crashkernel=512M and
crashkernel=768M.

nr_cpus <= 96  GOOD (2nd kernel was up in 2-3 mins.)
nr_cpus=256    BAD  (2nd kernel was NOT up after 1 hour.)
nr_cpus=127    BAD  (2nd kernel was NOT up after 10 mins.)

I also tested with and without CONFIG_ARM64_VHE (i.e., el2_switch); it made no
difference.

[1] KDUMP_COMMANDLINE_APPEND="irqpoll nr_cpus=1 swiotlb=noforce reset_devices"

I am still looking for a way to debug that assembly code and find where it
actually hangs. The server is hooked up to a conserver that cannot generate
any sysrq, and I have no shell access to the conserver, so using kgdb or kdb
looks difficult in this case.

CPU information,

# lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              256
On-line CPU(s) list: 0-255
Thread(s) per core:  4
Core(s) per socket:  32
Socket(s):           2
NUMA node(s):        2
Vendor ID:           Cavium
Model:               1
Model name:          ThunderX2 99xx
Stepping:            0x1
BogoMIPS:            400.00
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            32768K
NUMA node0 CPU(s):   0-127
NUMA node1 CPU(s):   128-255
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics cpuid asimdrdm

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* Re: arm64: kdump broken on a large CPU system
  2018-12-10 22:30 arm64: kdump broken on a large CPU system Qian Cai
@ 2018-12-11 10:09 ` Marc Zyngier
  2018-12-11 11:34   ` James Morse
  0 siblings, 1 reply; 41+ messages in thread
From: Marc Zyngier @ 2018-12-11 10:09 UTC
  To: Qian Cai, Ard Biesheuvel, Catalin Marinas, Will Deacon
  Cc: AKASHI, Takahiro, James Morse, linux-arm-kernel

[+ James and Takahiro]

Hi Qian,

On 10/12/2018 22:30, Qian Cai wrote:
> On this HPE Apollo 70 arm64 server with 256 CPUs, triggering a crash dump just
> hung (4.20-rc6 as well as 4.18). It was confirmed that the executing went as far
> as entering __cpu_soft_restart(),

You can forget about 4.18 altogether, it will never correctly kexec.
I've used 4.20 + kexec on a TX2 system though, and although it takes
absolutely ages, it reliably works.

> 
> __crash_kexec
>   machine_kexec
>     cpu_soft_restart
>       restart
>         __cpu_soft_restart
> 
> The earlycon was enabled but had no output from the 2nd kernel, so it was pretty
> much stuck in all those assembly code in arm64/kernel/head.S or the early part
> of start_kernel() before earlycon was initialized.

Could it instead be in the purgatory code provided by userspace?

> 
> It turned out this has something to do with nr_cpus in the 1st kernel, although
> the 2nd kernel always has nr_cpus=1 [1]. It was tested with both
> crashkernel=512M or 768M.

James was saying something about a timeout, which may or may not be long
enough.

> 
> nr_cpus <= 96  GOOD (2nd kernel was up in 2-3 mins.)
> nr_cpus=256    BAD  (2nd kernel was NOT up after 1 hour.)
> nr_cpus=127    BAD  (2nd kernel was NOT up after 10 mins.)
> 
> I did also test with and without CONFIG_ARM64_VHE (i.e., el2_switch) made no
> difference.
> 
> [1] KDUMP_COMMANDLINE_APPEND="irqpoll nr_cpus=1 swiotlb=noforce reset_devices"
> 
> I am still figuring out a way to debug those assembly code to where it actually
> hung, but the server was hooked up with a conserver that was not able to
> generate any sysrq and I have no shell access to the conserver, so seems a bit
> difficult to use kgdb or kdb in this case.
> 
> CPU information,
> 
> # lscpu
> Architecture:        aarch64
> Byte Order:          Little Endian
> CPU(s):              256
> On-line CPU(s) list: 0-255
> Thread(s) per core:  4
> Core(s) per socket:  32
> Socket(s):           2
> NUMA node(s):        2
> Vendor ID:           Cavium
> Model:               1
> Model name:          ThunderX2 99xx
> Stepping:            0x1
> BogoMIPS:            400.00
> L1d cache:           32K
> L1i cache:           32K
> L2 cache:            256K
> L3 cache:            32768K
> NUMA node0 CPU(s):   0-127
> NUMA node1 CPU(s):   128-255
> Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics cpuid
> asimdrdm
> 

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* Re: arm64: kdump broken on a large CPU system
  2018-12-11 10:09 ` Marc Zyngier
@ 2018-12-11 11:34   ` James Morse
  2018-12-12  2:51     ` AKASHI, Takahiro
  0 siblings, 1 reply; 41+ messages in thread
From: James Morse @ 2018-12-11 11:34 UTC
  To: Marc Zyngier, Qian Cai
  Cc: Catalin Marinas, AKASHI, Takahiro, Will Deacon, linux-arm-kernel,
	Ard Biesheuvel

Hi Qian, Marc,

On 11/12/2018 10:09, Marc Zyngier wrote:
> On 10/12/2018 22:30, Qian Cai wrote:
>> On this HPE Apollo 70 arm64 server with 256 CPUs, triggering a crash dump just
>> hung (4.20-rc6 as well as 4.18). It was confirmed that the executing went as far
>> as entering __cpu_soft_restart(),
> 
> You can forget about 4.18 altogether, it will never correctly kexec.
> I've used 4.20 + kexec on a TX2 system though, and although it takes
> absolutely ages, it reliably works.

>> __crash_kexec
>>   machine_kexec
>>     cpu_soft_restart
>>       restart
>>         __cpu_soft_restart
>>
>> The earlycon was enabled but had no output from the 2nd kernel, so it was pretty
>> much stuck in all those assembly code in arm64/kernel/head.S or the early part
>> of start_kernel() before earlycon was initialized.
> 
> Could it instead be in the purgatory code provided by userspace?

Yes, this could be anything between entering __cpu_soft_restart(), purgatory and
the earlycon driver in the new kernel.


>> It turned out this has something to do with nr_cpus in the 1st kernel, although
>> the 2nd kernel always has nr_cpus=1 [1]. It was tested with both
>> crashkernel=512M or 768M.
> 
> James was saying something about a timeout, which may or may not be long
> enough.

This comes from crash_smp_send_stop() in arch/arm64/kernel/smp.c.
It sends IPIs to all other CPUs, then waits up to one second before timing out.
This may not be enough time for a system with hundreds of CPUs.

Increasing the timeout may help, but I don't understand why extra CPUs would
matter if we're getting as far as __cpu_soft_restart().
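
For readers following along, the bounded wait described above can be modelled
in userspace roughly as follows. This is only a sketch, not the kernel source:
wait_for_crash_ipi(), ack_one_cpu() and simulate_stop() are hypothetical names
invented for illustration, and the delay callback stands in for the kernel's
one-microsecond busy-wait.

```c
#include <stdatomic.h>

#define USEC_PER_SEC 1000000UL

/* Hypothetical stand-in for a 1 microsecond delay between polls. */
typedef void (*delay_fn)(void *ctx);

/*
 * Poll until every other CPU has acknowledged the stop IPI or the
 * microsecond budget runs out. Returns the number of CPUs that never
 * acknowledged (0 means all of them stopped in time).
 */
static int wait_for_crash_ipi(atomic_int *waiting, unsigned long timeout_us,
                              delay_fn delay, void *ctx)
{
    while (atomic_load(waiting) > 0 && timeout_us-- > 0)
        delay(ctx);
    return atomic_load(waiting);
}

/* Simulation only: one more CPU acknowledges per simulated microsecond. */
static void ack_one_cpu(void *ctx)
{
    atomic_int *waiting = ctx;

    if (atomic_load(waiting) > 0)
        atomic_fetch_sub(waiting, 1);
}

/* Run the model for a given number of other CPUs and a time budget. */
static int simulate_stop(int other_cpus, unsigned long timeout_us)
{
    atomic_int waiting;

    atomic_init(&waiting, other_cpus);
    return wait_for_crash_ipi(&waiting, timeout_us, ack_one_cpu, &waiting);
}
```

In this model, simulate_stop(255, USEC_PER_SEC) leaves no stragglers, while a
budget shorter than the time the CPUs need leaves some behind. The budget only
controls how long the loop polls; it has no effect on what happens after the
loop returns.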


>> nr_cpus <= 96  GOOD (2nd kernel was up in 2-3 mins.)
>> nr_cpus=256    BAD  (2nd kernel was NOT up after 1 hour.)
>> nr_cpus=127    BAD  (2nd kernel was NOT up after 10 mins.)
>>
>> I did also test with and without CONFIG_ARM64_VHE (i.e., el2_switch) made no
>> difference.

>> [1] KDUMP_COMMANDLINE_APPEND="irqpoll nr_cpus=1 swiotlb=noforce reset_devices"

>> I am still figuring out a way to debug those assembly code to where it actually
>> hung, 

There were some earlier patches to purgatory to let it write to the console, but
this didn't scale as purgatory isn't an operating system. (Reducing purgatory to
be as simple as possible is better; with kexec_file_load() we don't use it at all.)

If kexec-tools still has an 'ARM64_DEBUG_PORT' you may be able to get it to write
to your UART. (No idea which UARTs it supports, or how it tells pl011 and 8250
apart.)

Some threads to pull on:
https://patchwork.kernel.org/patch/6121951/
https://patchwork.kernel.org/patch/9238475/
(search for 'TX as the first port?' in the last one)


>> but the server was hooked up with a conserver that was not able to
>> generate any sysrq and I have no shell access to the conserver, so seems a bit
>> difficult to use kgdb or kdb in this case.

More recent kexec-tools has a 'lite' or 'no-checks' option that tells it not to
bother checksumming the kdump kernel. The checksum is what takes a long time, as
it's done without the MMU and caches enabled.
It shouldn't be possible for the old kernel to corrupt it, as it's not mapped
unless it's being loaded (or saved/restored by hibernate). I'm not sure how the
crash-regs get written to the elfcore header though...


Thanks,

James


* Re: arm64: kdump broken on a large CPU system
  2018-12-11 11:34   ` James Morse
@ 2018-12-12  2:51     ` AKASHI, Takahiro
  2018-12-12  4:39         ` Qian Cai
  0 siblings, 1 reply; 41+ messages in thread
From: AKASHI, Takahiro @ 2018-12-12  2:51 UTC
  To: James Morse
  Cc: Ard Biesheuvel, Marc Zyngier, Catalin Marinas, Will Deacon,
	Qian Cai, linux-arm-kernel

On Tue, Dec 11, 2018 at 11:34:22AM +0000, James Morse wrote:
> Hi Qian, Marc,
> 
> On 11/12/2018 10:09, Marc Zyngier wrote:
> > On 10/12/2018 22:30, Qian Cai wrote:
> >> On this HPE Apollo 70 arm64 server with 256 CPUs, triggering a crash dump just
> >> hung (4.20-rc6 as well as 4.18). It was confirmed that the executing went as far
> >> as entering __cpu_soft_restart(),
> > 
> > You can forget about 4.18 altogether, it will never correctly kexec.
> > I've used 4.20 + kexec on a TX2 system though, and although it takes
> > absolutely ages, it reliably works.
> 
> >> __crash_kexec
> >>   machine_kexec
> >>     cpu_soft_restart
> >>       restart
> >>         __cpu_soft_restart

@Qian, how did you confirm that you reached here?

> >>
> >> The earlycon was enabled but had no output from the 2nd kernel, so it was pretty
> >> much stuck in all those assembly code in arm64/kernel/head.S or the early part
> >> of start_kernel() before earlycon was initialized.
> > 
> > Could it instead be in the purgatory code provided by userspace?
> 
> Yes, this could be anything between entering __cpu_soft_restart(), purgatory and
> the earlycon driver in the new kernel.

To be in purgatory, or not to be, that is the question.
(I'm serious.)

> 
> >> It turned out this has something to do with nr_cpus in the 1st kernel, although
> >> the 2nd kernel always has nr_cpus=1 [1]. It was tested with both
> >> crashkernel=512M or 768M.
> > 
> > James was saying something about a timeout, which may or may not be long
> > enough.
> 
> This comes from arch/arm64/kernel/smp.c:crash_smp_send_stop()
> It sends IPIs to all other CPUs, then waits one second before timing-out.
> This may not be enough time for a system with hundreds of CPUs.
> 
> Increasing the timeout may help, but I don't understand why extra CPUs would
> matter if we're getting as far as __cpu_soft_restart().

Indeed.

> 
> >> nr_cpus <= 96  GOOD (2nd kernel was up in 2-3 mins.)
> >> nr_cpus=256    BAD  (2nd kernel was NOT up after 1 hour.)
> >> nr_cpus=127    BAD  (2nd kernel was NOT up after 10 mins.)
> >>
> >> I did also test with and without CONFIG_ARM64_VHE (i.e., el2_switch) made no
> >> difference.
> 
> >> [1] KDUMP_COMMANDLINE_APPEND="irqpoll nr_cpus=1 swiotlb=noforce reset_devices"
> 
> >> I am still figuring out a way to debug those assembly code to where it actually
> >> hung, 

In my experience, the patch I mention below, together with a hw debugger
(DS-5 with FVP in my case), has been useful for examining purgatory-related
issues.


> There were some earlier patches to purgatory to let it write the console, but
> this didn't scale as purgatory isn't an operating-system. (Reducing purgatory to
> be as simple as possible is better, with kexec_file_load() we don't use it all.)
> 
> If kexec-tools still has a 'ARM64_DEBUG_PORT' you may be able to get it to write
> to your uart. (no idea which uarts it supports, or how it tells pl011 and 8250
> apart).

@James, are you sure? I don't see it.
@Qian, I can give you a small, quite hacky patch that enables printf in
purgatory, if you want.
(As ThunderX2 has a pl011, the patch should work.)

> Some threads to pull on:
> https://patchwork.kernel.org/patch/6121951/
> https://patchwork.kernel.org/patch/9238475/
> (search for 'TX as the first port?' in the last one)
> 
> 
> >> but the server was hooked up with a conserver that was not able to
> >> generate any sysrq and I have no shell access to the conserver, so seems a bit
> >> difficult to use kgdb or kdb in this case.
> 
> More recent kexec tools has a 'lite' or 'no-checks' option that tells it not to
> bother checksumming the kdump kernel.

Are you sure? I remember that Geoff's original patch was rejected.

> This is what takes a long time as its done
> without the MMU+caches enabled.

Pratyush has a patch that enables the MMU in purgatory, but again
it was rejected.

Thanks,
-Takahiro Akashi

> It shouldn't be possible for the old-kernel to corrupt it, as its not mapped
> unless its being loaded (or save/restored by hibernate). I'm not sure how the
> crash-regs get written to the elfcore header though...
> 
> 
> Thanks,
> 
> James


* Re: arm64: kdump broken on a large CPU system
  2018-12-12  2:51     ` AKASHI, Takahiro
@ 2018-12-12  4:39         ` Qian Cai
  0 siblings, 0 replies; 41+ messages in thread
From: Qian Cai @ 2018-12-12  4:39 UTC
  To: AKASHI, Takahiro, James Morse, Marc Zyngier
  Cc: Catalin Marinas, Will Deacon, kexec, linux-arm-kernel, Ard Biesheuvel

[+ kexec@lists.infradead.org]

The debugging progress so far...

Waiting up to 5 minutes for other CPUs to stop in crash_smp_send_stop() made no
difference.

With the "dev" branch of this tree [1], it is possible to print messages from
purgatory by passing something like "--port=0x602B0000
--port-lsr=0x602B0000,0x80" to kexec. However, even enable_dcache() in
setup_arch() hangs, seemingly forever, on this machine (it works fine on another
arm64 server, a Cortex-A72). After removing only the enable_dcache() /
disable_dcache() calls from setup_arch() etc., without removing the printf()
lines, it printed:

I'm in purgatory
purgatory: entry=0000000090080000
purgatory: dtb=0000000092d50000
purgatory: D-cache Enabled before SHA verification
purgatory: D-cache Disabled after SHA verification

So this confirms that it must hang somewhere in arch/arm64/kernel/head.S (.stext)
or in the early part of start_kernel(), before earlycon is initialized.

Also confirmed that passing nr_cpus=64 to the first kernel again makes
everything work fine with this new kexec.

Since enable_dcache() hangs as well, I suspect this has something to do with
enabling the MMU (i.e., .stext -> __primary_switch -> __enable_mmu) coupled
with some sort of per-CPU data where the number of CPUs matters.

Right now, I think I need to find a way to print directly to the pl011 serial
console while debugging that assembly code, something like CONFIG_DEBUG_LL but
for arm64, so that I can locate exactly where it hangs. Otherwise, I am shooting
in the dark.

[1] https://github.com/pratyushanand/kexec-tools



* Re: arm64: kdump broken on a large CPU system
  2018-12-12  4:39         ` Qian Cai
@ 2018-12-12 22:37           ` Qian Cai
  -1 siblings, 0 replies; 41+ messages in thread
From: Qian Cai @ 2018-12-12 22:37 UTC
  To: AKASHI, Takahiro, James Morse, Marc Zyngier
  Cc: Catalin Marinas, Will Deacon, kexec, linux-arm-kernel, Ard Biesheuvel

On Tue, 2018-12-11 at 23:39 -0500, Qian Cai wrote:
> [+ kexec@lists.infradead.org]
> 
> The debugging progress so far...
> 
> Wait up to 5 minutes for other CPUs to stop in crash_smp_send_stop() made no
> difference.
> 
> With "dev" branch of this tree [1], it is possible to print out messages from
> purgatory when passing something like "--port=0x602B0000
> --port-lsr=0x602B0000,0x80" to kexec. However, even enable_dcache() in
> setup_arch() will hung like forever on this machine (working fine on another
> arm64 server - Cortex-A72). After removed only enable_dcache() /
> disable_dcache() from setup_arch() etc without removing printf() lines, it did
> print out,
> 
> I'm in purgatory
> purgatory: entry=0000000090080000
> purgatory: dtb=0000000092d50000
> purgatory: D-cache Enabled before SHA verification
> purgatory: D-cache Disabled after SHA verification
> 
> So, it confirmed that it must hung somewhere in arm64/kernel/head.S (.stext)
> or
> the early part of start_kernel() before earlycon was initialized.
> 
> Also confirmed that passing nr_cpus=64 in the first kernel would again make
> everything work fine with this new kexec.
> 
> Since enable_dcache() would hung as well, I suspect this has something to do
> with enabling MMU (i.e, .stext -> __primary_switch -> __enable_mmu) coupling
> with some sort of per-CPU data where the number of CPUs matters.

Still debugging the hang when enabling the MMU (enable_dcache) in purgatory [1],
which may provide some clues about the later hang in the 2nd kernel.

dsb	nshst
tlbi	alle2
dsb	nsh
isb
bl	get_ips_bits
lsl	x1, x0, #TCR_IPS_EL2_SHIFT
orr	x1, x1, x7
mov	x0, x6
ldr	x2, =MEMORY_ATTRIBUTES
msr	mair_el2, x2
msr	tcr_el2, x1
msr	ttbr0_el2, x0
isb
mrs	x0, sctlr_el2
ldr	x3, =SCTLR_ELx_FLAGS
orr	x0, x0, x3
msr	sctlr_el2, x0     <--- hangs right on this instruction.
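
The last three instructions of the listing above are a read-modify-write of the
system control register: read it, OR in a set of enable flags, write it back.
The C sketch below models only that bit arithmetic. The bit positions of M, C
and I are architectural, but the definition of SCTLR_ELx_FLAGS in this purgatory
tree is not shown in the thread, so SKETCH_SCTLR_FLAGS and both function names
are assumptions made for illustration.

```c
#include <stdint.h>

/* Architectural SCTLR_ELx bit positions. */
#define SCTLR_ELx_M (UINT64_C(1) << 0)   /* stage 1 MMU enable */
#define SCTLR_ELx_C (UINT64_C(1) << 2)   /* data/unified cache enable */
#define SCTLR_ELx_I (UINT64_C(1) << 12)  /* instruction cache enable */

/*
 * ASSUMPTION: stand-in for the purgatory's SCTLR_ELx_FLAGS, whose real
 * definition is not quoted in this thread.
 */
#define SKETCH_SCTLR_FLAGS (SCTLR_ELx_M | SCTLR_ELx_C | SCTLR_ELx_I)

/* Models "mrs x0, sctlr; orr x0, x0, x3": only sets bits, clears none. */
static uint64_t sctlr_with_flags(uint64_t current)
{
    return current | SKETCH_SCTLR_FLAGS;
}

/* True if writing this value back would turn the stage 1 MMU on. */
static int sctlr_mmu_enabled(uint64_t sctlr)
{
    return (sctlr & SCTLR_ELx_M) != 0;
}
```

The point of the sketch is that the write to SCTLR is the first moment the
translation tables, MAIR and TCR values programmed just before it take effect.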

Without CONFIG_ARM64_VHE (i.e., running in EL1), it is able to run
enable_dcache(), but it still hangs later somewhere in the 2nd kernel.

dsb	nshst
tlbi	vmalle1
dsb	nsh
isb
bl	get_ips_bits
lsl	x1, x0, #TCR_IPS_EL1_SHIFT
orr	x1, x1, x7
mov	x0, x6
ldr	x2, =MEMORY_ATTRIBUTES
msr	mair_el1, x2
msr	tcr_el1, x1
msr	ttbr0_el1, x0
isb
mrs	x0, sctlr_el1
ldr	x3, =SCTLR_ELx_FLAGS
orr	x0, x0, x3
msr	sctlr_el1, x0
isb

One data point about this system: it has 4 threads per core. Every 2 cores
share the same L1 and L2 caches, so 8 CPUs share each set. All CPUs share the
same L3 cache.
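
The CPU counts in this paragraph follow from the lscpu output earlier in the
thread. A small C sketch of the arithmetic is below; the struct and function
names are hypothetical, and the "2 cores per L1/L2" figure is the poster's
description rather than something lscpu reports.

```c
/* Topology figures as reported by lscpu in this thread. */
struct cpu_topology {
    unsigned int sockets;
    unsigned int cores_per_socket;
    unsigned int threads_per_core;
};

/* Total logical CPUs visible to the kernel. */
static unsigned int logical_cpus(const struct cpu_topology *t)
{
    return t->sockets * t->cores_per_socket * t->threads_per_core;
}

/* Logical CPUs behind one cache that is shared by `cores_sharing` cores. */
static unsigned int cpus_per_shared_cache(const struct cpu_topology *t,
                                          unsigned int cores_sharing)
{
    return cores_sharing * t->threads_per_core;
}

/* ThunderX2 99xx as described above: 2 sockets, 32 cores each, 4 threads. */
static const struct cpu_topology tx2 = { 2, 32, 4 };
```

With these values, logical_cpus(&tx2) gives the 256 CPUs reported by lscpu and
cpus_per_shared_cache(&tx2, 2) gives the 8 CPUs mentioned above.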

Hence, I wonder if this is caused by incomplete cache/TLB invalidation, leaving
stale entries (or uninitialised junk that just happens to look valid) present
when the MMU is turned on.

[1] https://github.com/pratyushanand/kexec-tools/blob/devel/purgatory/arch/arm64/cache.S

> 
> Right now, I think I need to find a way to print directly to pl011 serial
> console while debugging those assembly code like CONFIG_DEBUG_LL for arm64, so
> it can be used to locate where exactly it hung. Otherwise, I am shooting in
> the
> dark.
> 
> [1] https://github.com/pratyushanand/kexec-tools
> 
> === original email ===
> 
> On this HPE Apollo 70 arm64 server with 256 CPUs, triggering a crash dump just
> hung (4.20-rc6 as well as 4.18). It was confirmed that the executing went as
> far
> as entering __cpu_soft_restart(),
> 
> __crash_kexec
>   machine_kexec
>     cpu_soft_restart
>       restart
>         __cpu_soft_restart
> 
> The earlycon was enabled but had no output from the 2nd kernel, so it was
> pretty
> much stuck in all those assembly code in arm64/kernel/head.S or the early part
> of start_kernel() before earlycon was initialized.
> 
> It turned out this has something to do with nr_cpus in the 1st kernel,
> although
> the 2nd kernel always has nr_cpus=1 [1]. It was tested with both
> crashkernel=512M or 768M.
> 
> nr_cpus <= 96  GOOD (2nd kernel was up in 2-3 mins.)
> nr_cpus=256    BAD  (2nd kernel was NOT up after 1 hour.)
> nr_cpus=127    BAD  (2nd kernel was NOT up after 10 mins.)
> 
> I did also test with and without CONFIG_ARM64_VHE (i.e., el2_switch) made no
> difference.
> 
> [1] KDUMP_COMMANDLINE_APPEND="irqpoll nr_cpus=1 swiotlb=noforce reset_devices"
> 
> I am still figuring out a way to debug those assembly code to where it
> actually
> hung, but the server was hooked up with a conserver that was not able to
> generate any sysrq and I have no shell access to the conserver, so seems a bit
> difficult to use kgdb or kdb in this case.
> 
> CPU information,
> 
> # lscpu
> Architecture:        aarch64
> Byte Order:          Little Endian
> CPU(s):              256
> On-line CPU(s) list: 0-255
> Thread(s) per core:  4
> Core(s) per socket:  32
> Socket(s):           2
> NUMA node(s):        2
> Vendor ID:           Cavium
> Model:               1
> Model name:          ThunderX2 99xx
> Stepping:            0x1
> BogoMIPS:            400.00
> L1d cache:           32K
> L1i cache:           32K
> L2 cache:            256K
> L3 cache:            32768K
> NUMA node0 CPU(s):   0-127
> NUMA node1 CPU(s):   128-255
> Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics cpuid
> asimdrdm


* Re: arm64: kdump broken on a large CPU system
@ 2018-12-12 22:37           ` Qian Cai
  0 siblings, 0 replies; 41+ messages in thread
From: Qian Cai @ 2018-12-12 22:37 UTC (permalink / raw)
  To: AKASHI, Takahiro, James Morse, Marc Zyngier
  Cc: Catalin Marinas, Will Deacon, kexec, linux-arm-kernel, Ard Biesheuvel

On Tue, 2018-12-11 at 23:39 -0500, Qian Cai wrote:
> [+ kexec@lists.infradead.org]
> 
> The debugging progress so far...
> 
> Wait up to 5 minutes for other CPUs to stop in crash_smp_send_stop() made no
> difference.
> 
> With "dev" branch of this tree [1], it is possible to print out messages from
> purgatory when passing something like "--port=0x602B0000
> --port-lsr=0x602B0000,0x80" to kexec. However, even enable_dcache() in
> setup_arch() will hung like forever on this machine (working fine on another
> arm64 server - Cortex-A72). After removed only enable_dcache() /
> disable_dcache() from setup_arch() etc without removing printf() lines, it did
> print out,
> 
> I'm in purgatory
> purgatory: entry=0000000090080000
> purgatory: dtb=0000000092d50000
> purgatory: D-cache Enabled before SHA verification
> purgatory: D-cache Disabled after SHA verification
> 
> So, it confirmed that it must hung somewhere in arm64/kernel/head.S (.stext)
> or
> the early part of start_kernel() before earlycon was initialized.
> 
> Also confirmed that passing nr_cpus=64 in the first kernel would again make
> everything work fine with this new kexec.
> 
> Since enable_dcache() would hung as well, I suspect this has something to do
> with enabling MMU (i.e, .stext -> __primary_switch -> __enable_mmu) coupling
> with some sort of per-CPU data where the number of CPUs matters.

Still debugging a hung to enable MMU (enable_dcache) in purgatory [1] which may
provide some clues for the hung later in the 2nd kernel.

dsb	nshst
tlbi	alle2
dsb	nsh
isb
bl	get_ips_bits
lsl	x1, x0, #TCR_IPS_EL2_SHIFT
orr	x1, x1, x7
mov	x0, x6
ldr	x2, =MEMORY_ATTRIBUTES
msr	mair_el2, x2
msr	tcr_el2, x1
msr	ttbr0_el2, x0
isb
mrs	x0, sctlr_el2
ldr	x3, =SCTLR_ELx_FLAGS
orr	x0, x0, x3
msr	sctlr_el2, x0     <--- hung right on this instruction.

Without CONFIG_ARM64_VHE (i.e., running in EL1), it is able to run
enable_dcache() but it still hung later in the 2nd kernel somewhere.

dsb	nshst
tlbi	vmalle1
dsb	nsh
isb
bl	get_ips_bits
lsl	x1, x0, #TCR_IPS_EL1_SHIFT
orr	x1, x1, x7
mov	x0, x6
ldr	x2, =MEMORY_ATTRIBUTES
msr	mair_el1, x2
msr	tcr_el1, x1
msr	ttbr0_el1, x0
isb
mrs	x0, sctlr_el1
ldr	x3, =SCTLR_ELx_FLAGS
orr	x0, x0, x3
msr	sctlr_el1, x0
isb

One data point of this system is that it has 4 threads on each core. Each 2-core 
share a same L1 and L2 caches, so that is 8 CPUs shares them each. All CPUs
share a same L3 cache.

Hence, I wonder if this is because of incomplete cache/TLB invalidation that had
stale entries (or uninitialised junk which just happens to look valid) present
before turning the MMU on.

[1] https://github.com/pratyushanand/kexec-tools/blob/devel/purgatory/arch/\
arm64/cache.S

> 
> Right now, I think I need to find a way to print directly to pl011 serial
> console while debugging those assembly code like CONFIG_DEBUG_LL for arm64, so
> it can be used to locate where exactly it hung. Otherwise, I am shooting in
> the
> dark.
> 
> [1] https://github.com/pratyushanand/kexec-tools
> 
> === original email ===
> 
> On this HPE Apollo 70 arm64 server with 256 CPUs, triggering a crash dump just
> hung (4.20-rc6 as well as 4.18). It was confirmed that the executing went as
> far
> as entering __cpu_soft_restart(),
> 
> __crash_kexec
>   machine_kexec
>     cpu_soft_restart
>       restart
>         __cpu_soft_restart
> 
> The earlycon was enabled but had no output from the 2nd kernel, so it was
> pretty
> much stuck in all those assembly code in arm64/kernel/head.S or the early part
> of start_kernel() before earlycon was initialized.
> 
> It turned out this has something to do with nr_cpus in the 1st kernel,
> although the 2nd kernel always has nr_cpus=1 [1]. It was tested with both
> crashkernel=512M or 768M.
> 
> nr_cpus <= 96  GOOD (2nd kernel was up in 2-3 mins.)
> nr_cpus=256    BAD  (2nd kernel was NOT up after 1 hour.)
> nr_cpus=127    BAD  (2nd kernel was NOT up after 10 mins.)
> 
> I did also test with and without CONFIG_ARM64_VHE (i.e., el2_switch) made no
> difference.
> 
> [1] KDUMP_COMMANDLINE_APPEND="irqpoll nr_cpus=1 swiotlb=noforce reset_devices"
> 
> I am still figuring out a way to debug those assembly code to where it
> actually hung, but the server was hooked up with a conserver that was not
> able to generate any sysrq and I have no shell access to the conserver, so
> seems a bit difficult to use kgdb or kdb in this case.
> 
> CPU information,
> 
> # lscpu
> Architecture:        aarch64
> Byte Order:          Little Endian
> CPU(s):              256
> On-line CPU(s) list: 0-255
> Thread(s) per core:  4
> Core(s) per socket:  32
> Socket(s):           2
> NUMA node(s):        2
> Vendor ID:           Cavium
> Model:               1
> Model name:          ThunderX2 99xx
> Stepping:            0x1
> BogoMIPS:            400.00
> L1d cache:           32K
> L1i cache:           32K
> L2 cache:            256K
> L3 cache:            32768K
> NUMA node0 CPU(s):   0-127
> NUMA node1 CPU(s):   128-255
> Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics cpuid asimdrdm

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] arm64: invalidate TLB before turning MMU on
  2018-12-12 22:37           ` Qian Cai
  (?)
@ 2018-12-13  5:22             ` Qian Cai
  -1 siblings, 0 replies; 41+ messages in thread
From: Qian Cai @ 2018-12-13  5:22 UTC (permalink / raw)
  To: catalin.marinas, will.deacon
  Cc: marc.zyngier, james.morse, takahiro.akashi, ard.biesheuvel,
	linux-arm-kernel, kexec, linux-kernel, Qian Cai

On this HPE Apollo 70 arm64 server with 256 CPUs, triggering a crash
dump just hung. It has 4 threads per core. Every 2 cores share the same
L1 and L2 caches, so each of those is shared by 8 CPUs. All CPUs share
the same L3 cache.

It turned out that the TLB contained stale entries (or uninitialized
junk which just happened to look valid) from the first kernel before
the MMU was turned on in the second kernel, which caused this
instruction to hang:

msr	sctlr_el1, x0

Signed-off-by: Qian Cai <cai@lca.pw>
---
 arch/arm64/kernel/head.S | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 4471f570a295..5196f3d729de 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -771,6 +771,10 @@ ENTRY(__enable_mmu)
 	msr	ttbr0_el1, x2			// load TTBR0
 	msr	ttbr1_el1, x1			// load TTBR1
 	isb
+	dsb	nshst
+	tlbi	vmalle1				// invalidate TLB
+	dsb	nsh
+	isb
 	msr	sctlr_el1, x0
 	isb
 	/*
-- 
2.17.2 (Apple Git-113)


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH] arm64: invalidate TLB before turning MMU on
  2018-12-13  5:22             ` Qian Cai
  (?)
@ 2018-12-13  5:40               ` Bhupesh Sharma
  -1 siblings, 0 replies; 41+ messages in thread
From: Bhupesh Sharma @ 2018-12-13  5:40 UTC (permalink / raw)
  To: cai
  Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel, Marc Zyngier,
	kexec mailing list, Linux Kernel Mailing List, AKASHI Takahiro,
	James Morse, linux-arm-kernel, Bhupesh Sharma, Bhupesh SHARMA

Hi Qian Cai,

On Thu, Dec 13, 2018 at 10:53 AM Qian Cai <cai@lca.pw> wrote:
>
> On this HPE Apollo 70 arm64 server with 256 CPUs, triggering a crash
> dump just hung. It has 4 threads on each core. Each 2-core share a same
> L1 and L2 caches, so that is 8 CPUs shares those. All CPUs share a same
> L3 cache.
>
> It turned out that this was due to the TLB contained stale entries (or
> uninitialized junk which just happened to look valid) from the first
> kernel before turning the MMU on in the second kernel which caused this
> instruction hung,
>
> msr     sctlr_el1, x0
>
> Signed-off-by: Qian Cai <cai@lca.pw>
> ---
>  arch/arm64/kernel/head.S | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 4471f570a295..5196f3d729de 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -771,6 +771,10 @@ ENTRY(__enable_mmu)
>         msr     ttbr0_el1, x2                   // load TTBR0
>         msr     ttbr1_el1, x1                   // load TTBR1
>         isb
> +       dsb     nshst
> +       tlbi    vmalle1                         // invalidate TLB
> +       dsb     nsh
> +       isb

This will be executed for both the primary and the kdump kernel, right? I
don't think we really want to invalidate the TLB when booting the
primary kernel.
It would be too slow, and considering that we need to minimize boot
times on embedded arm64 devices, I think it would not be a good
idea.

>         msr     sctlr_el1, x0
>         isb
>         /*
> --
> 2.17.2 (Apple Git-113)
>

Also, did you check the issue I reported some days back on the HPE Apollo
machines with the kdump kernel boot
<https://www.spinics.net/lists/kexec/msg21750.html>?
Can you please confirm that you are not facing the same issue (as I
suspect from reading your earlier bug report) on the HPE Apollo
machine? Also, by adding 'earlycon' to the bootargs passed to the
kdump kernel, you can see whether you get at least some console
output from the kdump kernel.

Thanks,
Bhupesh

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] arm64: invalidate TLB before turning MMU on
  2018-12-13  5:22             ` Qian Cai
  (?)
@ 2018-12-13 10:44               ` James Morse
  -1 siblings, 0 replies; 41+ messages in thread
From: James Morse @ 2018-12-13 10:44 UTC (permalink / raw)
  To: Qian Cai
  Cc: catalin.marinas, will.deacon, marc.zyngier, takahiro.akashi,
	ard.biesheuvel, linux-arm-kernel, kexec, linux-kernel

Hi Qian,

On 13/12/2018 05:22, Qian Cai wrote:
> On this HPE Apollo 70 arm64 server with 256 CPUs, triggering a crash
> dump just hung. It has 4 threads on each core. Each 2-core share a same
> L1 and L2 caches, so that is 8 CPUs shares those. All CPUs share a same
> L3 cache.
> 
> It turned out that this was due to the TLB contained stale entries (or
> uninitialized junk which just happened to look valid) from the first
> kernel before turning the MMU on in the second kernel which caused this
> instruction hung,

This is a great find, thanks for debugging this!

The kernel should already handle this, as we don't trust the bootloader to clean
up either.

In arch/arm64/mm/proc.S::__cpu_setup()
|/*
| *	__cpu_setup
| *
| *	Initialise the processor for turning the MMU on.  Return in x0 the
| *	value of the SCTLR_EL1 register.
| */
| 	.pushsection ".idmap.text", "awx"
| ENTRY(__cpu_setup)
| 	tlbi	vmalle1				// Invalidate local TLB
| 	dsb	nsh

This is called from stext, which then branches to __primary_switch(), which
calls __enable_mmu() where you see this problem. It shouldn't be possible to
allocate new TLB entries between these points...

Do you have CONFIG_RANDOMIZE_BASE disabled? That option causes enable_mmu() to
be called twice; the extra TLB maintenance is in __primary_switch.
(If it works with this turned off, it points to the extra off/tlbi/on sequence.)


> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 4471f570a295..5196f3d729de 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -771,6 +771,10 @@ ENTRY(__enable_mmu)
>  	msr	ttbr0_el1, x2			// load TTBR0
>  	msr	ttbr1_el1, x1			// load TTBR1
>  	isb
> +	dsb	nshst
> +	tlbi	vmalle1				// invalidate TLB
> +	dsb	nsh
> +	isb
>  	msr	sctlr_el1, x0
>  	isb

The overall change here is that we do extra maintenance later.

Can you move this around to bisect where the TLB entries are either coming
from, or failing to be invalidated?
Do your first and kdump kernels have the same VA_BITS/PAGE_SIZE?
As a stab in the dark, (totally untested):
------------------------------%<------------------------------
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 2c75b0b903ae..a5f3b7314bda 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -406,9 +406,6 @@ ENDPROC(idmap_kpti_install_ng_mappings)
  */
        .pushsection ".idmap.text", "awx"
 ENTRY(__cpu_setup)
-       tlbi    vmalle1                         // Invalidate local TLB
-       dsb     nsh
-
        mov     x0, #3 << 20
        msr     cpacr_el1, x0                   // Enable FP/ASIMD
        mov     x0, #1 << 12                    // Reset mdscr_el1 and disable
@@ -465,5 +462,10 @@ ENTRY(__cpu_setup)
 1:
 #endif /* CONFIG_ARM64_HW_AFDBM */
        msr     tcr_el1, x10
+       isb
+
+       tlbi    vmalle1                         // Invalidate local TLB
+       dsb     nsh
+
        ret                                     // return to head.S
 ENDPROC(__cpu_setup)
------------------------------%<------------------------------


Thanks,

James

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH] arm64: invalidate TLB before turning MMU on
  2018-12-13  5:40               ` Bhupesh Sharma
  (?)
@ 2018-12-13 13:39                 ` Qian Cai
  -1 siblings, 0 replies; 41+ messages in thread
From: Qian Cai @ 2018-12-13 13:39 UTC (permalink / raw)
  To: Bhupesh Sharma
  Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel, Marc Zyngier,
	kexec mailing list, Linux Kernel Mailing List, AKASHI Takahiro,
	James Morse, linux-arm-kernel, Bhupesh SHARMA

On Thu, 2018-12-13 at 11:10 +0530, Bhupesh Sharma wrote:
> Hi Qian Cai,
> 
> On Thu, Dec 13, 2018 at 10:53 AM Qian Cai <cai@lca.pw> wrote:
> > 
> > On this HPE Apollo 70 arm64 server with 256 CPUs, triggering a crash
> > dump just hung. It has 4 threads on each core. Each 2-core share a same
> > L1 and L2 caches, so that is 8 CPUs shares those. All CPUs share a same
> > L3 cache.
> > 
> > It turned out that this was due to the TLB contained stale entries (or
> > uninitialized junk which just happened to look valid) from the first
> > kernel before turning the MMU on in the second kernel which caused this
> > instruction hung,
> > 
> > msr     sctlr_el1, x0
> > 
> > Signed-off-by: Qian Cai <cai@lca.pw>
> > ---
> >  arch/arm64/kernel/head.S | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> > index 4471f570a295..5196f3d729de 100644
> > --- a/arch/arm64/kernel/head.S
> > +++ b/arch/arm64/kernel/head.S
> > @@ -771,6 +771,10 @@ ENTRY(__enable_mmu)
> >         msr     ttbr0_el1, x2                   // load TTBR0
> >         msr     ttbr1_el1, x1                   // load TTBR1
> >         isb
> > +       dsb     nshst
> > +       tlbi    vmalle1                         // invalidate TLB
> > +       dsb     nsh
> > +       isb
> 
> This will be executed both for the primary and kdump kernel, right? I
> don't think we really want to invalidate the TLB when booting the
> primary kernel.
> It would be too slow and considering that we need to minimize boot
> timings on embedded arm64 devices, I think it would not be a good
> idea.

Yes, it will be executed for the first kernel as well. As James mentioned, it
needs to be done anyway to invalidate TLB entries that might have been left by
the bootloader.

> 
> >         msr     sctlr_el1, x0
> >         isb
> >         /*
> > --
> > 2.17.2 (Apple Git-113)
> > 
> 
> Also did you check this issue I reported on the HPE apollo machines
> some days back with the kdump kernel boot
> <https://www.spinics.net/lists/kexec/msg21750.html>.
> Can you please confirm that you are not facing the same issue (as I
> suspect from reading your earlier Bug Report) on the HPE apollo
> machine. Also adding 'earlycon' to the bootargs being passed to the
> kdump kernel you can see if you are able to atleast get some console
> output from the kdump kernel.

No, I did not encounter the problem you mentioned here.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] arm64: invalidate TLB before turning MMU on
@ 2018-12-13 13:39                 ` Qian Cai
  0 siblings, 0 replies; 41+ messages in thread
From: Qian Cai @ 2018-12-13 13:39 UTC (permalink / raw)
  To: Bhupesh Sharma
  Cc: Ard Biesheuvel, Marc Zyngier, Catalin Marinas, Will Deacon,
	Linux Kernel Mailing List, AKASHI Takahiro, James Morse,
	Bhupesh SHARMA, kexec mailing list, linux-arm-kernel

On Thu, 2018-12-13 at 11:10 +0530, Bhupesh Sharma wrote:
> Hi Qian Cai,
> 
> On Thu, Dec 13, 2018 at 10:53 AM Qian Cai <cai@lca.pw> wrote:
> > 
> > On this HPE Apollo 70 arm64 server with 256 CPUs, triggering a crash
> > dump just hung. It has 4 threads on each core. Each 2-core share a same
> > L1 and L2 caches, so that is 8 CPUs shares those. All CPUs share a same
> > L3 cache.
> > 
> > It turned out that this was due to the TLB contained stale entries (or
> > uninitialized junk which just happened to look valid) from the first
> > kernel before turning the MMU on in the second kernel which caused this
> > instruction hung,
> > 
> > msr     sctlr_el1, x0
> > 
> > Signed-off-by: Qian Cai <cai@lca.pw>
> > ---
> >  arch/arm64/kernel/head.S | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> > index 4471f570a295..5196f3d729de 100644
> > --- a/arch/arm64/kernel/head.S
> > +++ b/arch/arm64/kernel/head.S
> > @@ -771,6 +771,10 @@ ENTRY(__enable_mmu)
> >         msr     ttbr0_el1, x2                   // load TTBR0
> >         msr     ttbr1_el1, x1                   // load TTBR1
> >         isb
> > +       dsb     nshst
> > +       tlbi    vmalle1                         // invalidate TLB
> > +       dsb     nsh
> > +       isb
> 
> This will be executed both for the primary and kdump kernel, right? I
> don't think we really want to invalidate the TLB when booting the
> primary kernel.
> It would be too slow and considering that we need to minimize boot
> timings on embedded arm64 devices, I think it would not be a good
> idea.

Yes, it will be executed for the first kernel as well. As James mentioned, it
needs to be done to invalidate TLB that might be used by bootloader anyway.

> 
> >         msr     sctlr_el1, x0
> >         isb
> >         /*
> > --
> > 2.17.2 (Apple Git-113)
> > 
> 
> Also did you check this issue I reported on the HPE apollo machines
> some days back with the kdump kernel boot
> <https://www.spinics.net/lists/kexec/msg21750.html>.
> Can you please confirm that you are not facing the same issue (as I
> suspect from reading your earlier Bug Report) on the HPE apollo
> machine. Also adding 'earlycon' to the bootargs being passed to the
> kdump kernel you can see if you are able to atleast get some console
> output from the kdump kernel.

No, here did not encounter the problem you mentioned.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] arm64: invalidate TLB before turning MMU on
  2018-12-13 10:44               ` James Morse
  (?)
@ 2018-12-13 13:44                 ` Qian Cai
  -1 siblings, 0 replies; 41+ messages in thread
From: Qian Cai @ 2018-12-13 13:44 UTC (permalink / raw)
  To: James Morse
  Cc: catalin.marinas, will.deacon, marc.zyngier, takahiro.akashi,
	ard.biesheuvel, linux-arm-kernel, kexec, linux-kernel

On Thu, 2018-12-13 at 10:44 +0000, James Morse wrote:
> The kernel should already handle this, as we don't trust the bootloader to
> clean up either.
> 
> In arch/arm64/mm/proc.S::__cpu_setup()
> > /*
> > *	__cpu_setup
> > *
> > *	Initialise the processor for turning the MMU on.  Return in x0 the
> > *	value of the SCTLR_EL1 register.
> > */
> > 	.pushsection ".idmap.text", "awx"
> > ENTRY(__cpu_setup)
> > 	tlbi	vmalle1				// Invalidate local TLB
> > 	dsb	nsh
> 
> This is called from stext, which then branches to __primary_switch(), which
> calls __enable_mmu() where you see this problem. It shouldn't be possible to
> allocate new TLB entries between these points...
> 
> Do you have CONFIG_RANDOMIZE_BASE disabled? This option causes __enable_mmu()
> to be called twice; the extra TLB maintenance is in __primary_switch().
> (If it works with this turned off, that points to the extra off/tlbi/on
> sequence.)

Yes, CONFIG_RANDOMIZE_BASE is NOT set.

> 
> 
> > diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> > index 4471f570a295..5196f3d729de 100644
> > --- a/arch/arm64/kernel/head.S
> > +++ b/arch/arm64/kernel/head.S
> > @@ -771,6 +771,10 @@ ENTRY(__enable_mmu)
> >  	msr	ttbr0_el1, x2			// load TTBR0
> >  	msr	ttbr1_el1, x1			// load TTBR1
> >  	isb
> > +	dsb	nshst
> > +	tlbi	vmalle1				// invalidate TLB
> > +	dsb	nsh
> > +	isb
> >  	msr	sctlr_el1, x0
> >  	isb
> 
> The overall change here is that we do extra maintenance later.
> 
> Can you move this around to bisect where the TLB entries are either coming
> from, or failing to be invalidated?
> Do your first and kdump kernels have the same VA_BITS/PAGE_SIZE?

Yes,

CONFIG_ARM64_VA_BITS=48
CONFIG_ARM64_PAGE_SHIFT=16
# CONFIG_ARM64_4K_PAGES is not set
# CONFIG_ARM64_16K_PAGES is not set
CONFIG_ARM64_64K_PAGES=y
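
As a quick sanity check on these values (a minimal sketch; the variable names
below are only illustrative and are not kernel symbols), the page size and the
number of pages in one 48-bit VA range work out as follows:

```python
# Values copied from the config lines above.
VA_BITS = 48
PAGE_SHIFT = 16

# Bytes per page: 2^PAGE_SHIFT.
page_size = 1 << PAGE_SHIFT

# Number of pages needed to cover one 2^VA_BITS-byte range.
pages_per_va_range = 1 << (VA_BITS - PAGE_SHIFT)

print(page_size // 1024, "KiB per page")
print(pages_per_va_range, "pages per 48-bit VA range")
```

So PAGE_SHIFT=16 agrees with CONFIG_ARM64_64K_PAGES=y.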

> As a stab in the dark, (totally untested):
> ------------------------------%<------------------------------
> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> index 2c75b0b903ae..a5f3b7314bda 100644
> --- a/arch/arm64/mm/proc.S
> +++ b/arch/arm64/mm/proc.S
> @@ -406,9 +406,6 @@ ENDPROC(idmap_kpti_install_ng_mappings)
>   */
>         .pushsection ".idmap.text", "awx"
>  ENTRY(__cpu_setup)
> -       tlbi    vmalle1                         // Invalidate local TLB
> -       dsb     nsh
> -
>         mov     x0, #3 << 20
>         msr     cpacr_el1, x0                   // Enable FP/ASIMD
>         mov     x0, #1 << 12                    // Reset mdscr_el1 and disable
> @@ -465,5 +462,10 @@ ENTRY(__cpu_setup)
>  1:
>  #endif /* CONFIG_ARM64_HW_AFDBM */
>         msr     tcr_el1, x10
> +       isb
> +
> +       tlbi    vmalle1                         // Invalidate local TLB
> +       dsb     nsh
> +
>         ret                                     // return to head.S
>  ENDPROC(__cpu_setup)
> ------------------------------%<------------------------------
> 

This patch works well too.
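
To illustrate why the position of the flush matters, here is a toy model in
Python. It is only a sketch under a loud assumption: that something
(hypothetically) can allocate TLB entries between the flush and the point where
the MMU is turned on. Whether and why the hardware does that is not established
in this thread, and the event names are made up for the example.

```python
def tlb_at_enable(events):
    """Replay a list of boot-time events and return the (toy) TLB contents
    at the moment the MMU is enabled."""
    tlb = set()
    for ev in events:
        if ev == "tlbi":
            tlb.clear()          # local invalidate drops every entry
        elif ev == "enable_mmu":
            return set(tlb)      # whatever is left is live from here on
        else:
            tlb.add(ev)          # any other event models an allocated entry
    return set(tlb)


# Flush early (as in the current __cpu_setup()), then a hypothetical fill.
early = ["junk_from_first_kernel", "tlbi", "hypothetical_fill", "enable_mmu"]

# Flush right before enabling the MMU (as in the patches discussed here).
late = ["junk_from_first_kernel", "hypothetical_fill", "tlbi", "enable_mmu"]

print(tlb_at_enable(early))
print(tlb_at_enable(late))
```

In this model the early flush leaves the hypothetical entry in place when the
MMU comes on, while the late flush leaves nothing behind.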

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2] arm64: invalidate TLB just before turning MMU on
  2018-12-13  5:22             ` Qian Cai
  (?)
@ 2018-12-14  4:08               ` Qian Cai
  -1 siblings, 0 replies; 41+ messages in thread
From: Qian Cai @ 2018-12-14  4:08 UTC (permalink / raw)
  To: catalin.marinas, will.deacon
  Cc: marc.zyngier, james.morse, takahiro.akashi, ard.biesheuvel,
	linux-arm-kernel, kexec, linux-kernel, Qian Cai

On this HPE Apollo 70 arm64 server with 256 CPUs, triggering a crash
dump just hung. It has 4 threads per core. Every 2 cores share the same
L1 and L2 caches, so 8 CPUs share those. All CPUs share the same L3
cache.

It turned out that this was because the TLB contained stale entries (or
uninitialized junk which just happened to look valid) before the MMU was
turned on in the second kernel, which caused this instruction to hang,

msr	sctlr_el1, x0

Although there is a local TLB flush in the second kernel in
__cpu_setup(), it is done too early. By the time the MMU is turned on
later, the TLB is dirty again for some reason.

I also tried moving the local TLB flush around a bit inside
__cpu_setup(). Although kdump did complete sometimes, it fairly often
triggered a "Synchronous Exception" in EFI after a cold reboot, and there
seems to be no way to recover from that remotely without reinstalling the
OS. For example, in these places,

ENTRY(__cpu_setup)
+	isb
	tlbi	vmalle1
	dsb	nsh

or

	mov	x0, #3 << 20
	msr	cpacr_el1, x0
+	tlbi    vmalle1
+	dsb     nsh

Since it is only necessary to flush the local TLB right before turning
the MMU on, just rearrange that part a bit, like the one in
__primary_switch() within the CONFIG_RANDOMIZE_BASE path, so it does not
depend on other instructions in between that could pollute the TLB, and
it no longer triggers the "Synchronous Exception" either.

Signed-off-by: Qian Cai <cai@lca.pw>
---

v2: merge the similar part from __cpu_setup() pointed out by James.

 arch/arm64/kernel/head.S | 4 ++++
 arch/arm64/mm/proc.S     | 3 ---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 4471f570a295..7f555dd4577e 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -771,6 +771,10 @@ ENTRY(__enable_mmu)
 	msr	ttbr0_el1, x2			// load TTBR0
 	msr	ttbr1_el1, x1			// load TTBR1
 	isb
+
+	tlbi	vmalle1				// invalidate TLB
+	dsb	nsh
+
 	msr	sctlr_el1, x0
 	isb
 	/*
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 2c75b0b903ae..14f68afdd57f 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -406,9 +406,6 @@ ENDPROC(idmap_kpti_install_ng_mappings)
  */
 	.pushsection ".idmap.text", "awx"
 ENTRY(__cpu_setup)
-	tlbi	vmalle1				// Invalidate local TLB
-	dsb	nsh
-
 	mov	x0, #3 << 20
 	msr	cpacr_el1, x0			// Enable FP/ASIMD
 	mov	x0, #1 << 12			// Reset mdscr_el1 and disable
-- 
2.17.2 (Apple Git-113)


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v2] arm64: invalidate TLB just before turning MMU on
  2018-12-14  4:08               ` Qian Cai
  (?)
@ 2018-12-14  5:01                 ` Bhupesh Sharma
  -1 siblings, 0 replies; 41+ messages in thread
From: Bhupesh Sharma @ 2018-12-14  5:01 UTC (permalink / raw)
  To: cai
  Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel, Marc Zyngier,
	kexec mailing list, Linux Kernel Mailing List, AKASHI Takahiro,
	James Morse, linux-arm-kernel

On Fri, Dec 14, 2018 at 9:39 AM Qian Cai <cai@lca.pw> wrote:
>
> On this HPE Apollo 70 arm64 server with 256 CPUs, triggering a crash
> dump just hung. It has 4 threads per core. Every 2 cores share the same
> L1 and L2 caches, so 8 CPUs share those. All CPUs share the same L3
> cache.
>
> It turned out that this was because the TLB contained stale entries (or
> uninitialized junk which just happened to look valid) before the MMU was
> turned on in the second kernel, which caused this instruction to hang,
>
> msr     sctlr_el1, x0
>
> Although there is a local TLB flush in the second kernel in
> __cpu_setup(), it is done too early. By the time the MMU is turned on
> later, the TLB is dirty again for some reason.
>
> I also tried moving the local TLB flush around a bit inside
> __cpu_setup(). Although kdump did complete sometimes, it fairly often
> triggered a "Synchronous Exception" in EFI after a cold reboot, and there
> seems to be no way to recover from that remotely without reinstalling the
> OS. For example, in these places,
>
> ENTRY(__cpu_setup)
> +       isb
>         tlbi    vmalle1
>         dsb     nsh
>
> or
>
>         mov     x0, #3 << 20
>         msr     cpacr_el1, x0
> +       tlbi    vmalle1
> +       dsb     nsh
>
> Since it is only necessary to flush the local TLB right before turning
> the MMU on, just rearrange that part a bit, like the one in
> __primary_switch() within the CONFIG_RANDOMIZE_BASE path, so it does not
> depend on other instructions in between that could pollute the TLB, and
> it no longer triggers the "Synchronous Exception" either.
>
> Signed-off-by: Qian Cai <cai@lca.pw>
> ---
>
> v2: merge the similar part from __cpu_setup() pointed out by James.
>
>  arch/arm64/kernel/head.S | 4 ++++
>  arch/arm64/mm/proc.S     | 3 ---
>  2 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 4471f570a295..7f555dd4577e 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -771,6 +771,10 @@ ENTRY(__enable_mmu)
>         msr     ttbr0_el1, x2                   // load TTBR0
>         msr     ttbr1_el1, x1                   // load TTBR1
>         isb
> +
> +       tlbi    vmalle1                         // invalidate TLB
> +       dsb     nsh
> +
>         msr     sctlr_el1, x0
>         isb
>         /*
> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> index 2c75b0b903ae..14f68afdd57f 100644
> --- a/arch/arm64/mm/proc.S
> +++ b/arch/arm64/mm/proc.S
> @@ -406,9 +406,6 @@ ENDPROC(idmap_kpti_install_ng_mappings)
>   */
>         .pushsection ".idmap.text", "awx"
>  ENTRY(__cpu_setup)
> -       tlbi    vmalle1                         // Invalidate local TLB
> -       dsb     nsh
> -
>         mov     x0, #3 << 20
>         msr     cpacr_el1, x0                   // Enable FP/ASIMD
>         mov     x0, #1 << 12                    // Reset mdscr_el1 and disable
> --
> 2.17.2 (Apple Git-113)
>

Not sure why I can't reproduce on my HPE Apollo machine, so a couple
of questions:
1. How many CPUs do you enable in the kdump kernel - do you pass
'nr_cpus=1' to the kdump kernel to limit the maximum number of cores
to 1 in the kdump kernel?
2. Which firmware version do you use on your board?
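
For question 1, one way to check what the kdump kernel was actually given is
to pull nr_cpus out of its command line. The helper below is a minimal sketch:
get_nr_cpus() is a made-up name, and treating the last nr_cpus= occurrence as
the effective one is an assumption of this example. The sample string is the
KDUMP_COMMANDLINE_APPEND value quoted earlier in this thread.

```python
def get_nr_cpus(cmdline):
    """Return the nr_cpus= value found on a kernel command line, or None.

    Hypothetical helper for illustration; if the option appears more than
    once, this sketch keeps the last occurrence.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == "nr_cpus" and sep and val.isdigit():
            value = int(val)
    return value


# Options appended to the kdump kernel command line, as reported in the
# original bug report.
append = "irqpoll nr_cpus=1 swiotlb=noforce reset_devices"
print(get_nr_cpus(append))
```

On a running kdump kernel the same helper could be fed the contents of
/proc/cmdline instead of the sample string.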

Thanks,
Bhupesh

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2] arm64: invalidate TLB just before turning MMU on
@ 2018-12-14  5:01                 ` Bhupesh Sharma
  0 siblings, 0 replies; 41+ messages in thread
From: Bhupesh Sharma @ 2018-12-14  5:01 UTC (permalink / raw)
  To: cai
  Cc: Ard Biesheuvel, Marc Zyngier, Catalin Marinas, Will Deacon,
	Linux Kernel Mailing List, AKASHI Takahiro, James Morse,
	kexec mailing list, linux-arm-kernel

On Fri, Dec 14, 2018 at 9:39 AM Qian Cai <cai@lca.pw> wrote:
>
> On this HPE Apollo 70 arm64 server with 256 CPUs, triggering a crash
> dump just hung. It has 4 threads on each core. Each 2-core share a same
> L1 and L2 caches, so that is 8 CPUs shares those. All CPUs share a same
> L3 cache.
>
> It turned out that this was due to the TLB contained stale entries (or
> uninitialized junk which just happened to look valid) before turning the
> MMU on in the second kernel which caused this instruction hung,
>
> msr     sctlr_el1, x0
>
> Although there is a local TLB flush in the second kernel in
> __cpu_setup(), it is called too early. When the time to turn the MMU on
> later, the TLB is dirty again from some reasons.
>
> Also tried to move the local TLB flush part around a bit inside
> __cpu_setup(), although it did complete kdump some times, it did trigger
> "Synchronous Exception" in EFI after a cold-reboot fairly often that
> seems no way to recover remotely without reinstalling the OS. For
> example, in those places,
>
> ENTRY(__cpu_setup)
> +       isb
>         tlbi    vmalle1
>         dsb     nsh
>
> or
>
>         mov     x0, #3 << 20
>         msr     cpacr_el1, x0
> +       tlbi    vmalle1
> +       dsb     nsh
>
> Since it is only necessary to flush local TLB right before turning the
> MMU on, just re-arrage the part a bit like the one in __primary_switch()
> within CONFIG_RANDOMIZE_BASE path, so it does not depends on other
> instructions in between that could pollute the TLB, and it no longer
> trigger "Synchronous Exception" as well.
>
> Signed-off-by: Qian Cai <cai@lca.pw>
> ---
>
> v2: merge the similar part from __cpu_setup() pointed out by James.
>
>  arch/arm64/kernel/head.S | 4 ++++
>  arch/arm64/mm/proc.S     | 3 ---
>  2 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 4471f570a295..7f555dd4577e 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -771,6 +771,10 @@ ENTRY(__enable_mmu)
>         msr     ttbr0_el1, x2                   // load TTBR0
>         msr     ttbr1_el1, x1                   // load TTBR1
>         isb
> +
> +       tlbi    vmalle1                         // invalidate TLB
> +       dsb     nsh
> +
>         msr     sctlr_el1, x0
>         isb
>         /*
> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> index 2c75b0b903ae..14f68afdd57f 100644
> --- a/arch/arm64/mm/proc.S
> +++ b/arch/arm64/mm/proc.S
> @@ -406,9 +406,6 @@ ENDPROC(idmap_kpti_install_ng_mappings)
>   */
>         .pushsection ".idmap.text", "awx"
>  ENTRY(__cpu_setup)
> -       tlbi    vmalle1                         // Invalidate local TLB
> -       dsb     nsh
> -
>         mov     x0, #3 << 20
>         msr     cpacr_el1, x0                   // Enable FP/ASIMD
>         mov     x0, #1 << 12                    // Reset mdscr_el1 and disable
> --
> 2.17.2 (Apple Git-113)
>

Not sure why I can't reproduce on my HPE Apollo machine, so a couple
of questions:
1. How many CPUs do you enable in the kdump kernel - do you pass
'nr_cpus=1' to the kdump kernel to limit the maximum number of cores
to 1 in the kdump kernel?
2. Which firmware version do you use on your board?

Thanks,
Bhupesh

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2] arm64: invalidate TLB just before turning MMU on
  2018-12-14  4:08               ` Qian Cai
  (?)
@ 2018-12-14  7:23                 ` Ard Biesheuvel
  -1 siblings, 0 replies; 41+ messages in thread
From: Ard Biesheuvel @ 2018-12-14  7:23 UTC (permalink / raw)
  To: Qian Cai
  Cc: Catalin Marinas, Will Deacon, Marc Zyngier, James Morse,
	AKASHI Takahiro, linux-arm-kernel, Kexec Mailing List,
	Linux Kernel Mailing List

On Fri, 14 Dec 2018 at 05:08, Qian Cai <cai@lca.pw> wrote:
> Also tried to move the local TLB flush part around a bit inside
> __cpu_setup(), although it did complete kdump some times, it did trigger
> "Synchronous Exception" in EFI after a cold-reboot fairly often that
> seems no way to recover remotely without reinstalling the OS.

This doesn't make any sense to me. If the system gets into a weird
state out of cold reboot, how could this code be the culprit? Please
check your firmware, and try to reproduce the issue on a system that
doesn't have such defects.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2] arm64: invalidate TLB just before turning MMU on
  2018-12-14  5:01                 ` Bhupesh Sharma
  (?)
@ 2018-12-14 12:54                   ` Qian Cai
  -1 siblings, 0 replies; 41+ messages in thread
From: Qian Cai @ 2018-12-14 12:54 UTC (permalink / raw)
  To: Bhupesh Sharma
  Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel, Marc Zyngier,
	kexec mailing list, Linux Kernel Mailing List, AKASHI Takahiro,
	James Morse, linux-arm-kernel

On 12/14/18 12:01 AM, Bhupesh Sharma wrote:
> Not sure why I can't reproduce on my HPE Apollo machine, so a couple
> of questions:
> 1. How many CPUs do you enable in the kdump kernel - do you pass
> 'nr_cpus=1' to the kdump kernel to limit the maximum number of cores
> to 1 in the kdump kernel?

Yes

> 2. Which firmware version do you use on your board?

Handle 0x0000, DMI type 0, 26 bytes
BIOS Information
        Vendor: American Megatrends Inc.
        Version: L50_5.13_1.0.6
        Release Date: 07/10/2018
        Address: 0xF0000
        Runtime Size: 64 kB
        ROM Size: 64 MB
        Characteristics:
                PCI is supported
                BIOS is upgradeable
                BIOS shadowing is allowed
                Boot from CD is supported
                Selectable boot is supported
                BIOS ROM is socketed
                ACPI is supported
                BIOS boot specification is supported
                Targeted content distribution is supported
                UEFI is supported
        BIOS Revision: 6.3
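
For comparing firmware between machines, the fields above can be pulled out
of the dmidecode text mechanically. A minimal sketch in Python, assuming the
output has been captured as a plain string; parse_bios_info() is a
hypothetical helper name, not part of dmidecode or any kdump tool:

```python
def parse_bios_info(text):
    """Extract 'Key: value' fields from a dmidecode "BIOS Information" block.

    Lines without a value (such as "Characteristics:") and the indented
    capability list below it are skipped. A blank line ends the block.
    """
    fields = {}
    in_block = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "BIOS Information":
            in_block = True
            continue
        if not in_block:
            continue
        if not stripped:
            break
        key, sep, value = stripped.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields
```

Running this on the output from two boards and comparing the "Version" and
"Release Date" entries shows whether they really run the same firmware.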

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2] arm64: invalidate TLB just before turning MMU on
  2018-12-14  7:23                 ` Ard Biesheuvel
  (?)
@ 2018-12-15  1:53                   ` Qian Cai
  -1 siblings, 0 replies; 41+ messages in thread
From: Qian Cai @ 2018-12-15  1:53 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Catalin Marinas, Will Deacon, Marc Zyngier, James Morse,
	AKASHI Takahiro, linux-arm-kernel, Kexec Mailing List,
	Linux Kernel Mailing List

On 12/14/18 2:23 AM, Ard Biesheuvel wrote:
> On Fri, 14 Dec 2018 at 05:08, Qian Cai <cai@lca.pw> wrote:
>> Also tried to move the local TLB flush part around a bit inside
>> __cpu_setup(), although it did complete kdump some times, it did trigger
>> "Synchronous Exception" in EFI after a cold-reboot fairly often that
>> seems no way to recover remotely without reinstalling the OS.
> 
> This doesn't make any sense to me. If the system gets into a weird
> state out of cold reboot, how could this code be the culprit? Please
> check your firmware, and try to reproduce the issue on a system that
> doesn't have such defects.
> 

I'll continue investigating those "Synchronous Exception" failures, although
that is hard because I don't have the firmware source code to confirm whether
it is buggy.

I did manage to reproduce this kdump issue on around 5 of those servers running
a fairly recent version of the firmware (07/01/2018). I don't have access to
other large CPU machines.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2] arm64: invalidate TLB just before turning MMU on
  2018-12-15  1:53                   ` Qian Cai
  (?)
@ 2019-01-10 20:00                     ` Bhupesh Sharma
  -1 siblings, 0 replies; 41+ messages in thread
From: Bhupesh Sharma @ 2019-01-10 20:00 UTC (permalink / raw)
  To: Qian Cai
  Cc: Ard Biesheuvel, Marc Zyngier, Catalin Marinas, Will Deacon,
	Linux Kernel Mailing List, AKASHI Takahiro, James Morse,
	Kexec Mailing List, linux-arm-kernel

Hi Qian,

On Sat, Dec 15, 2018 at 7:24 AM Qian Cai <cai@lca.pw> wrote:
>
> On 12/14/18 2:23 AM, Ard Biesheuvel wrote:
> > On Fri, 14 Dec 2018 at 05:08, Qian Cai <cai@lca.pw> wrote:
> >> Also tried to move the local TLB flush part around a bit inside
> >> __cpu_setup(), although it did complete kdump some times, it did trigger
> >> "Synchronous Exception" in EFI after a cold-reboot fairly often that
> >> seems no way to recover remotely without reinstalling the OS.
> >
> > This doesn't make any sense to me. If the system gets into a weird
> > state out of cold reboot, how could this code be the culprit? Please
> > check your firmware, and try to reproduce the issue on a system that
> > doesn't have such defects.
> >
>
> I'll continue investigating those "Synchronous Exception" failures, although
> that is hard because I don't have the firmware source code to confirm whether
> it is buggy.
>
> I did manage to reproduce this kdump issue on around 5 of those servers running
> a fairly recent version of the firmware (07/01/2018). I don't have access to
> other large CPU machines.

Sorry, I got busy with other work. As I reported earlier, I am still not
able to reproduce this on my HPE Apollo, even with the latest Linus
tree.
Here are some details on my setup:

1. # uname -r
5.0.0-rc1+

with the following commit as the HEAD:
commit a88cc8da0279f8e481b0d90e51a0a1cffac55906 (HEAD -> master,
origin/master, origin/HEAD)
Merge: 9cb2feb4d21d 73444bc4d8f9
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Tue Jan 8 18:58:29 2019 -0800

    Merge branch 'akpm' (patches from Andrew)

2. I use the following kdump commandline:
Kernel command line: BOOT_IMAGE=(hd9,gpt2)/vmlinuz-5.0.0-rc1+ ro
irqpoll nr_cpus=1 swiotlb=noforce reset_devices
earlycon=pl011,mmio,0x402020000

3. I am able to run kdump successfully on the machine and also collect
the crash core properly:

.. snip..
kdump: saving to /sysroot//var/crash/127.0.0.1-2019-01-10-10:52:25/
kdump: saving vmcore-dmesg.txt
kdump: saving vmcore-dmesg.txt complete
kdump: saving vmcore
Copying data                                      : [100.0 %] \
   eta: 0s
kdump: saving vmcore complete
.. snip ..

4. I use the same firmware version on the board as you shared earlier:
# dmidecode | grep -A 20 -i "BIOS Information"
BIOS Information
    Vendor: American Megatrends Inc.
    Version: L50_5.13_1.0.6
    Release Date: 07/10/2018
    Address: 0xF0000
    Runtime Size: 64 kB
    ROM Size: 64 MB
    Characteristics:
        PCI is supported
        BIOS is upgradeable
        BIOS shadowing is allowed
        Boot from CD is supported
        Selectable boot is supported
        BIOS ROM is socketed
        ACPI is supported
        BIOS boot specification is supported
        Targeted content distribution is supported
        UEFI is supported
    BIOS Revision: 6.3

So, I am guessing that it might be a kdump command line issue at your end.
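
One way to check that guess would be to diff the two kdump command lines
parameter by parameter. A minimal sketch in Python, assuming both command
lines are available as plain strings (for example from the "Kernel command
line:" boot message of each second kernel). The helper names are
hypothetical, and quoted values containing spaces are not handled:

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a {parameter: value} dict.

    Flag-style parameters such as "irqpoll" map to None.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def diff_cmdlines(a, b):
    """Return (only_in_a, only_in_b, changed) for two command lines."""
    pa, pb = parse_cmdline(a), parse_cmdline(b)
    only_a = sorted(set(pa) - set(pb))
    only_b = sorted(set(pb) - set(pa))
    changed = {
        name: (pa[name], pb[name])
        for name in sorted(set(pa) & set(pb))
        if pa[name] != pb[name]
    }
    return only_a, only_b, changed


if __name__ == "__main__":
    # Command line from the working setup described above.
    working = (
        "BOOT_IMAGE=(hd9,gpt2)/vmlinuz-5.0.0-rc1+ ro irqpoll nr_cpus=1 "
        "swiotlb=noforce reset_devices earlycon=pl011,mmio,0x402020000"
    )
    # Only the appended kdump options from the original report; the rest of
    # the failing machine's command line is unknown and must be filled in.
    failing = "irqpoll nr_cpus=1 swiotlb=noforce reset_devices"
    print(diff_cmdlines(working, failing))
```

Any parameter reported as missing or changed on the failing machine would be
a candidate to test first.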

Thanks,
Bhupesh

^ permalink raw reply	[flat|nested] 41+ messages in thread

Thread overview:
2018-12-10 22:30 arm64: kdump broken on a large CPU system Qian Cai
2018-12-11 10:09 ` Marc Zyngier
2018-12-11 11:34   ` James Morse
2018-12-12  2:51     ` AKASHI, Takahiro
2018-12-12  4:39       ` Qian Cai
2018-12-12 22:37         ` Qian Cai
2018-12-13  5:22           ` [PATCH] arm64: invalidate TLB before turning MMU on Qian Cai
2018-12-13  5:40             ` Bhupesh Sharma
2018-12-13 13:39               ` Qian Cai
2018-12-13 10:44             ` James Morse
2018-12-13 13:44               ` Qian Cai
2018-12-14  4:08             ` [PATCH v2] arm64: invalidate TLB just " Qian Cai
2018-12-14  5:01               ` Bhupesh Sharma
2018-12-14 12:54                 ` Qian Cai
2018-12-14  7:23               ` Ard Biesheuvel
2018-12-15  1:53                 ` Qian Cai
2019-01-10 20:00                   ` Bhupesh Sharma
