* [Bug Report] kdump crashes after latest EFI memblock changes on arm64 machines with large number of CPUs
@ 2018-11-01 21:14 ` Bhupesh Sharma
  0 siblings, 0 replies; 18+ messages in thread
From: Bhupesh Sharma @ 2018-11-01 21:14 UTC (permalink / raw)
  To: linux-arm-kernel, Ard Biesheuvel, linux-efi
  Cc: Mark Rutland, Will Deacon, Bhupesh SHARMA, kexec mailing list

Hi,

With the latest EFI changes for memblock reservation across the kdump
kernel from Ard (commit 71e0940d52e107748b270213a01d3b1546657d74
["efi: honour memory reservations passed via a linux specific config
table"]), we hit a panic while trying to boot the kdump kernel on
machines which have a large number of CPUs.

I have an arm64 board which has 224 CPUs:
# lscpu
<..snip..>
CPU(s):              224
On-line CPU(s) list: 0-223
<..snip..>

Here are the crash logs in the kdump kernel on this machine:

[    0.000000] Unable to handle kernel paging request at virtual
address ffff80003ffe0000
val____)nt EL), IL ata abort info:
[    0.or: Oops: 960000inted 4.18.0+ #3
[    0.000000] pstate: 20400089 (nzCv daIf +PAN -UAO)
[    0.000000] pc : __memcpy+0x110/0x180
[    0.000000] lr : memblock_double_array+0x240/0x348
[    0.000000] sp : ffff0000092efc80 x28: 00000000bffe0000
[    0.000000] x27: 0000000000001800 x26: ffff000009d59000
[    0.000000] x25: ffff80003ffe0000 x24: 0000000000000000
[    0.000000] x23: 0000000000010000 x22: ffff000009d594e8
[    0.000000] x21: ffff000009d594f4 x20: ffff0000093c7268
[    0.000000] x19: 0000000000000c00 x18: 0000000000000010
[    0.000000] x17: 0000000000000000 x16: 0000000000000000
[    0.000000] x15: ffffffffffffffff3: 0000000fc18d0000 x12: 0000000800000000
[    0.000000] x11: 0000000000000018 x10: 00000000ddab9e18
[    0.000000] x9 : 0000000800000000 x8 : 00000000000002c1
[    0.000000] x7 : 0000000091b90000 x6 : ffff80003ffe0000
[    0.000000] x5 : 0000000000000001 x4 : 0000000000000000
[    0.000000] x3 : 0000000000000000 x2 : 0000000000000b80
[    0.000000] x1 : ffff000009d59540 x0 : ffff80003ffe0000
[    0.000000] Process swapper)
[    0.000000] Call trace:
[    0.000000]  __memcpy+0x110/0x180
[    0.000000]  memblock_add_range+0x134/0x2e8
[    0.000000]  memblock_reserve+0x70/0xb8
[    0.000000]  memblock_alloc_base_nid+0x6c/0x88
[    0.000000]  __memblock_alloc_base+0x3c/0x4c
[    0.000000]  memblock_alloc_base+0x28/0x4c
[    0.000000]  memblock_alloc+0x2c/0x38
[    0.000000]  early_pgtable_alloc+0x20/0xb0
[    0.000000]  paging_init+0x28/0x7f8
[   0.000000]  start_kernel+0x78/0x4cc
[    0.000000] Code: a8c12027 a8c12829 a8c1302b a8c1382d (a88120c7)
[    0.000000] random: get_random_bytes called from
print_oops_end_marker+0x30/0x58 with crng_init=0
[    0.000000] ---[ end trace 0000000000000000 ]---
[    0.000000] Kernel panic - not syncing: Fatal exception
[    0.000000] ---[ end Kernel panic - not syncing: Fatal exception ]---

After passing 'memblock=debug' to the kdump kernel (and adding a few
more prints to 'mm/memblock.c'), I can see that the panic happens
while resizing the array inside 'memblock_double_array' (which doubles
the size of the memblock regions array):

[    0.000000] Reserving 13KB of memory at 0xbfff0000 for elfcorehdr
[    0.000000] memblock_reserve:
[0x00000000bfff0000-0x00000000bfffffff]
memblock_alloc_base_nid+0x6c/0x88
[    0.000000] memblock: use_slab is 0, new_area_start=bfff0000,
new_area_size=10000
[    0.000000] memblock: use_slab is 0, addr=0, new_area_size=10000
[    0.000000] memblock: addr=bffe0000, __va(addr)=ffff80003ffe0000
[    0.00000 [0xbffe0000-0xbffe17ff]
[    0.000000] Unable to handle kernel paging request at virtual
address ffff80003ffe0000

which indicates that, after Ard's patch, the memblocks reserved across
kdump swell up on systems which have a large number of CPUs, and hence
'memblock_double_array' is called in early kdump boot code to double
the size of the memblock regions array.
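
For readers less familiar with the mechanism, here is a minimal
userspace sketch of a regions array that doubles its capacity when it
fills up. It is not the kernel code: 'struct region',
'region_array_init', 'region_array_double' and 'region_array_add' are
hypothetical names, and the 24-byte entry size and the initial capacity
of 128 entries are assumptions, chosen because they make the old and
new array sizes come out as 0xc00 and 0x1800 bytes, the values that
appear in x19 and x27 and in the [0xbffe0000-0xbffe17ff] range above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one region entry (24 bytes); not the kernel's struct. */
struct region {
	uint64_t base;
	uint64_t size;
	uint64_t flags;
};

/* Hypothetical growable array of regions. */
struct region_array {
	struct region *regions;
	size_t cnt;	/* entries in use */
	size_t max;	/* entries allocated */
};

/* Allocate the initial, zeroed array. Returns 0 on success, -1 on failure. */
static int region_array_init(struct region_array *a, size_t initial_max)
{
	assert(initial_max > 0);
	a->regions = calloc(initial_max, sizeof(*a->regions));
	if (!a->regions)
		return -1;
	a->cnt = 0;
	a->max = initial_max;
	return 0;
}

/* Double the capacity: allocate a new array, copy the old entries, swap. */
static int region_array_double(struct region_array *a)
{
	size_t old_size = a->max * sizeof(*a->regions);
	size_t new_size = old_size * 2;
	struct region *new_regions = malloc(new_size);

	if (!new_regions)
		return -1;

	/* Copy the existing entries, then zero the newly added second half. */
	memcpy(new_regions, a->regions, old_size);
	memset(new_regions + a->max, 0, old_size);

	free(a->regions);
	a->regions = new_regions;
	a->max *= 2;
	return 0;
}

/* Append one region, growing the array first if it is already full. */
static int region_array_add(struct region_array *a, uint64_t base, uint64_t size)
{
	if (a->cnt == a->max && region_array_double(a) != 0)
		return -1;

	a->regions[a->cnt].base = base;
	a->regions[a->cnt].size = size;
	a->regions[a->cnt].flags = 0;
	a->cnt++;
	return 0;
}
```

With these assumed numbers, adding a 129th entry to an array that
starts with room for 128 grows it from 128 * 24 = 0xc00 bytes to
256 * 24 = 0x1800 bytes. In this sketch the new buffer comes from
malloc(); the kernel path instead has to obtain the new array from
memblock itself, which is the step that shows up in the call trace.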

To confirm the above, I reduced the number of SMP CPUs available to
the kernel on this system by specifying 'nr_cpus=46' in the kernel
bootargs for the primary kernel. As expected, this makes the kdump
kernel boot successfully and also save the crash dump properly.

Another arm64 kdump user also reported this issue to me privately, so
I am sending this to a wider audience so that kdump users are aware
that this is a known issue.

I am working on an RFC patch which seems to fix the issue on my board
and will try to send it out for wider review in the coming days after
some more checks at my end.

Any advice on this is also welcome :)

Thanks,
Bhupesh

* Re: [Bug Report] kdump crashes after latest EFI memblock changes on arm64 machines with large number of CPUs
  2018-11-01 21:14 ` Bhupesh Sharma
  (?)
@ 2018-11-05 11:11   ` Ard Biesheuvel
  -1 siblings, 0 replies; 18+ messages in thread
From: Ard Biesheuvel @ 2018-11-05 11:11 UTC (permalink / raw)
  To: Bhupesh Sharma, Marc Zyngier
  Cc: Mark Rutland, linux-efi, kexec mailing list, Will Deacon,
	Bhupesh SHARMA, linux-arm-kernel

(+ Marc)

On 1 November 2018 at 22:14, Bhupesh Sharma <bhsharma@redhat.com> wrote:
> Hi,
>
> With the latest EFI changes for memblock reservation across kdump
> kernel from Ard (Commit 71e0940d52e107748b270213a01d3b1546657d74
> ["efi: honour memory reservations passed via a linux specific config
> table"]), we hit a panic while trying to boot the kdump kernel on
> machines which have large number of CPUs.
>

Just for my understanding: why do you boot all 224 CPUs when running
the crash kernel?

I'm not saying we shouldn't fix the underlying issue, I'm just curious.

> I have a arm64 board which has 224 CPUS:
> # lscpu
> <..snip..>
> CPU(s):              224
> On-line CPU(s) list: 0-223
> <..snip..>
>
> Here are the crash logs in the kdump kernel on this machine:
>
> [    0.000000] Unable to handle kernel paging request at virtual
> address ffff80003ffe0000
> val____)nt EL), IL ata abort info:
> [    0.or: Oops: 960000inted 4.18.0+ #3
> [    0.000000] pstate: 20400089 (nzCv daIf +PAN -UAO)
> [    0.000000] pc : __memcpy+0x110/0x180
> [    0.000000] lr : memblock_double_array+0x240/0x348
> [    0.000000] sp : ffff0000092efc80 x28: 00000000bffe0000
> [    0.000000] x27: 0000000000001800 x26: ffff000009d59000
> [    0.000000] x25: ffff80003ffe0000 x24: 0000000000000000
> [    0.000000] x23: 0000000000010000 x22: ffff000009d594e8
> [    0.000000] x21: ffff000009d594f4 x20: ffff0000093c7268
> [    0.000000] x19: 0000000000000c00 x18: 0000000000000010
> [    0.000000] x17: 0000000000000000 x16: 0000000000000000
> [    0.000000] x15: ffffffffffffffff3: 0000000fc18d0000 x12: 0000000800000000
> [    0.000000] x11: 0000000000000018 x10: 00000000ddab9e18
> [    0.000000] x9 : 0000000800000000 x8 : 00000000000002c1
> [    0.000000] x7 : 0000000091b90000 x6 : ffff80003ffe0000
> [    0.000000] x5 : 0000000000000001 x4 : 0000000000000000
> [    0.000000] x3 : 0000000000000000 x2 : 0000000000000b80
> [    0.000000] x1 : ffff000009d59540 x0 : ffff80003ffe0000
> [    0.000000] Process swapper)
> [    0.000000] Call trace:
> [    0.000000]  __memcpy+0x110/0x180
> [    0.000000]  memblock_add_range+0x134/0x2e8
> [    0.000000]  memblock_reserve+0x70/0xb8
> [    0.000000]  memblock_alloc_base_nid+0x6c/0x88
> [    0.000000]  __memblock_alloc_base+0x3c/0x4c
> [    0.000000]  memblock_alloc_base+0x28/0x4c
> [    0.000000]  memblock_alloc+0x2c/0x38
> [    0.000000]  early_pgtable_alloc+0x20/0xb0
> [    0.000000]  paging_init+0x28/0x7f8
> [   0.000000]  start_kernel+0x78/0x4cc
> [    0.000000] Code: a8c12027 a8c12829 a8c1302b a8c1382d (a88120c7)
> [    0.000000] random: get_random_bytes called from
> print_oops_end_marker+0x30/0x58 with crng_init=0
> [    0.000000] ---[ end trace 0000000000000000 ]---
> [    0.000000] Kernel panic - not syncing: Fatal exception
> [    0.000000] ---[ end Kernel panic - not syncing: Fatal exception ]---
>
> Adding more debug logs via 'memblock=debug' being passed to the kdump
> kernel, (and adding a few more prints to 'mm/memblock.c'), I can see
> that the panic happens while trying to resize array inside
> 'memblock_double_array' (which doubles the size of the memblock
> regions array):
>
> [    0.000000] Reserving 13KB of memory at 0xbfff0000 for elfcorehdr
> [    0.000000] memblock_reserve:
> [0x00000000bfff0000-0x00000000bfffffff]
> memblock_alloc_base_nid+0x6c/0x88
> [    0.000000] memblock: use_slab is 0, new_area_start=bfff0000,
> new_area_size=10000
> [    0.000000] memblock: use_slab is 0, addr=0, new_area_size=10000
> [    0.000000] memblock: addr=bffe0000, __va(addr)=ffff80003ffe0000
> [    0.00000 [0xbffe0000-0xbffe17ff]
> [    0.000000] Unable to handle kernel paging request at virtual
> address ffff80003ffe0000
>
> which indicates that after Ard's patch the memblocks being reserved
> across kdump swell up on systems which have large number of CPUs and
> hence 'memblock_double_array' is called up in early kdump boot code to
> double the size of the memblock regions array.
>
> To confirm the above, I reduced the number of SMP CPUs available to
> the kernel on this system, by specifying 'nr_cpus=46' in the kernel
> bootargs for the primary kernel. As expected this makes the kdump
> kernel boot successfully and also save the crash dump properly.
>
> I saw another arm64 kdump user report this issue to me privately, so I
> am sending this to a wider audience, so that kdump users are aware
> that this is a known issue.
>
> I am working on a RFC patch which seems to fix the issue on my board
> and will try to send it out for wider review in coming days after some
> more checks at my end.
>
> Any advices on the same are also welcome :)
>
> Thanks,
> Bhupesh

* Re: [Bug Report] kdump crashes after latest EFI memblock changes on arm64 machines with large number of CPUs
  2018-11-05 11:11   ` Ard Biesheuvel
  (?)
@ 2018-11-05 11:31       ` Marc Zyngier
  -1 siblings, 0 replies; 18+ messages in thread
From: Marc Zyngier @ 2018-11-05 11:31 UTC (permalink / raw)
  To: Ard Biesheuvel, Bhupesh Sharma
  Cc: Mark Rutland, linux-efi, kexec mailing list, Will Deacon,
	Bhupesh SHARMA, linux-arm-kernel

On 05/11/18 11:11, Ard Biesheuvel wrote:
> (+ Marc)
> 
> On 1 November 2018 at 22:14, Bhupesh Sharma <bhsharma@redhat.com> wrote:
>> Hi,
>>
>> With the latest EFI changes for memblock reservation across kdump
>> kernel from Ard (Commit 71e0940d52e107748b270213a01d3b1546657d74
>> ["efi: honour memory reservations passed via a linux specific config
>> table"]), we hit a panic while trying to boot the kdump kernel on
>> machines which have large number of CPUs.
>>
> 
> Just for my understanding: why do you boot all 224 CPus when running
> the crash kernel?

FWIW, I've used these patches to kexec a kernel on a 256-CPU system
without any issue. Why am I not seeing this problem?

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

* Re: [Bug Report] kdump crashes after latest EFI memblock changes on arm64 machines with large number of CPUs
  2018-11-01 21:14 ` Bhupesh Sharma
  (?)
@ 2018-11-06  1:30     ` Will Deacon
  -1 siblings, 0 replies; 18+ messages in thread
From: Will Deacon @ 2018-11-06  1:30 UTC (permalink / raw)
  To: Bhupesh Sharma
  Cc: Mark Rutland, linux-efi, Ard Biesheuvel,
	kexec mailing list, Bhupesh SHARMA, linux-arm-kernel

On Fri, Nov 02, 2018 at 02:44:10AM +0530, Bhupesh Sharma wrote:
> With the latest EFI changes for memblock reservation across kdump
> kernel from Ard (Commit 71e0940d52e107748b270213a01d3b1546657d74
> ["efi: honour memory reservations passed via a linux specific config
> table"]), we hit a panic while trying to boot the kdump kernel on
> machines which have large number of CPUs.
> 
> I have a arm64 board which has 224 CPUS:
> # lscpu
> <..snip..>
> CPU(s):              224
> On-line CPU(s) list: 0-223
> <..snip..>
> 
> Here are the crash logs in the kdump kernel on this machine:
> 
> [    0.000000] Unable to handle kernel paging request at virtual
> address ffff80003ffe0000
> val____)nt EL), IL ata abort info:
> [    0.or: Oops: 960000inted 4.18.0+ #3
> [    0.000000] pstate: 20400089 (nzCv daIf +PAN -UAO)
> [    0.000000] pc : __memcpy+0x110/0x180
> [    0.000000] lr : memblock_double_array+0x240/0x348
> [    0.000000] sp : ffff0000092efc80 x28: 00000000bffe0000
> [    0.000000] x27: 0000000000001800 x26: ffff000009d59000
> [    0.000000] x25: ffff80003ffe0000 x24: 0000000000000000
> [    0.000000] x23: 0000000000010000 x22: ffff000009d594e8
> [    0.000000] x21: ffff000009d594f4 x20: ffff0000093c7268
> [    0.000000] x19: 0000000000000c00 x18: 0000000000000010
> [    0.000000] x17: 0000000000000000 x16: 0000000000000000
> [    0.000000] x15: ffffffffffffffff3: 0000000fc18d0000 x12: 0000000800000000
> [    0.000000] x11: 0000000000000018 x10: 00000000ddab9e18
> [    0.000000] x9 : 0000000800000000 x8 : 00000000000002c1
> [    0.000000] x7 : 0000000091b90000 x6 : ffff80003ffe0000
> [    0.000000] x5 : 0000000000000001 x4 : 0000000000000000
> [    0.000000] x3 : 0000000000000000 x2 : 0000000000000b80
> [    0.000000] x1 : ffff000009d59540 x0 : ffff80003ffe0000
> [    0.000000] Process swapper)
> [    0.000000] Call trace:
> [    0.000000]  __memcpy+0x110/0x180
> [    0.000000]  memblock_add_range+0x134/0x2e8
> [    0.000000]  memblock_reserve+0x70/0xb8
> [    0.000000]  memblock_alloc_base_nid+0x6c/0x88
> [    0.000000]  __memblock_alloc_base+0x3c/0x4c
> [    0.000000]  memblock_alloc_base+0x28/0x4c
> [    0.000000]  memblock_alloc+0x2c/0x38
> [    0.000000]  early_pgtable_alloc+0x20/0xb0

Hmm, so this seems to be the crux of the issue: early_pgtable_alloc() relies
on memblock to allocate page-table memory, but this can be called before the
linear mapping is up and running (or even as part of creating the linear
mapping itself!) so the use of __va in memblock_double_array() actually
returns an unmapped address.

So I guess we either need to implement early_pgtable_alloc() some other way
(how?) or get memblock_double_array() to use a fixmap if it's called too
early (yuck). Alternatively, would it be possible to postpone processing of
the EFI mem_reserve entries until after we've created the linear mapping?

Will
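
The address arithmetic behind the analysis above can be checked with a
small userspace sketch. This is not kernel code. PAGE_OFFSET and
PHYS_OFFSET below are assumptions (a 48-bit VA layout with DRAM starting
at 2 GiB); neither value is stated in the thread. The region count (128)
and entry size (24 bytes) are likewise assumed, and all sketch_* names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions, not stated in the thread: 48-bit VA layout, DRAM at 2 GiB. */
#define SKETCH_PAGE_OFFSET UINT64_C(0xffff800000000000)
#define SKETCH_PHYS_OFFSET UINT64_C(0x0000000080000000)

/* Assumed sizes: 128 initial regions of 24 bytes each. */
#define SKETCH_INIT_REGIONS 128u
#define SKETCH_REGION_SIZE  24u

/*
 * Linear-map translation in the style of __va(): subtract the start of
 * DRAM, then place the result in the linear region. The returned address
 * is only usable once the linear mapping actually exists.
 */
static uint64_t sketch_va(uint64_t phys)
{
	assert(phys >= SKETCH_PHYS_OFFSET);
	return (phys - SKETCH_PHYS_OFFSET) | SKETCH_PAGE_OFFSET;
}

/* Byte size of a region array holding nr_regions entries. */
static uint64_t sketch_array_bytes(uint64_t nr_regions)
{
	return nr_regions * SKETCH_REGION_SIZE;
}
```

Under these assumptions, sketch_va(0xbffe0000) evaluates to
0xffff80003ffe0000. That input is the value in x28 and the result is the
faulting address (also in x0 and x25). Likewise, sketch_array_bytes(128)
is 0xc00 and sketch_array_bytes(256) is 0x1800, the values in x19 and x27.
That is consistent with a copy from a full 128-entry array into a doubled
one, but the register interpretation is a guess, not something the log
confirms.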

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Bug Report] kdump crashes after latest EFI memblock changes on arm64 machines with large number of CPUs
  2018-11-06  1:30     ` Will Deacon
  (?)
@ 2018-11-06  8:44       ` Ard Biesheuvel
  -1 siblings, 0 replies; 18+ messages in thread
From: Ard Biesheuvel @ 2018-11-06  8:44 UTC (permalink / raw)
  To: Will Deacon
  Cc: Mark Rutland, linux-efi, Bhupesh Sharma, kexec mailing list,
	Bhupesh SHARMA, linux-arm-kernel

On 6 November 2018 at 02:30, Will Deacon <will.deacon@arm.com> wrote:
> On Fri, Nov 02, 2018 at 02:44:10AM +0530, Bhupesh Sharma wrote:
>> With the latest EFI changes for memblock reservation across kdump
>> kernel from Ard (Commit 71e0940d52e107748b270213a01d3b1546657d74
>> ["efi: honour memory reservations passed via a linux specific config
>> table"]), we hit a panic while trying to boot the kdump kernel on
>> machines which have large number of CPUs.
>>
>> I have a arm64 board which has 224 CPUS:
>> # lscpu
>> <..snip..>
>> CPU(s):              224
>> On-line CPU(s) list: 0-223
>> <..snip..>
>>
>> Here are the crash logs in the kdump kernel on this machine:
>>
>> [    0.000000] Unable to handle kernel paging request at virtual
>> address ffff80003ffe0000
>> val____)nt EL), IL ata abort info:
>> [    0.or: Oops: 960000inted 4.18.0+ #3
>> [    0.000000] pstate: 20400089 (nzCv daIf +PAN -UAO)
>> [    0.000000] pc : __memcpy+0x110/0x180
>> [    0.000000] lr : memblock_double_array+0x240/0x348
>> [    0.000000] sp : ffff0000092efc80 x28: 00000000bffe0000
>> [    0.000000] x27: 0000000000001800 x26: ffff000009d59000
>> [    0.000000] x25: ffff80003ffe0000 x24: 0000000000000000
>> [    0.000000] x23: 0000000000010000 x22: ffff000009d594e8
>> [    0.000000] x21: ffff000009d594f4 x20: ffff0000093c7268
>> [    0.000000] x19: 0000000000000c00 x18: 0000000000000010
>> [    0.000000] x17: 0000000000000000 x16: 0000000000000000
>> [    0.000000] x15: ffffffffffffffff3: 0000000fc18d0000 x12: 0000000800000000
>> [    0.000000] x11: 0000000000000018 x10: 00000000ddab9e18
>> [    0.000000] x9 : 0000000800000000 x8 : 00000000000002c1
>> [    0.000000] x7 : 0000000091b90000 x6 : ffff80003ffe0000
>> [    0.000000] x5 : 0000000000000001 x4 : 0000000000000000
>> [    0.000000] x3 : 0000000000000000 x2 : 0000000000000b80
>> [    0.000000] x1 : ffff000009d59540 x0 : ffff80003ffe0000
>> [    0.000000] Process swapper)
>> [    0.000000] Call trace:
>> [    0.000000]  __memcpy+0x110/0x180
>> [    0.000000]  memblock_add_range+0x134/0x2e8
>> [    0.000000]  memblock_reserve+0x70/0xb8
>> [    0.000000]  memblock_alloc_base_nid+0x6c/0x88
>> [    0.000000]  __memblock_alloc_base+0x3c/0x4c
>> [    0.000000]  memblock_alloc_base+0x28/0x4c
>> [    0.000000]  memblock_alloc+0x2c/0x38
>> [    0.000000]  early_pgtable_alloc+0x20/0xb0
>
> Hmm, so this seems to be the crux of the issue: early_pgtable_alloc() relies
> on memblock to allocate page-table memory, but this can be called before the
> linear mapping is up and running (or even as part of creating the linear
> mapping itself!) so the use of __va in memblock_double_array() actually
> returns an unmapped address.
>

OK, so this means we are calling memblock_allow_resize() too early in any case.

> So I guess we either need to implement early_pgtable_alloc() some other way
> (how?) or get memblock_double_array() to use a fixmap if it's called too
> early (yuck). Alternatively, would it be possible to postpone processing of
> the EFI mem_reserve entries until after we've created the linear mapping?
>

We could move this to after paging_init(), I suppose. I'll cook something up.

Bhupesh: any comments?
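
The ordering problem discussed above can be illustrated with a toy model.
This is a userspace sketch only: struct toy_memblock, toy_reserve() and
the two flags are hypothetical stand-ins for the memblock region array,
the memblock_allow_resize() switch and "paging_init() has run". Nothing
here is the real memblock code, and the capacity of 128 in the usage note
is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_result { TOY_OK, TOY_FULL, TOY_FAULT };

/* Hypothetical stand-in for the reserved-region array bookkeeping. */
struct toy_memblock {
	size_t cnt;           /* entries in use */
	size_t max;           /* current capacity */
	bool can_resize;      /* analogue of memblock_allow_resize() */
	bool linear_map_up;   /* analogue of "paging_init() has run" */
};

/* Record one reservation, doubling the array when it is full. */
static enum toy_result toy_reserve(struct toy_memblock *mb)
{
	assert(mb != NULL);

	if (mb->cnt == mb->max) {
		if (!mb->can_resize)
			return TOY_FULL;   /* growth is refused outright */
		if (!mb->linear_map_up)
			return TOY_FAULT;  /* copy via the linear map would fault */
		mb->max *= 2;              /* doubling is safe from here on */
	}
	mb->cnt++;
	return TOY_OK;
}

/* Record n reservations, stopping at the first failure. */
static enum toy_result toy_reserve_many(struct toy_memblock *mb, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		enum toy_result r = toy_reserve(mb);

		if (r != TOY_OK)
			return r;
	}
	return TOY_OK;
}
```

With cnt = 0, max = 128, can_resize set and linear_map_up clear, 128
calls succeed and the 129th returns TOY_FAULT, which mirrors the reported
crash. Setting linear_map_up before that 129th call lets it succeed and
grows max to 256. That is the effect of postponing the extra reservations
until after the linear mapping has been created.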

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Bug Report] kdump crashes after latest EFI memblock changes on arm64 machines with large number of CPUs
  2018-11-06  8:44       ` Ard Biesheuvel
  (?)
@ 2018-11-08  5:58         ` Bhupesh Sharma
  -1 siblings, 0 replies; 18+ messages in thread
From: Bhupesh Sharma @ 2018-11-08  5:58 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Mark Rutland, linux-efi, Will Deacon, kexec mailing list,
	Bhupesh SHARMA, linux-arm-kernel

Hi All,

I am sorry for the delay. I was away for my Diwali holidays and came
back to the office today.

On Tue, Nov 6, 2018 at 2:14 PM Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>
> On 6 November 2018 at 02:30, Will Deacon <will.deacon@arm.com> wrote:
> > On Fri, Nov 02, 2018 at 02:44:10AM +0530, Bhupesh Sharma wrote:
> >> With the latest EFI changes for memblock reservation across kdump
> >> kernel from Ard (Commit 71e0940d52e107748b270213a01d3b1546657d74
> >> ["efi: honour memory reservations passed via a linux specific config
> >> table"]), we hit a panic while trying to boot the kdump kernel on
> >> machines which have large number of CPUs.
> >>
> >> I have a arm64 board which has 224 CPUS:
> >> # lscpu
> >> <..snip..>
> >> CPU(s):              224
> >> On-line CPU(s) list: 0-223
> >> <..snip..>
> >>
> >> Here are the crash logs in the kdump kernel on this machine:
> >>
> >> [    0.000000] Unable to handle kernel paging request at virtual
> >> address ffff80003ffe0000
> >> val____)nt EL), IL ata abort info:
> >> [    0.or: Oops: 960000inted 4.18.0+ #3
> >> [    0.000000] pstate: 20400089 (nzCv daIf +PAN -UAO)
> >> [    0.000000] pc : __memcpy+0x110/0x180
> >> [    0.000000] lr : memblock_double_array+0x240/0x348
> >> [    0.000000] sp : ffff0000092efc80 x28: 00000000bffe0000
> >> [    0.000000] x27: 0000000000001800 x26: ffff000009d59000
> >> [    0.000000] x25: ffff80003ffe0000 x24: 0000000000000000
> >> [    0.000000] x23: 0000000000010000 x22: ffff000009d594e8
> >> [    0.000000] x21: ffff000009d594f4 x20: ffff0000093c7268
> >> [    0.000000] x19: 0000000000000c00 x18: 0000000000000010
> >> [    0.000000] x17: 0000000000000000 x16: 0000000000000000
> >> [    0.000000] x15: ffffffffffffffff3: 0000000fc18d0000 x12: 0000000800000000
> >> [    0.000000] x11: 0000000000000018 x10: 00000000ddab9e18
> >> [    0.000000] x9 : 0000000800000000 x8 : 00000000000002c1
> >> [    0.000000] x7 : 0000000091b90000 x6 : ffff80003ffe0000
> >> [    0.000000] x5 : 0000000000000001 x4 : 0000000000000000
> >> [    0.000000] x3 : 0000000000000000 x2 : 0000000000000b80
> >> [    0.000000] x1 : ffff000009d59540 x0 : ffff80003ffe0000
> >> [    0.000000] Process swapper)
> >> [    0.000000] Call trace:
> >> [    0.000000]  __memcpy+0x110/0x180
> >> [    0.000000]  memblock_add_range+0x134/0x2e8
> >> [    0.000000]  memblock_reserve+0x70/0xb8
> >> [    0.000000]  memblock_alloc_base_nid+0x6c/0x88
> >> [    0.000000]  __memblock_alloc_base+0x3c/0x4c
> >> [    0.000000]  memblock_alloc_base+0x28/0x4c
> >> [    0.000000]  memblock_alloc+0x2c/0x38
> >> [    0.000000]  early_pgtable_alloc+0x20/0xb0
> >
> > Hmm, so this seems to be the crux of the issue: early_pgtable_alloc() relies
> > on memblock to allocate page-table memory, but this can be called before the
> > linear mapping is up and running (or even as part of creating the linear
> > mapping itself!) so the use of __va in memblock_double_array() actually
> > returns an unmapped address.
> >
>
> OK, so this means we are calling memblock_allow_resize() too early in any case
>
> > So I guess we either need to implement early_pgtable_alloc() some other way
> > (how?) or get memblock_double_array() to use a fixmap if it's called too
> > early (yuck). Alternatively, would it be possible to postpone processing of
> > the EFI mem_reserve entries until after we've created the linear mapping?
> >
>
> We could move this until after paging_init(), I suppose. I'll cook something up.
>
> Bhupesh: any comments?

Since Ard has already shared a patchset which seems to fix this issue
[Thanks Ard :)], and my approach is still hackish, I will try to
verify his v2 patchset on the system I was having issues with and get
back with my results.

Thanks for all the help.

Regards,
Bhupesh

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2018-11-08  5:58 UTC | newest]

Thread overview:
2018-11-01 21:14 [Bug Report] kdump crashes after latest EFI memblock changes on arm64 machines with large number of CPUs Bhupesh Sharma
2018-11-05 11:11 ` Ard Biesheuvel
2018-11-05 11:31   ` Marc Zyngier
2018-11-06  1:30 ` Will Deacon
2018-11-06  8:44   ` Ard Biesheuvel
2018-11-08  5:58     ` Bhupesh Sharma
